I am struggling to parse contents from HTML using htmlTreeParse and XPath.
Below is the web link from where I need to extract information of "most valuable brands" and create a data frame out of it.
As a first step towards building the table, I am trying to extract the list of brands (Apple, Google, Microsoft etc. ). I am trying through below code:
htmlContent <- getURL("http://www.forbes.com/powerful-brands/list/#tab:rank", ssl.verifypeer=FALSE)
htmlParsed <- htmlTreeParse(htmlContent, useInternal = TRUE)
output <- xpathSApply(htmlParsed, "/html/body/div/div/div/table[@id='the_list']/tbody/tr/td[@class='name']", xmlValue)
But its returning NULL. I am not able to find my mistake.
"/html/body/div/div/div/table[@id='the_list']/thead/tr/th" works correctly, returning ("", "Rank", "brand" etc.)
This means path upto table is correct. But I am not able to understand what's wrong thereafter.