I'm scraping the prices of some products from a website. In Python I used urllib2 without problems, but when I tried using RCurl in R I couldn't download the page source.
I build each URL by pasting the product code onto a base path, then extract the price from the page source. The path of a product is: http://www.americanas.com.br/produto/code_of_product.
However, I can't download the source code of a product page with RCurl. When I try, for example, getURL('http://www.americanas.com.br/produto/111467594') it returns "".
I tried using getURL('.../produtos/111467594') and I could download the source, but that way I'm unable to get the price. :(
Does anyone know how I could get the prices of the products?
P.S.: Sorry for my bad English. :)
Welcome to Stack Overflow.
It's hard for me to say why it doesn't work; could you include verbose=TRUE in the getURL call? Also, I notice there are different prices on the web page you linked. Do you want all of them or just the first? How about this to get the "Por" price:
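    library("stringr")
    # Read the product page as a character vector, one element per line
    productwebpage <- readLines("http://www.americanas.com.br/produto/111467594")
    # Keep only the line(s) containing the sale price markup
    pricerow <- productwebpage[grep("p class=\"sale price\"", productwebpage)]
    # Pull out the runs of digits, commas and dots from the first matching line
    price <- str_extract_all(pricerow, "\\(?[0-9,.]+\\)?")[[1]]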
You could also replace grep("p class=\"sale price\"",productwebpage) with either grep("<p><span class=\"regular price\">",productwebpage) (to get the "De" price, i.e. the old price) or grep("<span class=\"p-v interest\">",productwebpage) (which will give you the "sem juros" price, i.e. the interest-free monthly payment). For the last example you get the number of months first and the payment after, so it will be:
    > price
    [1] "12"    "83,25"
This should hopefully work for other products as well (I just tried five and it seemed to work for all of them).
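If you want to try the extraction step without hitting the site, here is a minimal offline sketch. The sample lines are made up to stand in for what readLines() might return (the real page markup may differ), and extract_numbers and to_numeric are hypothetical helper names. It uses base R's regmatches/gregexpr instead of stringr, with a slightly stricter pattern that requires a leading digit, and it also shows one way to turn a Brazilian-formatted amount into a number.

```r
# Hypothetical sample lines standing in for readLines() output;
# the real page markup may differ.
productwebpage <- c(
  "<div class=\"product\">",
  "<p><span class=\"regular price\">De: R$ 1.199,00</span></p>",
  "<p class=\"sale price\">Por: R$ 999,00</p>",
  "<span class=\"p-v interest\">12x de R$ 83,25 sem juros</span>",
  "</div>"
)

# Return every run of digits, commas and dots (starting with a digit)
# found on the lines that contain `marker` as a literal string.
extract_numbers <- function(lines, marker) {
  rows <- lines[grep(marker, lines, fixed = TRUE)]
  unlist(regmatches(rows, gregexpr("[0-9][0-9,.]*", rows)))
}

# Convert a Brazilian-formatted amount such as "1.199,00" to a number:
# drop the thousands separators, then swap the decimal comma for a dot.
to_numeric <- function(x) {
  as.numeric(sub(",", ".", gsub(".", "", x, fixed = TRUE), fixed = TRUE))
}

sale         <- extract_numbers(productwebpage, "p class=\"sale price\"")
regular      <- extract_numbers(productwebpage, "<span class=\"regular price\">")
installments <- extract_numbers(productwebpage, "<span class=\"p-v interest\">")

print(sale)
print(to_numeric(regular))
print(installments)
```

With the real page you would replace the sample vector with the readLines() call; the markers are the same strings used in the grep calls above.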