
r - Trying to get the price of products with RCurl

Question:

I'm scraping the prices of some products from a website. In Python I used urllib2 without problems, but when I tried using RCurl in R I couldn't download the source code.

I have to paste the base path together with the product code, and then I extract the price from the page source. The path of a product is: http://www.americanas.com.br/produto/code_of_product.

The problem is that I can't download the source code of a product page with RCurl. For example, getURL('http://www.americanas.com.br/produto/111467594') returns "".

I tried getURL('.../produtos/111467594') and I could download the source, but that way I'm unable to get the price. :(

Does anyone know how I could get the prices of the products?

Thanks.

P.S.: Sorry for my bad English. :)

Answer:

Welcome to Stack Overflow.

It's hard for me to say why it doesn't work. Could you add verbose=TRUE to the getURL call? Also, I notice there are several different prices on the page you linked. Do you want all of them or just the first? How about this to get the "Por" price:

library("stringr")

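# Read the product page line by line
productwebpage <- readLines("http://www.americanas.com.br/produto/111467594")
# Keep the line(s) containing the sale ("Por") price markup
pricerow <- productwebpage[grep("p class=\"sale price\"", productwebpage)]
# Pull out the numeric tokens (digits, commas, dots, optional parentheses)
price <- str_extract_all(pricerow, "\\(?[0-9,.]+\\)?")[[1]]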

You could also replace grep("p class=\"sale price\"",productwebpage) with either grep("<p><span class=\"regular price\">",productwebpage) (to get the "De" price, i.e. the old price) or grep("<span class=\"p-v interest\">",productwebpage) (which gives you the "sem juros" price, i.e. the interest-free monthly payment). In the last case you get the number of months first and the payment second, so the result will be:

> price
[1] "12"    "83,25"
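
Note that these values are still strings in Brazilian number format. If you need numbers, something like the following might help. It is only a sketch: parse_brl is a name I made up, and it assumes "." is always a thousands separator and "," is always the decimal mark.

```r
# Convert Brazilian-formatted number strings to numeric values.
# Assumption: "." is a thousands separator and "," is the decimal mark.
parse_brl <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)  # drop thousands separators: "1.299,00" -> "1299,00"
  x <- sub(",", ".", x, fixed = TRUE)  # decimal comma to point:     "1299,00" -> "1299.00"
  as.numeric(x)
}

price   <- c("12", "83,25")            # the result shown above
values  <- parse_brl(price)
months  <- values[1]
payment <- values[2]
total   <- months * payment            # total paid over all installments
```

This does not handle tokens wrapped in parentheses, which the extraction pattern above can also return.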

This should hopefully work for other products as well (I just tried 5 and it seemed to work for all of them).
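
If you need this for many products, the same idea could be wrapped in a small helper. The version below is a rough sketch that uses base R (regmatches/gregexpr) instead of stringr, with a slightly stricter pattern that ignores parentheses. The name extract_price and the sample lines are made up for illustration; the real page markup may differ, so treat the markers as assumptions.

```r
# Return the numeric tokens found on the first line that contains `marker`.
# `marker` is matched as a fixed string, not as a regular expression.
extract_price <- function(lines, marker = "p class=\"sale price\"") {
  hits <- grep(marker, lines, fixed = TRUE)
  if (length(hits) == 0) return(character(0))  # marker not found on this page
  row <- lines[hits[1]]
  regmatches(row, gregexpr("[0-9]+([.,][0-9]+)*", row))[[1]]
}

# Made-up sample lines that only mimic the class names mentioned above
sample_page <- c(
  "<html><body>",
  "<p><span class=\"regular price\">De: R$ 1.199,00</span></p>",
  "<p class=\"sale price\">Por: R$ 999,00</p>",
  "<span class=\"p-v interest\">12x de R$ 83,25 sem juros</span>",
  "</body></html>"
)

sale     <- extract_price(sample_page)
regular  <- extract_price(sample_page, "<p><span class=\"regular price\">")
interest <- extract_price(sample_page, "<span class=\"p-v interest\">")

# With a live page (not run here), the helper would be applied per product code:
# codes  <- c("111467594")
# prices <- lapply(codes, function(code) {
#   extract_price(readLines(paste0("http://www.americanas.com.br/produto/", code)))
# })
```

Returning an empty character vector when the marker is missing makes it easy to spot products whose page has a different layout.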
