
Web scraping using Perl

Question:

This may be a naive question, but I want to learn some web scraping. I already know Perl, so I would prefer to do it in that language. I know there are a lot of modules on CPAN; I tried reading their documentation, but I barely understood any of it. I haven't found anything that explains from scratch what the process involves. I would appreciate some links or materials for studying web scraping.

Thanks!

Answer:

At a basic level, 'web scraping' is just downloading a web page and parsing it to extract the information you want. To get started, the module you want is LWP, which lets you fetch content, plus 'something' to extract the information you want: HTML::Parser or HTML::TableExtract, for example. There's nothing to say you can't roll your own using pattern matching, of course, but processing HTML isn't a new problem, so why reinvent the wheel?
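As a minimal sketch of that fetch-then-parse flow, the script below uses LWP::UserAgent to download a page and HTML::TableExtract to pull rows out of a table. Both modules need to be installed from CPAN. The sample table, its 'Module' and 'Purpose' headers, and the helper names fetch_html and extract_rows are made up for illustration; with no argument the script parses the built-in sample, and with a URL as its argument it fetches that page instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TableExtract;

# Hypothetical sample page, so the script also runs without a network.
my $sample_html = <<'HTML';
<html><body>
<table>
  <tr><th>Module</th><th>Purpose</th></tr>
  <tr><td>LWP</td><td>fetch pages</td></tr>
  <tr><td>HTML::TableExtract</td><td>pull data out of tables</td></tr>
</table>
</body></html>
HTML

# Download a page and return its decoded HTML; die on any HTTP error.
sub fetch_html {
    my ($url) = @_;
    my $ua       = LWP::UserAgent->new( timeout => 10 );
    my $response = $ua->get($url);
    die "GET $url failed: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Return the data rows of the first table whose header row matches
# all of @headers. Each row is an array ref of trimmed cell texts.
sub extract_rows {
    my ( $html, @headers ) = @_;
    my $te = HTML::TableExtract->new( headers => \@headers );
    $te->parse($html);
    my ($table) = $te->tables;
    return () unless $table;
    return map {
        [ map { my $cell = $_ // ''; $cell =~ s/^\s+|\s+$//g; $cell } @$_ ]
    } $table->rows;
}

# Use the URL given on the command line, or fall back to the sample.
my $html = @ARGV ? fetch_html( $ARGV[0] ) : $sample_html;
print join( ' | ', @$_ ), "\n" for extract_rows( $html, 'Module', 'Purpose' );
```

Selecting the table by its header text rather than by its position on the page tends to survive small layout changes on the site.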

At a more advanced level though, you might want to interact with a site - log in to it perhaps, or 'click through' some menus. For this, I like WWW::Mechanize.
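A rough sketch of both kinds of interaction with WWW::Mechanize (installed from CPAN) might look like the following. The helper names log_in and click_through, the form field names 'username' and 'password', and the two demo pages are assumptions for illustration only. So that the script runs without a network, the demo clicks through two temporary local files; log_in is defined but not called, since it needs a real login page.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Submit the login form, identified by the fields it contains.
# The field names 'username' and 'password' are hypothetical; check the
# real form's HTML for the names it actually uses.
sub log_in {
    my ( $mech, $login_url, $user, $password ) = @_;
    $mech->get($login_url);
    $mech->submit_form(
        with_fields => { username => $user, password => $password },
    );
    return $mech;
}

# Starting from $start_url, follow one link per entry in @link_texts,
# matching each link by its exact visible text, like clicking through menus.
sub click_through {
    my ( $mech, $start_url, @link_texts ) = @_;
    $mech->get($start_url);
    $mech->follow_link( text => $_ ) for @link_texts;
    return $mech;
}

# Offline demo: two local pages standing in for a site with a menu.
my $dir   = tempdir( CLEANUP => 1 );
my %pages = (
    'index.html' =>
        '<html><head><title>Home</title></head><body><a href="reports.html">Reports</a></body></html>',
    'reports.html' =>
        '<html><head><title>Reports</title></head><body><p>Quarterly numbers</p></body></html>',
);
for my $name ( keys %pages ) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print {$fh} $pages{$name};
    close $fh or die "Cannot close $dir/$name: $!";
}

# autocheck => 1 makes failed requests (and links that cannot be found) die.
my $mech  = WWW::Mechanize->new( autocheck => 1 );
my $start = URI::file->new_abs("$dir/index.html")->as_string;
click_through( $mech, $start, 'Reports' );
print 'Landed on: ', $mech->title, "\n";
```

Against a real site, the same object would be passed to log_in first, so that the session cookies it collects are reused for the later clicks.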

I'm afraid I can't give you much more without a better understanding of the sort of problem you're trying to solve though. Are you at a basic 'fetch a webpage and parse' sort of level?

(You can find details and examples of the above modules on CPAN. The LWP page has some examples that should get you started.)

Answer:

I wrote a pretty basic tutorial on WWW::Mechanize here. I have successfully crawled pages on several occasions, so please let me know if you have a case you would like to try and need some help with :)

Answer:

To start, you can look at the WWW::Mechanize and HTML::TreeBuilder::XPath modules.
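To give a feel for the parsing half, here is a small sketch of HTML::TreeBuilder::XPath (installed from CPAN) selecting links with an XPath expression. The sample page, its "modules" id, and the extract_links helper are invented for the example; in practice the HTML would come from a fetch, for instance `$mech->content`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page; only the links inside the list are wanted.
my $sample_html = <<'HTML';
<html><body>
<ul id="modules">
  <li><a href="/pod/WWW::Mechanize">WWW::Mechanize</a></li>
  <li><a href="/pod/HTML::TreeBuilder::XPath">HTML::TreeBuilder::XPath</a></li>
</ul>
<p><a href="/about">About</a></p>
</body></html>
HTML

# Parse $html and return one hash ref (text, href) per node matching $xpath.
sub extract_links {
    my ( $html, $xpath ) = @_;
    my $tree  = HTML::TreeBuilder::XPath->new_from_content($html);
    my @links = map { +{ text => $_->as_text, href => $_->attr('href') } }
        $tree->findnodes($xpath);
    $tree->delete;    # release the parse tree once the data is copied out
    return @links;
}

for my $link ( extract_links( $sample_html, '//ul[@id="modules"]/li/a' ) ) {
    printf "%s -> %s\n", $link->{text}, $link->{href};
}
```

The XPath expression restricts the match to links inside the list, so the unrelated "About" link is skipped.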

Answer:

In my opinion, the best module for web scraping is Web::Scraper. Its language can be quite terse at times, but there are plenty of examples.
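For a taste of that terse style, here is a minimal sketch using Web::Scraper (installed from CPAN) with CSS selectors and a nested scraper. The sample markup, its class names, and the 'answers', 'author', and 'votes' keys are made up for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical sample page with two answer blocks.
my $sample_html = <<'HTML';
<html><body>
<div class="answer"><span class="author">alice</span><span class="votes">12</span></div>
<div class="answer"><span class="author">bob</span><span class="votes">7</span></div>
</body></html>
HTML

# Each div.answer becomes one hash in the 'answers' array; the nested
# scraper fills that hash from the elements inside the div.
my $answers = scraper {
    process 'div.answer', 'answers[]' => scraper {
        process 'span.author', author => 'TEXT';
        process 'span.votes',  votes  => 'TEXT';
    };
};

# scrape() also accepts a URI object, in which case it fetches the page itself.
my $result = $answers->scrape($sample_html);

for my $answer ( @{ $result->{answers} || [] } ) {
    printf "%s (%s votes)\n", $answer->{author}, $answer->{votes};
}
```

The `[]` suffix on the key collects every match into an array; without it, only a single value is kept under that key.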
