Using Perl to Gather Information from the Web

By Tom Hukins
Date: Saturday, 22 October 2005 10:50
Duration: 20 minutes

You can find more information on the speaker's site:

Talk: http://people.freebsd.org/~tom/yapc-eu-2005.pdf

Whilst some Web site owners have opened up their information through REST or SOAP interfaces, many have not. Screen scraping remains the only viable way to gather information from such sites.

My talk will explore how Perl, WWW::Mechanize and XPath can make gathering information from such sites easier and more robust, even when working with badly formed HTML. I will compare the XPath approach with the more common tokenising technique provided by HTML::Parser.
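
As a rough illustration of the XPath approach, here is a minimal sketch that is not taken from the talk itself. It assumes the HTML::TreeBuilder::XPath module from CPAN is installed; the markup, the `talks` id, the `talk` class and the `extract_talks` helper are all hypothetical. The sample HTML deliberately leaves its `<li>` and `<p>` elements unclosed to show that a single XPath expression can still address the data once the parser has repaired the tree.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;    # CPAN module, assumed to be installed

# Hypothetical, deliberately sloppy markup: <li> and <p> are never closed.
my $html = <<'HTML';
<html><body>
<ul id="talks">
  <li class="talk"><a href="/talks/1">Scraping with XPath</a>
  <li class="talk"><a href="/talks/2">Tokenising HTML</a>
</ul>
<p>Footer text
</body></html>
HTML

# In a real scraper the markup would typically be fetched first, e.g.:
#   my $mech = WWW::Mechanize->new;
#   $mech->get($url);
#   my $html = $mech->content;

# Hypothetical helper: returns one hash reference per talk link found.
sub extract_talks {
    my ($markup) = @_;

    # HTML::TreeBuilder closes the dangling elements while building the tree.
    my $tree = HTML::TreeBuilder::XPath->new_from_content($markup);

    # One XPath expression describes where the data lives in the page.
    my @talks = map {
        +{ title => $_->as_trimmed_text, href => $_->attr('href') }
    } $tree->findnodes('//ul[@id="talks"]/li[@class="talk"]/a');

    $tree->delete;    # release the parse tree explicitly
    return @talks;
}

print "$_->{title} => $_->{href}\n" for extract_talks($html);
```

With a tokenising parser, the same extraction would usually mean tracking state by hand across start-tag, text and end-tag events, which is where the XPath version tends to be shorter and less sensitive to small layout changes.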

I will also discuss other tools that help developers gather information from sites lacking public interfaces and how to use these tools to write simple, flexible Perl code.

The Nordic Perl Workshop is a joint venture between the
Copenhagen and the Stockholm Perl Mongers.