Tuesday, August 17, 2010

Spidering Hacks

By Tara Calishain, Kevin Hemenway
Publisher: O'Reilly
Pub Date: October 2003
ISBN: 0-596-00577-6
Pages: 424
Copyright
Credits
About the Authors
Contributors
Preface
Why Spidering Hacks?
How This Book Is Organized
How to Use This Book
Conventions Used in This Book
How to Contact Us
Got a Hack?
Chapter 1. Walking Softly
Hacks #1-7
Hack 1. A Crash Course in Spidering and Scraping
Hack 2. Best Practices for You and Your Spider
Hack 3. Anatomy of an HTML Page
Hack 4. Registering Your Spider
Hack 5. Preempting Discovery
Hack 6. Keeping Your Spider Out of Sticky Situations
Hack 7. Finding the Patterns of Identifiers
Chapter 2. Assembling a Toolbox
Hacks #8-32
Perl Modules
Resources You May Find Helpful
Hack 8. Installing Perl Modules
Hack 9. Simply Fetching with LWP::Simple
Hack 10. More Involved Requests with LWP::UserAgent
Hack 11. Adding HTTP Headers to Your Request
Hack 12. Posting Form Data with LWP
Hack 13. Authentication, Cookies, and Proxies
Hack 14. Handling Relative and Absolute URLs
Hack 15. Secured Access and Browser Attributes
Hack 16. Respecting Your Scrapee's Bandwidth
Hack 17. Respecting robots.txt
Hack 18. Adding Progress Bars to Your Scripts
Hack 19. Scraping with HTML::TreeBuilder
Hack 20. Parsing with HTML::TokeParser
Hack 21. WWW::Mechanize 101
Hack 22. Scraping with WWW::Mechanize
Hack 23. In Praise of Regular Expressions
Hack 24. Painless RSS with Template::Extract
Hack 25. A Quick Introduction to XPath
Hack 26. Downloading with curl and wget
Hack 27. More Advanced wget Techniques
Hack 28. Using Pipes to Chain Commands
Hack 29. Running Multiple Utilities at Once
Hack 30. Utilizing the Web Scraping Proxy
Hack 31. Being Warned When Things Go Wrong
Hack 32. Being Adaptive to Site Redesigns
Chapter 3. Collecting Media Files
Hacks #33-42
Hack 33. Detective Case Study: Newgrounds
Hack 34. Detective Case Study: iFilm
Hack 35. Downloading Movies from the Library of Congress
Hack 36. Downloading Images from Webshots
Hack 37. Downloading Comics with dailystrips
Hack 38. Archiving Your Favorite Webcams
Hack 39. News Wallpaper for Your Site
Hack 40. Saving Only POP3 Email Attachments
Hack 41. Downloading MP3s from a Playlist
Hack 42. Downloading from Usenet with nget
Chapter 4. Gleaning Data from Databases
Hacks #43-89
Hack 43. Archiving Yahoo! Groups Messages with yahoo2mbox
Hack 44. Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
Hack 45. Gleaning Buzz from Yahoo!
Hack 46. Spidering the Yahoo! Catalog
Hack 47. Tracking Additions to Yahoo!
Hack 48. Scattersearch with Yahoo! and Google
Hack 49. Yahoo! Directory Mindshare in Google
Hack 50. Weblog-Free Google Results
Hack 51. Spidering, Google, and Multiple Domains
Hack 52. Scraping Amazon.com Product Reviews
Hack 53. Receive an Email Alert for Newly Added Amazon.com Reviews
Hack 54. Scraping Amazon.com Customer Advice
Hack 55. Publishing Amazon.com Associates Statistics
Hack 56. Sorting Amazon.com Recommendations by Rating
Hack 57. Related Amazon.com Products with Alexa
Hack 58. Scraping Alexa's Competitive Data with Java
Hack 59. Finding Album Information with FreeDB and Amazon.com
Hack 60. Expanding Your Musical Tastes
Hack 61. Saving Daily Horoscopes to Your iPod
Hack 62. Graphing Data with RRDTOOL
Hack 63. Stocking Up on Financial Quotes
Hack 64. Super Author Searching
Hack 65. Mapping O'Reilly Best Sellers to Library Popularity
Hack 66. Using All Consuming to Get Book Lists
Hack 67. Tracking Packages with FedEx
Hack 68. Checking Blogs for New Comments
Hack 69. Aggregating RSS and Posting Changes
Hack 70. Using the Link Cosmos of Technorati
Hack 71. Finding Related RSS Feeds
Hack 72. Automatically Finding Blogs of Interest
Hack 73. Scraping TV Listings
Hack 74. What's Your Visitor's Weather Like?
Hack 75. Trendspotting with Geotargeting
Hack 76. Getting the Best Travel Route by Train
Hack 77. Geographic Distance and Back Again
Hack 78. Super Word Lookup
Hack 79. Word Associations with Lexical Freenet
Hack 80. Reformatting Bugtraq Reports
Hack 81. Keeping Tabs on the Web via Email
Hack 82. Publish IE's Favorites to Your Web Site
Hack 83. Spidering GameStop.com Game Prices
Hack 84. Bargain Hunting with PHP
Hack 85. Aggregating Multiple Search Engine Results
Hack 86. Robot Karaoke
Hack 87. Searching the Better Business Bureau
Hack 88. Searching for Health Inspections
Hack 89. Filtering for the Naughties
Chapter 5. Maintaining Your Collections
Hacks #90-93
Hack 90. Using cron to Automate Tasks
Hack 91. Scheduling Tasks Without cron
Hack 92. Mirroring Web Sites with wget and rsync
Hack 93. Accumulating Search Results Over Time
Chapter 6. Giving Back to the World
Hacks #94-100
Hack 94. Using XML::RSS to Repurpose Data
Hack 95. Placing RSS Headlines on Your Site
Hack 96. Making Your Resources Scrapable with Regular Expressions
Hack 97. Making Your Resources Scrapable with a REST Interface
Hack 98. Making Your Resources Scrapable with XML-RPC
Hack 99. Creating an IM Interface
Hack 100. Going Beyond the Book
Colophon
Index
