Write a Perl module to crawl Auto [login to view URL] and extract listing information for all cars that are newly listed or have had a price change since the last crawl. This module must work in our existing object-oriented crawling framework, which is robust and provides basic services you are expected to use.

Listing information that you are expected to collect includes: Original URL, Thumbnail Photograph, Classified Title, Make, Model, Year, Miles, Transmission, Engine, Trim, VIN, Color, Fuel Type, Price, Address, City, State, and Zip Code.

Our framework has a web page fetching facility, a vehicle observed history facility, a Make/Model matching system, and a queue of pages to be crawled, and it makes extensive use of existing HTML parsers. You will have access to the driver program and an actual module that crawls a classified site. Existing modules run from 300 to 450 lines. You are required to use HTML::TreeBuilder, our existing page fetching framework, and object-oriented Perl techniques. One of our servers will be available to you continuously to simplify development and testing and to spare you from duplicating our running environment.

The module will have two functions: a constructor and a function that is passed a page. Further page URLs to be crawled are added to a queue. The extracted and saved information will be formatted as trivial XML. The crawler program takes parameters for the site name, zip code, and distance from the zip code, which the module will use as CGI parameters to Auto Mart.com. Code must be well documented.
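As a rough illustration of the shape described above (a constructor plus one function that is passed a page), here is a minimal sketch. It is not our framework's API: the package name `AutoMartSketch`, the `history` hash, the `queue` array, the `to_xml` helper, and the page markup (`div.listing`, `span.price`, `a.next`) are all hypothetical stand-ins for the real facilities and the real site layout. Only the HTML::TreeBuilder calls are real.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package AutoMartSketch;    # hypothetical module name

use HTML::TreeBuilder;

# Constructor. 'history' (URL => last seen price) and 'queue' (list of URLs)
# are hypothetical stand-ins for the framework's history and queue facilities.
sub new {
    my ($class, %args) = @_;
    my $self = {
        history => $args{history} || {},
        queue   => $args{queue}   || [],
    };
    return bless $self, $class;
}

# Passed the HTML of one fetched page. Returns one XML string per listing
# that is new or whose price differs from the recorded one.
sub process_page {
    my ($self, $html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @xml;

    # Assumed markup: <div class="listing"> holding a title link and a price.
    for my $node ($tree->look_down(_tag => 'div', class => 'listing')) {
        my $link       = $node->look_down(_tag => 'a') or next;
        my $price_node = $node->look_down(_tag => 'span', class => 'price') or next;
        my $url        = $link->attr('href');
        next unless defined $url;

        # Normalize "$12,500" to "12500" so comparisons are stable.
        (my $price = $price_node->as_text) =~ s/[^\d.]//g;

        my $seen = $self->{history}{$url};
        next if defined $seen && $seen eq $price;    # unchanged since last crawl
        $self->{history}{$url} = $price;
        push @xml, to_xml(url => $url, title => $link->as_text, price => $price);
    }

    # Add further result pages to the queue (assumed <a class="next"> link).
    for my $next ($tree->look_down(_tag => 'a', class => 'next')) {
        my $href = $next->attr('href');
        push @{ $self->{queue} }, $href if defined $href;
    }

    $tree->delete;    # release the parse tree
    return @xml;
}

# Format a flat set of fields as trivial XML, escaping markup characters.
sub to_xml {
    my (%field) = @_;
    my $body = '';
    for my $name (sort keys %field) {
        my $value = $field{$name};
        $value =~ s/&/&amp;/g;
        $value =~ s/</&lt;/g;
        $value =~ s/>/&gt;/g;
        $body .= "<$name>$value</$name>";
    }
    return "<listing>$body</listing>";
}

package main;

my $page = <<'HTML';
<html><body>
<div class="listing"><a href="/car/1">2000 BMW 328Ci</a> <span class="price">$12,500</span></div>
<div class="listing"><a href="/car/2">2003 Honda Civic</a> <span class="price">$8,000</span></div>
<a class="next" href="/search?page=2">Next</a>
</body></html>
HTML

# /car/2 was already seen at the same price, so only /car/1 is reported.
my $crawler = AutoMartSketch->new(history => { '/car/2' => '8000' });
my @out = $crawler->process_page($page);
print "$_\n" for @out;
print "queued: @{ $crawler->{queue} }\n";
```

In the real module, the constructor arguments, the history lookup, and the queue would come from the framework's own services, and Make, Model, Year, and Trim would come from the provided title-parsing function rather than from anything shown here.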
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
* * *

This broadcast message was sent to all bidders on Friday Nov 23, 2007 6:45:58 PM:

Thanks for your interest. This is a four-day holiday for us, so I'm being a bit slow responding to people due to all the social engagements. Some information that some of you have requested:

Trim is parsed out of the title by a function we provide. For instance, "2000 BMW 328Ci" will be parsed by our functions into Year => 2000, Make => BMW, Model => 3-Series, Trim => 328Ci. We currently have trim coverage for only a few makes. That will be expanded automatically when we upgrade the parsing function.

We understand that the listing format can change at any time. It's best if the parsing is as flexible as possible, but we understand that parsers will break on occasion and need fixing. That will not be held against you as long as your parsing style is not excessively fragile.

The first thumbnail is sufficient to extract from each listing. We have example code for using ImageMagick to create the thumbnail. It's preferable to shrink a large picture rather than slightly resize a thumbnail, because too much resizing degrades quality. I'll give you a look at the code later on.

Fred
* * *

This broadcast message was sent to all bidders on Monday Nov 26, 2007 4:33:39 PM:

Just to let you know, I adjusted the project type to $100 and above to allow people more flexibility in bidding. Sorry for the first-timer mistake.
## Platform
Linux/Perl/SQLite. The operating environment will be supplied for you online.