Description: Seeking a developer to build a small web crawler for one auto classifieds site.
The web crawler/spider will have the following characteristics:
1. Primary purpose is to crawl this specific auto sales website, collect the email addresses from all Private and Trade auto listings, and store those email addresses in an Excel spreadsheet.
2. INPUTS –
a) Crawler – A crawler which crawls the specified site and indexes the email addresses in its auto listings. The crawler must be set up to crawl all makes and models in the following auto listing categories: Cars, 4x4s, Bikes, Commercials, Plant and Leisure. It must search both Private and Trade listings, and by Region.
3. ADMIN CONSOLE - The software must have a management console enabling the following functions:
a) Automated scheduling of crawler runs for the target site [hourly, daily, weekly, bi-weekly, etc].
b) Reporting of crawl progress, results and logs.
c) Exception handling – providing details of items not crawled.
d) Duplicate email address handling is paramount: duplicate listing email addresses must be deleted.
e) Deletion handling – recognising that previously crawled listings/email addresses are no longer listed on the target site, and handling these by moving them into an inactive or archive table separate from the main listings.
f) Backup functions to enable the entire email database to be backed up.
g) The ability to easily search for and edit and remove email address records.
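To illustrate requirements 1, 3d and 3e, here is a minimal Python sketch, not a definitive implementation. It assumes listing pages are already downloaded as text, that in-memory dictionaries stand in for the main and archive tables, and that a CSV file (which Excel can open) is an acceptable spreadsheet format. The names extract_emails, reconcile and to_csv, the simple email pattern, and the sample addresses are all hypothetical.

```python
import csv
import io
import re

# Deliberately simple email pattern (an assumption; real listings may need more care).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique email addresses found in text, lowercased, in first-seen order."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)  # dict keys drop duplicates, keep order
    return list(seen)


def reconcile(active, archive, crawled):
    """Update the active and archive tables in place from the latest crawl.

    All three arguments map email address -> listing id (hypothetical schema).
    """
    # Addresses that have vanished from the site move to the archive table.
    for email in list(active):
        if email not in crawled:
            archive[email] = active.pop(email)
    # Addresses seen in this crawl are active; a relisted address leaves the archive.
    for email, listing_id in crawled.items():
        archive.pop(email, None)
        active[email] = listing_id


def to_csv(active):
    """Render the active table as CSV text, sorted by email address."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["email", "listing_id"])
    for email in sorted(active):
        writer.writerow([email, active[email]])
    return buf.getvalue()


if __name__ == "__main__":
    page = "Contact: Bob@Example.com or bob@example.com, trade: sales@dealer.co.uk"
    active, archive = {"old@example.com": "17"}, {}
    crawled = {email: "42" for email in extract_emails(page)}
    reconcile(active, archive, crawled)
    print(to_csv(active))
    print("archived:", archive)
```

Keying both tables by the lowercased address means a duplicate can never be stored twice, and the archive step only moves records rather than deleting them, so 3g-style searching and editing could still reach archived addresses.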
- Previous experience of crawler / spider development and examples/references are essential.
- This project is price and quality sensitive.
13 freelancers are bidding on average $202 for this job
I am currently looking into your project. I would be happy to provide a sample. Would you please upload an example so that I may provide you with a sample? Please feel free to ask any questions. Thank you.