We are looking for an experienced web programmer to develop a program that will crawl four public web sites and extract relevant data to construct an aggregate index.
The program will go through the web pages of each web site and extract specific data pertaining to the index. After rudimentary parsing and data manipulation have been applied (e.g. to dates, times and such), the data shall be exported to a basic flat file with 20-30 fields. The scale of the index is in the range of 50K-100K records, and the output file, which represents a denormalized database table, should be a CSV file that can be easily imported into Excel.
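As a rough illustration of the export step described above, the following Python sketch writes extracted records to a CSV file and normalizes a date field along the way. The column names, the `normalize_date` and `write_index` helpers, and the input date format are all assumptions made for the example; the real index would have 20-30 fields defined by the project.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names; the real index would have 20-30 fields.
FIELDS = ["record_id", "title", "listed_date", "source_site"]


def normalize_date(raw, input_format="%m/%d/%Y"):
    """Convert a scraped date string to ISO 8601 (YYYY-MM-DD).

    Returns an empty string if the value does not match input_format.
    """
    try:
        return datetime.strptime(raw.strip(), input_format).date().isoformat()
    except ValueError:
        return ""


def write_index(records, out):
    """Write an iterable of dicts to the file-like object `out` as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        row = dict(rec)  # copy so the caller's record is not modified
        row["listed_date"] = normalize_date(row.get("listed_date", ""))
        writer.writerow(row)


if __name__ == "__main__":
    # Assumed sample record, written to an in-memory buffer for demonstration.
    sample = [{"record_id": "1", "title": "A, B",
               "listed_date": "03/15/2012", "source_site": "site1"}]
    buf = io.StringIO()
    write_index(sample, buf)
    print(buf.getvalue())
```

The `csv` module quotes fields containing commas, so the resulting file should open cleanly in Excel. To write to disk instead, open the file with `newline=""` and pass it as `out`.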
Further, we will want to update the index periodically and hence will require a second program. It should take an initial CSV file containing the most recent index (the output of the last run), iterate through the web sites again, and produce two files: a delta CSV file listing the added, updated and deleted records, and an updated CSV file with those changes applied.
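One possible way to compute the delta between two runs is sketched below in Python. It assumes every record carries a unique key column, here called `record_id`, which is a hypothetical name; the function names and the extra `change` column in the delta file are likewise illustrative choices, not requirements of the project.

```python
import csv

KEY = "record_id"  # hypothetical unique key column


def load_index(path):
    """Read a previous index CSV into a dict keyed by KEY."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[KEY]: row for row in csv.DictReader(f)}


def compute_delta(old, new):
    """Compare two {key: row} dicts and return a list of (change, row) pairs.

    change is one of "added", "updated" or "deleted".
    """
    delta = []
    for key, row in new.items():
        if key not in old:
            delta.append(("added", row))
        elif row != old[key]:
            delta.append(("updated", row))
    for key, row in old.items():
        if key not in new:
            delta.append(("deleted", row))
    return delta


def write_delta(delta, fieldnames, out):
    """Write the delta to `out` as CSV, with a leading "change" column."""
    writer = csv.DictWriter(out, fieldnames=["change"] + list(fieldnames))
    writer.writeheader()
    for change, row in delta:
        writer.writerow({"change": change, **row})
```

Because both indexes are held in memory as dictionaries, this approach should be comfortable at the stated scale of 50K-100K records. The updated index itself is simply the `new` dictionary written back out as CSV.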
Lastly, we are looking for a quick turnaround on this mini-project and will most likely have follow-up projects if this one is successful.
28 freelancers are bidding an average of $227 for this job
Hi, I have written Web Crawlers before. I also have sound knowledge of databases and User Interface development. Please check PMB for more details. Thanks, Yogesh