We are looking for a programmer to develop a C++ scraper for financial news blogs. The code should be reasonably well commented and run on parallel threads. The program should:
Authenticate itself (if necessary) on the website
Create a JSON object saving the contents of the article
Some websites that will be scraped are:
The Wall Street Journal
Seeking Alpha
The Motley Fool
More websites will be added later, so the scraper should be built from generic components and be easy to extend.
The results should be structured as JSON, preferably inserted into a MongoDB instance (CouchDB may also be used), or written to JSON files for testing purposes.
We will accept solutions in a different language, provided they run in parallel.
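One possible shape for such a program, as a rough sketch only: each site gets its own class behind a common interface, the sites are scraped on separate threads, and every article is turned into a JSON document. All names here (`Article`, `SiteScraper`, `StubScraper`, `scrape_all`) and the JSON field names are hypothetical. The stub returns canned data instead of doing real HTTP requests or logins, and writing to MongoDB or CouchDB is left out.

```cpp
#include <cassert>
#include <cstdio>
#include <future>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical article record; the field names are assumptions.
struct Article {
    std::string source;
    std::string title;
    std::string body;
};

// Escape a string so it can be placed inside a JSON string literal.
std::string json_escape(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (static_cast<unsigned char>(c) < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x",
                                  static_cast<unsigned>(static_cast<unsigned char>(c)));
                    out += buf;
                } else {
                    out += c;
                }
        }
    }
    return out;
}

// Serialize one article as a flat JSON object.
std::string to_json(const Article& a) {
    return "{\"source\":\"" + json_escape(a.source) +
           "\",\"title\":\"" + json_escape(a.title) +
           "\",\"body\":\"" + json_escape(a.body) + "\"}";
}

// Generic interface: supporting a new website means adding one subclass.
class SiteScraper {
public:
    virtual ~SiteScraper() = default;
    virtual bool login() { return true; }  // override for sites that need authentication
    virtual std::vector<Article> fetch_articles() = 0;
};

// Stand-in scraper with canned data; a real one would fetch and parse pages.
class StubScraper : public SiteScraper {
public:
    explicit StubScraper(std::string site) : site_(std::move(site)) {}
    std::vector<Article> fetch_articles() override {
        std::vector<Article> out;
        out.push_back(Article{site_, "Example \"headline\"", "Line one\nLine two"});
        return out;
    }
private:
    std::string site_;
};

// Run every scraper on its own thread and collect the JSON documents in order.
std::vector<std::string> scrape_all(
        const std::vector<std::unique_ptr<SiteScraper>>& scrapers) {
    std::vector<std::future<std::vector<Article>>> jobs;
    for (const auto& s : scrapers) {
        assert(s != nullptr);
        SiteScraper* raw = s.get();
        jobs.push_back(std::async(std::launch::async, [raw] {
            if (!raw->login()) return std::vector<Article>{};
            return raw->fetch_articles();
        }));
    }
    std::vector<std::string> docs;
    for (auto& job : jobs)
        for (const auto& article : job.get())
            docs.push_back(to_json(article));
    return docs;
}
```

In this sketch the caller would fill a `std::vector<std::unique_ptr<SiteScraper>>` with one scraper per website and pass it to `scrape_all`; the returned strings could then be written to files or handed to a database client.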
3 freelancers are bidding on average $20/hour for this job
I have extensive experience in C++ and network programming. I have also built multiprocessing pipelines in C++ using Boost. Looking forward to hearing from you. Best Regards, Julian David Rath