Web Scraping of news outlets using C++ into NoSQL databases

We are looking for a programmer to develop a C++ scraper for financial news blogs. The code should be reasonably commented and run with parallel threads. The program should:

Authenticate itself (if necessary) on the website

Create a JSON object saving the contents of the article
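The two requirements above, combined with the request for parallel threads, could be sketched roughly as follows. This is only an illustration using the standard library: the Article fields, json_escape, to_json, scrape_one and scrape_all are hypothetical names, and scrape_one is a stub standing in for the real HTTP request, login and HTML parsing, which would need an HTTP client library.

```cpp
#include <future>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical article record; the field names are assumptions.
struct Article {
    std::string url;
    std::string title;
    std::string body;
};

// Escape the characters that must be escaped inside a JSON string.
std::string json_escape(const std::string& in) {
    static const char* hex = "0123456789abcdef";
    std::ostringstream out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out << "\\\""; break;
            case '\\': out << "\\\\"; break;
            case '\n': out << "\\n";  break;
            case '\r': out << "\\r";  break;
            case '\t': out << "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters are written as \u00XX.
                    out << "\\u00" << hex[c >> 4] << hex[c & 0xF];
                } else {
                    out << c;
                }
        }
    }
    return out.str();
}

// Serialize one article as a flat JSON object.
std::string to_json(const Article& a) {
    return "{\"url\":\"" + json_escape(a.url) +
           "\",\"title\":\"" + json_escape(a.title) +
           "\",\"body\":\"" + json_escape(a.body) + "\"}";
}

// Stub for the real work (HTTP request, optional login, HTML parsing).
Article scrape_one(const std::string& url) {
    return Article{url, "title of " + url, "body of " + url};
}

// Run one asynchronous task per URL and collect the JSON in input order.
std::vector<std::string> scrape_all(const std::vector<std::string>& urls) {
    std::vector<std::future<std::string>> jobs;
    for (const auto& url : urls) {
        jobs.push_back(std::async(std::launch::async,
                                  [url] { return to_json(scrape_one(url)); }));
    }
    std::vector<std::string> results;
    for (auto& job : jobs) results.push_back(job.get());
    return results;
}
```

One task per URL is the simplest form of parallelism; with many URLs a fixed-size thread pool would probably be a better fit, and a JSON library could replace the hand-written serializer.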

Some websites that will be scraped are:

The Wall Street Journal - [url removed, login to view]

Seeking Alpha - [url removed, login to view]

The Motley Fool - [url removed, login to view]

More websites will follow, so the scraper should be built from generic components and be easily extensible.
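One possible way to keep the scraper generic is a small interface with one subclass per site, looked up by host name. The sketch below is an assumption about how this might be structured, not a required design: SiteScraper, ExampleSiteScraper, ScraperRegistry and between are hypothetical names, and the marker-based extraction stands in for real HTML parsing.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface: one subclass per news site.
class SiteScraper {
public:
    virtual ~SiteScraper() = default;
    // Sites behind a paywall override this to request authentication.
    virtual bool needs_login() const { return false; }
    // Turn raw HTML into article text; the rules are site-specific.
    virtual std::string extract_body(const std::string& html) const = 0;
};

// Generic helper: text between two markers, or "" if either is missing.
std::string between(const std::string& s, const std::string& open,
                    const std::string& close) {
    auto start = s.find(open);
    if (start == std::string::npos) return "";
    start += open.size();
    auto end = s.find(close, start);
    if (end == std::string::npos) return "";
    return s.substr(start, end - start);
}

// Example site: requires login and keeps its text in an <article> element.
class ExampleSiteScraper : public SiteScraper {
public:
    bool needs_login() const override { return true; }
    std::string extract_body(const std::string& html) const override {
        return between(html, "<article>", "</article>");
    }
};

// Maps a host name to the scraper that knows how to handle it.
class ScraperRegistry {
public:
    void add(const std::string& host, std::unique_ptr<SiteScraper> scraper) {
        scrapers_[host] = std::move(scraper);
    }
    // Returns nullptr when no scraper is registered for the host.
    const SiteScraper* find(const std::string& host) const {
        auto it = scrapers_.find(host);
        return it == scrapers_.end() ? nullptr : it->second.get();
    }
private:
    std::map<std::string, std::unique_ptr<SiteScraper>> scrapers_;
};
```

Supporting a new website then means writing one more subclass and registering it, without touching the threading or storage code.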

The results will be structured as JSON and preferably inserted into a MongoDB instance (CouchDB may also be used) or, for testing purposes, written to JSON files.
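For the testing case, the output could sit behind a small storage interface so that a database-backed implementation can be swapped in later. The sketch below only covers the file variant; ArticleSink and JsonLinesFileSink are hypothetical names, and a MongoDB or CouchDB sink would need the corresponding client library, which is not shown here.

```cpp
#include <fstream>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical storage interface: the scraper does not care where JSON goes.
class ArticleSink {
public:
    virtual ~ArticleSink() = default;
    virtual void store(const std::string& json) = 0;
};

// Test sink: appends one JSON document per line to a file.
class JsonLinesFileSink : public ArticleSink {
public:
    explicit JsonLinesFileSink(const std::string& path)
        : out_(path, std::ios::app) {
        if (!out_) throw std::runtime_error("cannot open " + path);
    }

    void store(const std::string& json) override {
        // The lock lets several scraper threads share one sink.
        std::lock_guard<std::mutex> lock(mutex_);
        out_ << json << '\n';
        out_.flush();
    }

private:
    std::ofstream out_;
    std::mutex mutex_;
};
```

Writing one document per line keeps the test files easy to inspect and to bulk-import into a database afterwards.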

We will accept solutions in a different language, provided they run in parallel.

Skills: C++ Programming, node.js, NoSQL Couch & Mongo, Python, Web Scraping


About the Employer:
(415 reviews) North Caldwell, United States

Project ID: #5138634

Awarded to:


Hi. Why are you going to use C++ for this purpose? This language is usually used for system-level apps. JavaScript, Java, Perl and Python are commonly used for web scraping. We have done many scraping projects usi...

$12 USD / hour
(18 Reviews)

3 freelancers are bidding on average $20/hour for this job


Hey, I have experience doing a similar project, but in node.js. I would prefer node.js over C++ for this because node.js uses non-blocking I/O, which will help in our case, where we have a lot of I/O. Again...

$10 USD / hour
(4 Reviews)

I have extensive skills in C++ and network programming. I have also built multiprocessing pipelines in C++ using Boost. Looking forward to hearing from you. Best regards, Julian David Rath

$38 USD / hour
(0 Reviews)