In Progress

Web Scraping of news outlets using C++ into NoSQL databases

We are looking for a programmer to develop a C++ scraper for financial news blogs. The code should be reasonably commented and should run on parallel threads. The program should:

Authenticate itself (if necessary) on the website

Create a JSON object saving the contents of the article
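As a rough illustration of the JSON step above, here is a minimal sketch using only the C++ standard library. The `Article` struct, its field names, and the helpers `json_escape` and `to_json` are assumptions made for this example and are not part of the posting; authentication is left out because it depends on each site and on the HTTP library chosen.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical record for one scraped article; the field names are assumptions.
struct Article {
    std::string source;
    std::string url;
    std::string title;
    std::string body;
};

// Escape a string so it can be placed inside a JSON string literal.
std::string json_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX escapes.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Serialize an article as a single-line JSON object.
std::string to_json(const Article& a) {
    std::string out = "{";
    out += "\"source\":\"" + json_escape(a.source) + "\",";
    out += "\"url\":\""    + json_escape(a.url)    + "\",";
    out += "\"title\":\""  + json_escape(a.title)  + "\",";
    out += "\"body\":\""   + json_escape(a.body)   + "\"}";
    return out;
}
```

A production scraper would more likely use an established JSON library than hand-written escaping; the sketch only shows the shape of the object.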

Some websites that will be scraped are:

The Wall Street Journal - [url removed, login to view]

Seeking Alpha - [url removed, login to view]

The Motley Fool - [url removed, login to view]

More websites are to come, so the scraper should be built from generic elements and be easily extensible.
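One possible way to combine the extensibility and parallel-thread requirements is a common per-site interface with one thread per site. The sketch below assumes hypothetical names (`SiteScraper`, `StubScraper`, `run_all`) and uses canned data in place of real network access, so it only illustrates the structure.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical per-site interface: supporting a new outlet means adding one subclass.
class SiteScraper {
public:
    virtual ~SiteScraper() = default;
    virtual std::string name() const = 0;
    // Returns article titles; a real implementation would log in, fetch and parse pages.
    virtual std::vector<std::string> scrape() = 0;
};

// Placeholder that returns canned data, standing in for a real site scraper.
class StubScraper : public SiteScraper {
public:
    StubScraper(std::string name, std::vector<std::string> titles)
        : name_(std::move(name)), titles_(std::move(titles)) {}
    std::string name() const override { return name_; }
    std::vector<std::string> scrape() override { return titles_; }
private:
    std::string name_;
    std::vector<std::string> titles_;
};

// Runs every scraper on its own thread and collects "site: title" lines.
std::vector<std::string> run_all(
        const std::vector<std::unique_ptr<SiteScraper>>& scrapers) {
    std::vector<std::string> results;
    std::mutex results_mutex;  // guards results across worker threads
    std::vector<std::thread> workers;
    workers.reserve(scrapers.size());
    for (const auto& s : scrapers) {
        SiteScraper* scraper = s.get();
        assert(scraper != nullptr);
        workers.emplace_back([scraper, &results, &results_mutex] {
            std::vector<std::string> titles = scraper->scrape();
            std::lock_guard<std::mutex> lock(results_mutex);
            for (const auto& t : titles) {
                results.push_back(scraper->name() + ": " + t);
            }
        });
    }
    for (auto& w : workers) w.join();
    // Sort so the output does not depend on thread scheduling.
    std::sort(results.begin(), results.end());
    return results;
}
```

With many sites, a fixed-size thread pool would probably be preferable to one thread per site; that choice is left open here.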

The results will be structured as JSON, preferably inserted into a MongoDB instance (CouchDB may also be used) or, for testing purposes, written to JSON files.
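To keep the storage target swappable between a database and test files, the output could sit behind a small interface. The sketch below is an assumption about how that might look: `ArticleSink` and `JsonLinesSink` are hypothetical names, and only the file-style sink (one JSON document per line) is shown. A MongoDB or CouchDB sink would implement the same interface using the respective client library.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical storage interface; database-backed sinks would implement store() too.
class ArticleSink {
public:
    virtual ~ArticleSink() = default;
    virtual void store(const std::string& json_document) = 0;
};

// Test-oriented sink that writes one JSON document per line to any output stream.
class JsonLinesSink : public ArticleSink {
public:
    explicit JsonLinesSink(std::ostream& out) : out_(out) {}

    void store(const std::string& json_document) override {
        // Each document must fit on one line, so raw newlines are not allowed.
        assert(json_document.find('\n') == std::string::npos);
        out_ << json_document << '\n';
        ++stored_;
    }

    // Number of documents written so far.
    std::size_t stored() const { return stored_; }

private:
    std::ostream& out_;
    std::size_t stored_ = 0;
};
```

Passing a `std::ofstream` writes the documents to a file, while a `std::ostringstream` keeps them in memory for unit tests.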

Skills: C++ Programming, node.js, NoSQL Couch & Mongo, Python, Web Scraping

See more: web develop news, seeking alpha, programming databases, programming news, programmer news, online programming web, motley fool, mod programming, generic programming, develop databases, databases programming, json programming, programmer street, parallel programming, json nosql, financial programming, wall run script, json web, wall script create, parallel website, create news websites, news wall, programming financial, journal script

About the Employer:
(415 reviews) North Caldwell, United States

Project ID: #5138634

Awarded to:

nitelfreelance

Hi. Why are you going to use C++ for this purpose? That language is usually used for system-level apps. JavaScript, Java, Perl and Python are more commonly used for web scraping. We have done many scraping projects usi...

$12 USD / hour
(18 Reviews)
5.3

3 freelancers are bidding an average of $20/hour for this job

jibyjose001

Hey, I have experience doing a similar project, but in node.js. I would prefer [url removed, login to view] for this rather than C++ because [url removed, login to view] uses non-blocking I/O, which will help in our case where we have a lot of I/O. Again...

$10 USD / hour
(4 Reviews)
4.9

julianrath

I have a lot of experience with C++ and network programming. I have also built some multiprocessing pipelines in C++ using Boost. Looking forward to hearing from you. Best regards, Julian David Rath

$38 USD / hour
(0 Reviews)
0.0