Website crawler for HTML content

Closed. Posted Nov 23, 2009. Paid on delivery.

I need a crawler that identifies phrases in the HTML of websites, for example "google analytics".
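For illustration, the phrase check could be as simple as a case-insensitive substring search over the raw HTML. This is a minimal sketch in Python rather than the PHP the project is tagged with; the function name `find_phrases` is hypothetical, and matching on raw markup (rather than visible text only) is an assumption.

```python
def find_phrases(html, phrases):
    """Report, for each phrase, whether it occurs anywhere in the raw HTML.

    Matching is case-insensitive and runs over the whole markup, so a phrase
    inside a <script> tag or an attribute counts as a hit (an assumption).
    """
    lowered = html.lower()
    return {phrase: phrase.lower() in lowered for phrase in phrases}
```

Because `phrases` is an ordinary list argument, the set of phrases stays an input the caller controls.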

There will be about 5 phrases in total, and I want the phrase list to be an input that I can control. I also want to be able to control the depth of the crawl, i.e., how many levels "deep" the crawler goes into a website (e.g., home page --> about us --> management would be 3 levels deep).

Also, I want to be able to control the total number of pages crawled per site, e.g., cut off the search after 100 pages crawled.
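The depth and page limits could be enforced with a breadth-first crawl, roughly as sketched below in Python (the project is tagged PHP, so this only shows the logic). Everything named here is hypothetical: `crawl_site`, `LinkExtractor`, and the `fetch` callback, which is assumed to return a page's HTML or `None` on failure. The sketch also assumes the home page counts as level 1 and that only links on the same host are followed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(start_url, fetch, max_depth=3, max_pages=100):
    """Breadth-first crawl of one site; returns (url, depth) per page fetched.

    The start page is depth 1. Pages at max_depth are fetched, but their
    links are not followed. The crawl stops after max_pages pages.
    """
    host = urlparse(start_url).netloc
    queue = deque([(start_url, 1)])
    seen = {start_url}
    pages = []
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:  # fetch failed; skip without counting the page
            continue
        pages.append((url, depth))
        if depth >= max_depth:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Breadth-first order means that when the page cap is hit, the pages closest to the home page are the ones that were checked.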

The crawler also needs to be able to crawl 20,000 sites in about a week. Therefore, the winning bidder needs to be able to build a "fast" crawler, e.g., one that uses multi-threading. I will also need to be able to upload the URLs of the websites I want to crawl.
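To put the speed requirement in numbers: if every one of the 20,000 sites reached the 100-page example cutoff, that would be 2,000,000 pages in 604,800 seconds, or roughly 3.3 pages per second sustained. The Python sketch below (the project is tagged PHP) shows that arithmetic and one way to run per-site crawls in parallel with a thread pool. The names `required_pages_per_second`, `crawl_many`, and `crawl_one`, and the default of 20 workers, are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def required_pages_per_second(sites, pages_per_site, days):
    """Sustained fetch rate needed to finish the whole job in `days` days."""
    return sites * pages_per_site / (days * 24 * 3600)


def crawl_many(urls, crawl_one, workers=20):
    """Run crawl_one(url) for each URL on a pool of worker threads.

    `urls` must be a list. Returns a dict mapping each URL to its result.
    If crawl_one raises for any URL, the exception propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(crawl_one, urls)))
```

Threads suit this workload because the crawler spends most of its time waiting on network responses rather than computing.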

Finally, this crawler needs to be completed in a couple of days.

This is something that was already requested a couple of months ago by somebody else, but I need it as well now.

PHP

Project ID: #556542

About the project

6 bids, remote project, active Dec 28, 2009

6 freelancers are bidding an average of $177 on this job

wildlily980

I'm interested in it. Check PMB for details.

$150 USD in 7 days
(47 reviews)
6.3

numatido

Hi, please check your PM. Thanks.

$150 USD in 2 days
(2 reviews)
2.8

svetlinb

Contact me to clarify details on the project

$150 USD in 2 days
(0 reviews)
0.0

alphacoms

I can do this in PHP. It will be a multi-threaded script, so to speak: PHP doesn't natively support multi-threading, but there are some tricks to implement it. I have similar experience.

$180 USD in 7 days
(0 reviews)
0.0