Cancelled

Web crawler & data extraction

I am looking for a developer with strong PHP and MySQL skills to develop a web crawler and data extraction spider.

The spider needs to perform these functions:

1) Find all sites in a particular country that match specific subject search terms. This could be achieved by querying Google. The results then form the list of sites to crawl regularly.

2) Crawl each site in the list from step 1 and search it for a specific type of item.

3) If the item type is found on a web site, extract the data from its web pages, taking it from between the <body></body> tags as cleanly as possible. This process should remove as many HTML, CSS and other tags as possible, so that the extracted data is relevant and as free from distracting markup as possible.

4) Write the extracted content to a text field in the MySQL (version 4.1+) database.
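
To make step 3 concrete, here is a minimal sketch of the kind of cleaning intended. It is written in Python purely as an illustration of the logic; the delivered code must still be PHP as stated below. The names BodyTextExtractor and extract_body_text are hypothetical, and the choice to discard script, style and noscript content is an assumption about what counts as "distracting".

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect the visible text found between <body> and </body>."""

    # Assumption: the contents of these elements are never wanted.
    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_body = False   # True while inside <body>...</body>
        self.skip_depth = 0    # > 0 while inside a skipped element
        self.parts = []        # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside the body and outside skipped elements.
        if self.in_body and self.skip_depth == 0:
            self.parts.append(data)


def extract_body_text(html):
    """Return the body text of an HTML page with tags removed
    and runs of whitespace collapsed to single spaces."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


page = (
    "<html><head><title>T</title><style>p{color:red}</style></head>"
    "<body><h1>Item</h1><script>var x=1;</script>"
    "<p>Price: 5 &amp; up</p></body></html>"
)
print(extract_body_text(page))
```

The returned string is what step 4 would store in the MySQL text field. A real page will need more care than this (malformed markup, character encodings, navigation and footer text), so treat it as a statement of intent rather than a specification.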

All code must be PHP 4+ and PHP 5+ compatible and must run on a powerful Linux server.

There will be other projects for the developer with the right skill sets, experience and sheer ability to deliver high quality systems at a reasonable price.

Skills: Data Processing, PHP

About the Employer:
( 27 reviews ) Kingsbridge, United Kingdom

Project ID: #162944