Web Crawler - Business Contact Details

I require a web crawler to extract web-based contact information for businesses, including Name, Website URL, Address, Phone, Mobile, Fax Number and business specialty, if one is present. The crawler must also be able to accommodate multiple addresses and multiple contact phone and fax numbers for one business.
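
As a rough illustration of the record described above, here is a minimal Python sketch of one business with any number of addresses and contact numbers. The class name, field names and sample values are assumptions made for this example only and are not taken from either website.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BusinessRecord:
    """One business, with any number of addresses and contact numbers."""
    name: str
    website: Optional[str] = None      # absent on many listings
    specialty: Optional[str] = None    # only set if the listing shows one
    addresses: List[str] = field(default_factory=list)
    phones: List[str] = field(default_factory=list)
    mobiles: List[str] = field(default_factory=list)
    faxes: List[str] = field(default_factory=list)


# Made-up sample data, for illustration only.
record = BusinessRecord(name="Example Plumbing Pty Ltd")
record.addresses.append("1 Example St, Sydney NSW 2000")
record.addresses.append("2 Sample Rd, Parramatta NSW 2150")
record.phones.append("(02) 5550 0001")
record.faxes.append("(02) 5550 0002")
```

Keeping each repeated field as a list means a second address or fax number is just another list item, which should also map cleanly onto repeated XML elements or a child table in MySQL or Postgres.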

The primary contact sites to crawl through are [url removed, login to view] and www.yellowpages.com.au.

Data results will be checked against an existing database for accuracy of results.


1. I must be able to set the starting URL from which the spider will begin on each website. The format of the data on each website should be examined closely before commencing, as several data fields are displayed only if the information is present.

2. The spider should contain its own database of products, professions and services, which it can use as the basis for initiating searches. Data is to be extracted into XML or ASCII format and then imported directly into a MySQL or Postgres database.

3. The spider must crawl through multiple pages until the final page for that category is completed. However, at the very beginning of most categories, there are businesses listed under the "Yellow Pages - Advertisers" heading. These are businesses that are not from the area I have chosen but are advertising in that area. I do not want these entries included. The spider does not necessarily need to know how my list was created, only to avoid entries under the "Advertisers" section.

4. When completed, an update function should let me choose a new search profession name and initiate the search.

5. A search and purge function that can be run at any time on any of the database files that have been created, to ensure no two entries have the same telephone or fax number. If duplicate telephone/fax numbers are found, the records with the least information are automatically deleted. For example, if 2 records have the same telephone/fax numbers but one lists a website and the other doesn't, then delete the one without the website.

6. I require that this program be functional for both websites and that the system can rerun the searches to capture updated information after, say, 4-5 months.

7. Finally, the crawler must function despite any anti-crawler or anti search / DOS protection (if any) being run by the site administrators.
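
To make points 3 and 5 above more concrete, here is a minimal Python sketch of the two filtering steps, applied to entries that have already been parsed into dictionaries. The key names (`section`, `phone`, `fax`, `website`) and the sample entries are assumptions for illustration. The sketch also assumes that a phone number matching another record's fax number counts as a duplicate, and that "least information" means the fewest non-empty fields.

```python
import re
from typing import Dict, List


def normalise_number(number: str) -> str:
    """Keep digits only so '(02) 5550 0001' and '02 5550 0001' compare equal."""
    return re.sub(r"\D", "", number)


def drop_advertisers(entries: List[dict]) -> List[dict]:
    """Point 3: skip entries that appeared under an 'Advertisers' heading.

    Assumes the parser stored that heading in a hypothetical 'section' key.
    """
    return [e for e in entries
            if "advertisers" not in (e.get("section") or "").lower()]


def completeness(entry: dict) -> int:
    """Count non-empty fields, ignoring the bookkeeping 'section' key."""
    return sum(1 for key, value in entry.items() if key != "section" and value)


def purge_duplicates(entries: List[dict]) -> List[dict]:
    """Point 5: where records share a phone/fax number, keep the fullest one."""
    owner: Dict[str, int] = {}   # normalised number -> index of the kept record
    removed = set()              # indexes of records that lost a comparison
    for index, entry in enumerate(entries):
        numbers = {normalise_number(n)
                   for n in (entry.get("phone"), entry.get("fax")) if n}
        numbers.discard("")
        rivals = {owner[n] for n in numbers
                  if n in owner and owner[n] not in removed}
        # On a tie the earlier record is kept.
        if any(completeness(entries[r]) >= completeness(entry) for r in rivals):
            removed.add(index)
            continue
        removed.update(rivals)
        for n in numbers:
            owner[n] = index
    return [e for i, e in enumerate(entries) if i not in removed]


# Made-up sample entries, for illustration only.
entries = [
    {"name": "A", "phone": "(02) 5550 0001", "website": "",
     "section": "Yellow Pages - Advertisers"},
    {"name": "B", "phone": "02 5550 0002", "website": None,
     "section": "Listings"},
    {"name": "B", "phone": "(02) 5550 0002",
     "website": "http://example.com", "section": "Listings"},
]
kept = purge_duplicates(drop_advertisers(entries))
```

In this example only the second "B" record remains: the advertiser entry is dropped first, and of the two records sharing a phone number, the one with a website is kept.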

My Requirements:

1. You must be easy to contact, either by phone or by e-mail. You will be required to answer any e-mail I send you within 10 hours.

2. Must speak and write English well.

3. Code must be well commented in English.

4. All source code must be given to me.

5. I would prefer this to be written in Java, Perl or Python, but XML is also OK.

6. I would like this done by no later than November 21st, 2007.

7. Must be able to run on my Windows XP machine or hosted in a USA data centre. Data usage is not an issue.

Skills: Data Processing

About the Employer:
( 0 reviews ) Sydney, Australia

Project ID: #183261

7 freelancers are bidding an average of $537 for this job


Hi, Thanks for giving "ARUHAT TECHNOLOGIES" an opportunity. Kindly go through the PM for a detailed analysis of your requirement. Regards, Maulik

$500 USD in 10 days
(2 Reviews)

Hi, I am new at GAF but I have four years of experience as a Software Engineer. I have working experience with Perl, C, C++, PHP and MySQL. I am working to develop Web Crawlers for various video sharing sites like youtub ...

$600 USD in 20 days
(0 Reviews)

Check PM please.

$550 USD in 29 days
(0 Reviews)

I am a kapow([login to view URL]) specialist with two years' experience and I have a license for kapow mashup server enterprise edition (which costs many thousands of dollars). The ability of this tool exceeds by fa ...

$550 USD in 3 days
(0 Reviews)

Please check PM

$560 USD in 10 days
(0 Reviews)

Hello, I have made a similar project for e-marketing. I have a demo email crawler which I can show you if you like.

$400 USD in 7 days
(0 Reviews)

Hi, I have three years of experience as a Software Engineer. Over the last couple of years, I have developed two projects of the same type. One of my web crawlers (Server Client Architecture) fetching the data from the 3 properties w ...

$600 USD in 30 days
(0 Reviews)