In Progress

List Gatherer Script Finish

Hi,

I have a Perl script that uses a spider to crawl certain websites, gathers information from them, and inserts that information into a database. The script is almost done; it just needs a few small finishing touches, which are outlined below.

Remember, this script is about 85-90% complete; we just need someone to put the finishing touches on it.

The remaining tasks:

1) Grabbing the category we are searching for and inserting it into the database.

We search through a Perl admin section where we first select the site, then enter the category (for example, books), and then the location. We want it so that when the script gathers results from the site, it also inserts the category we entered into a column in the database.
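A minimal sketch of how the insert could carry the category along with the scraped fields; the table and column names here (`listings`, `category`, etc.) are placeholders, not the script's real schema:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the row to insert, carrying the category chosen in the
# admin section alongside the scraped fields.  Column names are
# placeholders -- use whatever the existing schema calls them.
sub build_listing_row {
    my ($listing, $category) = @_;
    return {
        name     => $listing->{name},
        city     => $listing->{city},
        category => $category,   # e.g. "books", entered in the admin form
    };
}

# With DBI, the insert itself would then look roughly like:
#   $dbh->do(
#       'INSERT INTO listings (name, city, category) VALUES (?, ?, ?)',
#       undef, @{$row}{qw(name city category)},
#   );
```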

2) Removing the special characters for some of the words/cities.

Some of the cities/words are pulling in special characters. For example, a city such as Montreal may be spelled Montre@l, etc. We need these characters not to show up and the actual letters to be used instead.
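One way to handle this in the existing Perl code, sketched under the assumption that the bad characters form a small, known set (the map below is a guess to be extended as cases show up; for accented letters, the CPAN module Text::Unidecode can fold them to plain ASCII instead):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace known garbage characters with the letters they stand in
# for, then strip anything that still isn't a plausible city
# character.  The %fix map is an assumption -- grow it as new
# garbled spellings are found in the scraped data.
sub clean_city {
    my ($city) = @_;
    my %fix = ( '@' => 'a', '0' => 'o', '$' => 's' );
    $city =~ s/([\@0\$])/$fix{$1}/g;
    # Keep only letters, spaces, apostrophes, periods and hyphens.
    $city =~ s/[^A-Za-z .'-]//g;
    return $city;
}

# clean_city('Montre@l') -> 'Montreal'
```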

3) Fixing certain sites because they were only searching one state/province.

There are a few sites where the gatherer only searches one state or province, even if the city we enter isn't in that province. We need this fixed.

4) If the website address ends in .[url removed, login to view] or [url removed, login to view], etc., then don't insert it into the database.

We also check whether the listing has a website, and if it does, the address is inserted into the database. The problem is that some of the listings have generic yellow-pages addresses that are really just online advertisements for businesses that don't have a website. We don't want these inserted into the database; we only want actual websites inserted.
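A hedged sketch of such a filter; the real directory domains were removed from this posting, so the `@blocked` list below is purely a placeholder to be replaced with the actual yellow-pages hosts:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder list of directory/advertising domains -- the real
# domains were removed from the posting, so fill these in.
my @blocked = qw(example-yellowpages.com example-directory.com);

# Return true only if the URL's host is not one of the blocked
# directory domains (or a subdomain of one).
sub is_real_website {
    my ($url) = @_;
    return 0 unless defined $url && length $url;
    my ($host) = $url =~ m{^(?:https?://)?(?:www\.)?([^/:?#]+)}i;
    return 0 unless $host;
    $host = lc $host;
    for my $bad (@blocked) {
        return 0 if $host eq $bad || $host =~ /\.\Q$bad\E$/;
    }
    return 1;
}
```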

5) We have a PHP programmer building a browser-based admin section in PHP. One of the problems we run into is that some of the addresses don't have a postal/zip code. So we want to set it up so that when we select certain addresses (in the PHP admin section) to download, the script can run another spider against the sites below, grab the zip/postal codes, and insert them into the database corresponding to the addresses we are going to download.

[url removed, login to view]

[url removed, login to view]

[url removed, login to view];pageId=pcaf_pc_search&gear=postcode

[url removed, login to view]
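The extraction half of that second spider could look roughly like this; fetching would reuse whatever the current spider uses (e.g. LWP::UserAgent), so only the postal-code matching step is sketched:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull a Canadian postal code (A1A 1A1) or a US ZIP (12345 or
# 12345-6789) out of a fetched lookup page.  Returns undef when
# nothing matches, so the caller can leave the field empty.
sub extract_postal_code {
    my ($html) = @_;
    if ($html =~ /\b([A-Za-z]\d[A-Za-z]\s?\d[A-Za-z]\d)\b/) {
        return uc $1;           # Canadian postal code
    }
    if ($html =~ /\b(\d{5}(?:-\d{4})?)\b/) {
        return $1;              # US ZIP or ZIP+4
    }
    return;
}
```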

6) Scraping [url removed, login to view]

If possible, we would like the site above to be scraped as well.

7) Also, when the address gets added to the database, we'll need it split so that the name of the business, the street address, the city, the state/province, and the postal/zip code (if present) each go into a separate field in the database for that listing.
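A minimal sketch of the split, assuming the scraped line is comma-separated in the order name, street, city, state/province, with an optional postal/zip code tacked onto the end; real listings will likely need per-site adjustments:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a one-line scraped address into the separate database
# fields.  Assumes comma-separated pieces in the order:
#   name, street, city, state/province [postal or zip code]
sub split_address {
    my ($line) = @_;
    my @parts = map { s/^\s+|\s+$//gr } split /,/, $line;
    my %row;
    @row{qw(name street city state)} = @parts[0 .. 3];
    # Peel a trailing postal/zip code off the state field if present.
    if (defined $row{state}
        && $row{state} =~ s/\s+([A-Za-z]\d[A-Za-z]\s?\d[A-Za-z]\d|\d{5}(?:-\d{4})?)$//) {
        $row{zip} = $1;
    }
    return \%row;
}
```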

If you have any questions or need any clarification, please PM me. Thanks.

--Anthony

Skills: Perl


About the Employer:
( 23 reviews ) Bedford, Canada

Project ID: #176942

Awarded to:

gangabass

I can do this in three days.

$100 USD in 3 days
(136 Reviews)
5.8

4 freelancers are bidding an average of $138 for this job

vsviridov

I will complete the entire script in three days. The only thing I can't guarantee is WorldWeb.com.

$100 USD in 3 days
(2 Reviews)
3.9
pashanoid

Do-able, results guaranteed!

$150 USD in 5 days
(1 Review)
2.6
prajaysystems

Hi, We can do this. Please check PM

$200 USD in 3 days
(1 Review)
1.4