In Progress

web data extractor / filter

I need a PHP application that does the following:

A) Extract

Access a specified HTML page consisting of 100 numbered pieces of data (each linked to a separate page), and

extract from the top-level page:

a1) data matching 5 specific endings

a2) data matching two other patterns (each of the 100 pieces of data that is a single word only, and each of the 100 pieces of data that is a group of exactly two words)
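The a1) and a2) rules above could be sketched as follows. This is only an illustration in Python (the same logic carries over to PHP); the `ENDINGS` values and the `classify` function are hypothetical, since the posting does not name the five endings, and it assumes an item that matches an ending is counted under a1 only.

```python
# Hypothetical endings; the posting does not say what the five endings are.
ENDINGS = (".com", ".net", ".org", ".info", ".biz")


def classify(items):
    """Split items into a1 (ending match), single words, and two-word groups."""
    a1, one_word, two_words = [], [], []
    for raw in items:
        text = " ".join(raw.split())  # trim and collapse runs of whitespace
        if not text:
            continue  # ignore empty entries
        words = text.split(" ")
        if text.lower().endswith(ENDINGS):
            a1.append(text)  # assumption: an ending match takes priority
        elif len(words) == 1:
            one_word.append(text)
        elif len(words) == 2:
            two_words.append(text)
    return a1, one_word, two_words
```

Items with three or more words that match no ending are simply dropped, which is how the brief reads.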

extract from sub pages:

follow every one of the 100 links on the top-level page, only one link deep (these links are dynamic and potentially different every time the top-level page is accessed), and extract

a3) data matching 5 specific endings (same as above)
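Because the links change on every visit, they have to be collected from the freshly downloaded top-level page on each run. A minimal sketch of that step using only Python's standard library; `LinkCollector` and `collect_links` are illustrative names, and the real page may need a narrower selector than "every `<a>` tag".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of all <a href="..."> tags, in page order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:           # visit each sub page once
                self.links.append(url)


def collect_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

Staying "one link deep" then just means fetching each returned URL once and never calling `collect_links` on the sub pages.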

B) Data Manipulation:

Raw data matching the a2) patterns above needs to be manipulated: remove the spaces and append one specific ending.
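As a small sketch of this step (in Python; `manipulate` and the default ending `.com` are placeholders, because the posting does not say which ending is appended):

```python
def manipulate(text, ending=".com"):
    """Remove all whitespace from an a2 item and append the ending."""
    return "".join(text.split()) + ending
```

For example, `manipulate("blue sky")` gives `bluesky.com` under that assumed ending.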

C) Store:

Store data in a database, with date of (first) retrieval (duplicates should not be stored), and an extra attribute if it is data which has been manipulated (for a2 with the added ending).
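One way to get "first retrieval date, no duplicates" is a unique key on the value plus an insert that ignores conflicts. The sketch below uses Python's built-in `sqlite3` so it is self-contained; the table and column names are made up, and with MySQL a unique key combined with `INSERT IGNORE` would play the same role.

```python
import sqlite3
from datetime import date


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) items table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               value       TEXT PRIMARY KEY,            -- blocks duplicates
               kind        TEXT NOT NULL,               -- 'a1', 'a2' or 'a3'
               manipulated INTEGER NOT NULL DEFAULT 0,  -- 1 for a2 + ending
               first_seen  TEXT NOT NULL                -- ISO date
           )"""
    )
    return conn


def store(conn, value, kind, manipulated=False, today=None):
    """Insert a value once; return True only if it was new."""
    today = today or date.today().isoformat()
    cur = conn.execute(
        "INSERT OR IGNORE INTO items (value, kind, manipulated, first_seen) "
        "VALUES (?, ?, ?, ?)",
        (value, kind, int(manipulated), today),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because a repeated value is ignored rather than updated, `first_seen` keeps the date of the first retrieval.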

D) Output:

Create/update two daily txt files with the data retrieved that day: one for a1+a3 combined, one for a2 data.
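A possible shape for the daily output, again as a Python sketch. The file naming scheme (`<date>_a1a3.txt`, `<date>_a2.txt`) and the function name are assumptions; rewriting the whole file on every run is one simple way to cover both "create" and "update".

```python
from pathlib import Path


def write_daily_files(rows, day, out_dir="."):
    """rows: (value, kind) pairs first retrieved on `day`; kind is 'a1', 'a2' or 'a3'."""
    rows = list(rows)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    groups = {
        "a1a3": sorted(v for v, k in rows if k in ("a1", "a3")),
        "a2": sorted(v for v, k in rows if k == "a2"),
    }
    paths = {}
    for name, values in groups.items():
        path = out / f"{day}_{name}.txt"
        # One value per line; the file is regenerated in full on each run.
        path.write_text("".join(v + "\n" for v in values), encoding="utf-8")
        paths[name] = path
    return paths
```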

Other requirements:

* A simple web interface to create data output by date range and type (a1+a3 and/or a2)

* Script should run every X minutes/hours (cron job)

* Possibility to specify a list of proxies (with optional username/password auth), which the script will cycle through for web access. It must be able to skip non-responding proxies, and use no proxy if the list is empty.

* Development/Testing on your own server, complete installation on my server when finished (CentOS / WHM / Cpanel)
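The proxy requirement in the list above could look roughly like this. It is a sketch using Python's `urllib`, not a finished design: `make_opener` and `fetch` are hypothetical names, proxies are assumed to be given as URLs such as `http://user:pw@host:port`, and the `open_with` parameter exists only so the rotation logic can be exercised without network access.

```python
import urllib.request


def make_opener(proxy=None):
    """Build an opener for one proxy URL, or for direct access if proxy is None."""
    if proxy is None:
        handler = urllib.request.ProxyHandler({})  # empty dict: no proxy
    else:
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def fetch(url, proxies, start=0, timeout=10, open_with=None):
    """Try proxies in rotation from index `start`; return (body, index used)."""
    if open_with is None:
        def open_with(proxy):
            return make_opener(proxy).open(url, timeout=timeout).read()
    if not proxies:
        return open_with(None), None  # empty list: direct access
    for step in range(len(proxies)):
        index = (start + step) % len(proxies)
        try:
            return open_with(proxies[index]), index
        except OSError:
            continue  # URLError and timeouts are OSErrors: skip this proxy
    raise RuntimeError("no proxy responded")
```

To cycle through the list, the caller would pass the returned index plus one as `start` for the next request.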

I was thinking about PHP/cURL/MySQL as I am familiar with these, but feel free to suggest alternatives if you know of far superior methods.

Thanks for looking :)

Skills: Data Processing, Linux, PHP


About the Employer:
(1 review) Wetzikon, Switzerland

Project ID: #166201

Awarded to:


Please see PMB for detailed bid description.

$150 USD in 5 days
(12 reviews)

10 freelancers are bidding an average of $196 for this job


Hello, please refer your PMB. Thank you.

$300 USD in 7 days
(144 reviews)

if it is not solely for php&linux, feel free to contact me.

$100 USD in 0 days
(94 reviews)

hi, kindly glance through your pmb. regards, Rakesh

$200 USD in 4 days
(22 reviews)

Please see PM for details.

$200 USD in 10 days
(54 reviews)

Hi, I can do this easily and have worked on several similar projects. I agree with you that PHP/MySQL/cURL is probably the best way forward. I am 24, live in south east UK and take pride in the high quality of m...

$150 USD in 4 days
(2 reviews)

I have a script which nearly suits your requirement. Please see PM for details.

$300 USD in 5 days
(1 review)

I am interested to work with you. I have the total experience of 4 years in PHP, mysql, javascript and AJAX. I will assure you that you will get 100% satisfaction. Hope to hear from you soon.

$275 USD in 5 days
(1 review)

Please check PM

$180 USD in 4 days
(0 reviews)

hi i can perform well while completing your project.

$100 USD in 10 days
(0 reviews)