Cancelled

Web spider to crawl a specific site and harvest data

A PHP/Ruby/Perl spider to harvest data from a specific web page or category and store it in a MySQL database. For each item it must also download a high-res image and store the image URL/filename in the database.

## Deliverables

I need a PHP crawler built that will do the following:

The spider will be given a URL, for example: [url removed, login to view]:PD-Art_(Yorck_Project)

It will index every item found on this page and save the following data for each one:

Artist

Alfred Sisley (1839-1899)

Title

**Deutsch:** Kähne auf dem Kanal Saint-Martin in Paris

Date

1870

Medium

Oil on canvas

Dimensions

**Deutsch:** 55 × 74 cm

Current location

**Deutsch:** Sammlung Oskar Reinhart am Römerholz

**Deutsch:** Winterthur

Notes

**Deutsch:** Landschaftsmalerei

Source/Photographer

The Yorck Project: *10.000 Meisterwerke der Malerei.* DVD-ROM, 2002. ISBN 3936122202. Distributed by DIRECTMEDIA Publishing GmbH.

All of this will be stored in the database as one entry per item, along with the URL of the locally downloaded high-res image.
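As a rough illustration of the per-item entry described above, here is a minimal sketch in Python. It is only an assumption about how the record could be shaped, not a required design: the `ArtworkRecord` class, the `artworks` table, the column names, and the example image path are all hypothetical, and the standard-library `sqlite3` module stands in for MySQL so the example runs without a database server.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class ArtworkRecord:
    """One database entry per item; field names are hypothetical."""
    artist: str
    title: str
    date: str
    medium: str
    dimensions: str
    current_location: str
    notes: str
    source: str
    image_path: str  # URL/filename of the locally downloaded high-res image


# Hypothetical table layout; sqlite3 is used here only as a stand-in for MySQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS artworks (
    id INTEGER PRIMARY KEY,
    artist TEXT, title TEXT, date TEXT, medium TEXT, dimensions TEXT,
    current_location TEXT, notes TEXT, source TEXT, image_path TEXT
)
"""


def save_record(conn, record):
    """Insert one record with a parameterized query and return its row id."""
    cur = conn.execute(
        "INSERT INTO artworks (artist, title, date, medium, dimensions, "
        "current_location, notes, source, image_path) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        astuple(record),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

example = ArtworkRecord(
    artist="Alfred Sisley (1839-1899)",
    title="Kähne auf dem Kanal Saint-Martin in Paris",
    date="1870",
    medium="Oil on canvas",
    dimensions="55 × 74 cm",
    current_location="Sammlung Oskar Reinhart am Römerholz, Winterthur",
    notes="Landschaftsmalerei",
    source="The Yorck Project: 10.000 Meisterwerke der Malerei. DVD-ROM, 2002.",
    image_path="images/example_item.jpg",  # hypothetical local path
)
row_id = save_record(conn, example)
```

The text fields are kept as plain strings because the source values (dates, dimensions) are free-form text rather than structured numbers.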

The crawl timing should be configurable. There appear to be 10,000 items in this category. I do not want the spider to venture outside this category, but I would like to be able to switch categories in the future. All of this can be handled through config files; no GUI is necessary.
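To make the configuration idea concrete, the following is a minimal sketch in Python, assuming an INI-style config file. The section name, the keys (`category_url`, `delay_seconds`, `image_dir`), the placeholder URL, and the `load_settings` and `crawl` helpers are all hypothetical. The `fetch` and `sleep` callables are passed in so the pacing and scope logic can be exercised without any network access.

```python
import configparser
import time

# Hypothetical config; the category URL is a placeholder, not the real target.
EXAMPLE_CONFIG = """
[crawler]
category_url = https://example.org/wiki/Category:Example
delay_seconds = 2.5
image_dir = images
"""


def load_settings(text):
    """Parse crawler settings from INI text, with fallbacks for optional keys."""
    # interpolation=None so percent-encoded URLs are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["crawler"]
    return {
        "category_url": section.get("category_url"),
        "delay_seconds": section.getfloat("delay_seconds", fallback=1.0),
        "image_dir": section.get("image_dir", fallback="images"),
    }


def crawl(item_urls, allowed_urls, delay_seconds, fetch, sleep=time.sleep):
    """Fetch only URLs that belong to the category, pausing after each request."""
    fetched = []
    for url in item_urls:
        if url not in allowed_urls:
            continue  # never leave the configured category
        fetch(url)
        fetched.append(url)
        sleep(delay_seconds)
    return fetched


settings = load_settings(EXAMPLE_CONFIG)
```

Switching categories would then mean editing `category_url` in the config file, with no code change.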

Project will need to be completed in 2 days.

Skills: Engineering, Project Management, Script Installation, Shell Script, Software Architecture, Software Testing


About the Employer:
(12 reviews) Canada

Project ID: #3059673