Python Web Crawler

Create a python application that will walk a web site with bibliographic data, gathering author names. These names and their papers' names will be stored in a couple of DBMS tables. Application will keep track of how often it has crawled and extracted data from these web sites. When the web sites change (date/diff) they will be crawled again and new information will be added to the database tables (and timestamp noted in table entries). This system will be used by a researcher to perform a continuous prior art search. The researcher will usually keep track of 20-30 other researcher's home pages. These home pages usually have a listing of papers. The format of these listings varies, so you cannot definitively parse the information. Therefore, the researcher needs to perform the "mapping". So, if initially you present the listing to the researcher in HTML format, the researcher can cut and paste the relevant paper titles into entry fields. You can then persist these titles in some "paper" table. You can also cache a copy of the listing web page for future crawls. So, in future crawls, crawl the page and present the HTML diff text to the researcher. This most likely will contain text of new papers. All code is to be written in Python in the Plone framework, and database operations should all use SQLAlchemy.

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

## Platform

Plone/Windows XP

Habilidades: Engenharia, MySQL, PHP, Python, Arquitetura de software, Teste de Software

Veja mais: working of web crawler, working from home python, web source format, web-crawler, researcher work from home, python web framework, python home work, python hire, python coder for hire, paper work from home, new hire paper work, how to request hire researcher, how to create a data entry form, hire python, hire a web researcher, hire a python coder, date entry from home, art site web, how to create a web crawler, researcher hire, papers written, web researcher, sqlalchemy, researcher for hire, python work

Acerca do Empregador:
( 131 comentários ) United States

ID do Projeto: #3006078

3 freelancers estão ofertando em média $1275 para esse trabalho


See private message.

$1530 USD in 7 dias
(25 Comentários)

See private message.

$595 USD in 7 dias
(11 Comentários)

See private message.

$1700 USD in 7 dias
(5 Comentários)