Closed

Scrape sites and translate

Looking for someone to process all the wikis above 50k in size from [url removed, login to view] and [url removed, login to view]. Include non-English versions, talk pages and user pages, but not revision history (i.e. you only need the most recent version of each article). We would like you to:

A. Convert the wikis listed above into HTML, together with their associated image dumps.

B. Extract the HTML and in-article images from the attached wikis by crawling the sites. Only extract the article itself, not the template (navigation and footer).

C. Machine translate the HTML from both A and B into as many languages as possible (from and to English, and between other non-English languages). For this we recommend a translator that preserves HTML.

D. Remove broken links (e.g. where Wikipedia articles link to articles that have not been written yet).

E. Output HTML and, where there are associated image dumps, ensure that the links to the images work; otherwise strip them out. Only include article pages, not other namespaces (e.g. image, user and discussion pages).

This must be able to run periodically, so any manual processes must be clearly documented. It can run on our servers or on yours. If it runs on ours, it must run on Linux and require no more than 400MB of memory.

For this initial run, all data from all the wikis in all the languages is a deliverable. For subsequent runs we can negotiate something beyond this project.

Coders without a significant history on RAC must submit a portfolio and an expert guarantee of 20% to be considered.
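Steps D and E above (dropping broken links and stripping images that have no file behind them) could be sketched roughly as follows, using only Python's standard `html.parser`. The names `ArticleCleaner`, `clean_article`, `link_ok` and `image_ok` are hypothetical and not part of the request; the caller decides what counts as a broken link or a missing image. A rejected link is unwrapped so that its text stays in the article, while a rejected image is removed entirely.

```python
from html.parser import HTMLParser


class ArticleCleaner(HTMLParser):
    """Re-emit HTML, unwrapping rejected <a> tags and dropping rejected <img> tags."""

    def __init__(self, link_ok, image_ok):
        # convert_charrefs=False keeps entities such as &amp; exactly as written.
        super().__init__(convert_charrefs=False)
        self.link_ok = link_ok      # hypothetical callback: href -> bool
        self.image_ok = image_ok    # hypothetical callback: src -> bool
        self.out = []
        self.dropped_links = []     # one flag per open <a>: True if its tags were dropped

    def _keep(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href") is not None:
            return self.link_ok(attrs["href"])
        if tag == "img":
            return self.image_ok(attrs.get("src") or "")
        return True

    def handle_starttag(self, tag, attrs):
        keep = self._keep(tag, attrs)
        if tag == "a":
            self.dropped_links.append(not keep)
        if keep:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <img src="x.png" />.
        if self._keep(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.dropped_links and self.dropped_links.pop():
            return  # the matching <a> was dropped, so drop </a> as well
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean_article(html, link_ok, image_ok):
    """Return html with rejected links unwrapped and rejected images removed."""
    cleaner = ArticleCleaner(link_ok, image_ok)
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)


if __name__ == "__main__":
    page = ('<p>See <a href="/wiki/Foo">Foo</a> and '
            '<a href="/w/index.php?title=Bar&amp;action=edit&amp;redlink=1">Bar</a>.'
            '<img src="a.png"><img src="b.png" /></p>')
    dumped_images = {"a.png"}  # assumed: file names present in the image dump
    print(clean_article(
        page,
        # Assumption: unwritten articles are marked with redlink=1, as MediaWiki does.
        link_ok=lambda href: "redlink=1" not in href,
        image_ok=lambda src: src in dumped_images,
    ))
    # prints: <p>See <a href="/wiki/Foo">Foo</a> and Bar.<img src="a.png"></p>
```

Passing the checks in as callbacks keeps the cleaner independent of how a given wiki marks missing articles or names its image files, which is likely to differ between the dump-based (A) and crawl-based (B) sources.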

## Deliverables

The script needs to produce one table with the following fields:

* Article name (in original language)
* Source wiki
* Original language (2-letter code)
* Translated language (2-letter code). Leave as NULL if it's the original untranslated content
* HTML (translated; where an image dump is provided, have correct links to the images, otherwise take them out)
* Date of translation
* Date of article (as specified in the XML feed)
* Popularity of article (where the record exists)

This task deals with a large amount of data (in the gigabytes), so it's advisable that only coders with experience in managing gigabytes of data attempt it.

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
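As an illustration of the single output table described under Deliverables, here is a minimal sketch using Python's built-in `sqlite3` module. The database engine, the table name `articles`, the column names and the helper functions are all assumptions made for the example; the request only lists the fields, not how they are stored.

```python
import sqlite3

# Assumed column names; the posting only describes the fields in prose.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    article_name        TEXT NOT NULL,  -- in the original language
    source_wiki         TEXT NOT NULL,
    original_language   TEXT NOT NULL CHECK (length(original_language) = 2),
    translated_language TEXT CHECK (translated_language IS NULL
                                    OR length(translated_language) = 2),
    html                TEXT NOT NULL,
    translation_date    TEXT,           -- NULL for untranslated rows
    article_date        TEXT,           -- as given in the XML feed
    popularity          INTEGER         -- NULL where no record exists
)
"""


def create_table(conn):
    """Create the output table if it does not exist yet."""
    conn.execute(SCHEMA)


def insert_article(conn, article_name, source_wiki, original_language, html,
                   translated_language=None, translation_date=None,
                   article_date=None, popularity=None):
    """Insert one row; translated_language=None marks original content."""
    conn.execute(
        "INSERT INTO articles VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (article_name, source_wiki, original_language, translated_language,
         html, translation_date, article_date, popularity),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    # Made-up sample rows, for illustration only.
    insert_article(conn, "Example", "examplewiki", "en", "<p>Hello</p>",
                   article_date="2008-01-01")
    insert_article(conn, "Example", "examplewiki", "en", "<p>Bonjour</p>",
                   translated_language="fr", translation_date="2008-01-02",
                   article_date="2008-01-01")
    originals = conn.execute(
        "SELECT COUNT(*) FROM articles WHERE translated_language IS NULL"
    ).fetchone()[0]
    print(originals)  # prints: 1
```

Dates are stored as ISO 8601 text here because SQLite has no dedicated date type; a different engine would probably use a native date or timestamp column instead.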

## Platform

Linux preferably

Skills: PHP


About the Employer:
(48 reviews) Australia

Project ID: #2987064

7 freelancers are bidding an average of $1791 for this job

eye4tech

See private message.

$2125 USD in 30 days
(32 reviews)
6.9
tzo

See private message.

$1700 USD in 30 days
(50 reviews)
4.9
man0110

See private message.

$1487.5 USD in 30 days
(8 reviews)
3.2
coderhn

See private message.

$1700 USD in 30 days
(3 reviews)
2.5
agajn

See private message.

$1700 USD in 30 days
(0 reviews)
0.0
mydarkeyes

See private message.

$1700 USD in 30 days
(5 reviews)
0.0
nokc

See private message.

$2125 USD in 30 days
(1 review)
0.0