In Progress

Unicode Scraping Database

I need to create a MySQL table based on the work at [url removed, login to view];Param=1&g=1&h=0&r=1&t=1&p=1&k=0

The MySQL table would have five fields (columns):

Column 1) The current page number (there are a total of 1,430 pages, all of which must be included in the table - the link given above is page 1).

Columns 2 through 5) The Unicode text for each individual translation (displayed as four separate color-coded lines as seen via the above link).
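As a rough illustration of the five-column layout described above, the sketch below builds a MySQL CREATE TABLE statement as a Python string. The table name `translations` and all column names are assumptions (the posting does not name them), and `utf8mb4` is chosen here so that any Unicode character can be stored.

```python
# Hypothetical names: the posting specifies neither a table name nor column names.
TABLE_NAME = "translations"

COLUMNS = [
    ("page_number", "SMALLINT UNSIGNED NOT NULL"),  # pages 1 through 1,430
    ("translation_1", "TEXT NOT NULL"),             # first color-coded line
    ("translation_2", "TEXT NOT NULL"),             # second color-coded line
    ("translation_3", "TEXT NOT NULL"),             # third color-coded line
    ("translation_4", "TEXT NOT NULL"),             # fourth color-coded line
]


def build_create_table(table: str = TABLE_NAME) -> str:
    """Return a MySQL CREATE TABLE statement for the five-column layout."""
    column_sql = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in COLUMNS)
    return (
        f"CREATE TABLE `{table}` (\n{column_sql}\n"
        ") ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;"
    )


if __name__ == "__main__":
    print(build_create_table())
```

The statement deliberately has no extra id column, so the table keeps exactly the five fields requested.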

So, for example, the first page would contribute five fields and 23 records to the MySQL database: one record for each of the 23 blocks of text, and one field for each of the four translations plus the current page number, which in this case would be 1. The remaining pages would follow the same pattern. The final table, with all pages entered, would therefore have five fields and several thousand records.
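The page-to-records pattern can be sketched as follows: each block of text on a page becomes one record made up of the page number plus the block's four translation lines. The function name `rows_for_page` and the placeholder strings are assumptions for illustration only; real text would come from the pages themselves.

```python
from typing import List, Tuple

# One record: page number followed by the four translation lines.
Row = Tuple[int, str, str, str, str]


def rows_for_page(page_number: int, blocks: List[List[str]]) -> List[Row]:
    """Turn the blocks of one page into records, one record per block."""
    rows: List[Row] = []
    for index, block in enumerate(blocks, start=1):
        if len(block) != 4:
            raise ValueError(
                f"page {page_number}, block {index}: expected 4 lines, got {len(block)}"
            )
        rows.append((page_number, block[0], block[1], block[2], block[3]))
    return rows


if __name__ == "__main__":
    # Placeholder data standing in for the 23 blocks on page 1.
    placeholder_blocks = [
        [f"block {i}, line {j}" for j in range(1, 5)] for i in range(1, 24)
    ]
    rows = rows_for_page(1, placeholder_blocks)
    print(len(rows), "records,", len(rows[0]), "fields each")
```

Rejecting any block that does not have exactly four lines is a design choice in this sketch, so that a page whose layout differs from page 1 is noticed rather than silently stored.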

You are free to use a scraper to populate the database, as long as the Unicode character formatting is preserved. I don't care about the method you use to create the table - I only care about the final product, the table itself.
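Whatever scraper is used, one way to guard the Unicode requirement is to decode each page strictly and to reject text that already shows signs of a lossy conversion. The sketch below assumes the pages are served as UTF-8 (not confirmed by the posting); both function names are hypothetical.

```python
def decode_page(raw: bytes, charset: str = "utf-8") -> str:
    """Decode raw page bytes strictly, so bad bytes raise instead of being replaced."""
    return raw.decode(charset, errors="strict")


def check_unicode_preserved(text: str) -> str:
    """Return text unchanged, or raise ValueError if it looks damaged."""
    # U+FFFD is what lenient decoders substitute for bytes they cannot decode.
    if "\ufffd" in text:
        raise ValueError("found U+FFFD; the text was probably decoded with the wrong charset")
    # A strict encode raises UnicodeEncodeError (a ValueError) on lone surrogates.
    text.encode("utf-8")
    return text


if __name__ == "__main__":
    # Arbitrary sample text, not taken from the source pages.
    sample = "Ελληνικά עברית"
    print(check_unicode_preserved(decode_page(sample.encode("utf-8"))))
```

The check does not normalize or otherwise alter the text, since the requirement is that the characters be preserved exactly as displayed.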

Delivery time should be within two months of project acceptance date.

Before we proceed with the full project, I must see an initial table consisting of just the first three pages, so I can confirm that the data is being entered properly. This will save us both time. The initial work can be provided to me as a MySQL dump file.
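A three-page sample dump could be produced along these lines. This is only a sketch: the table name `translations` and the output path are assumptions, and each text value is written as a MySQL hexadecimal literal with a `_utf8mb4` introducer so that no quote escaping is needed and the Unicode bytes pass through unchanged.

```python
from typing import Iterable, Tuple

# One record: page number followed by the four translation lines.
Row = Tuple[int, str, str, str, str]


def hex_literal(text: str) -> str:
    """Encode text as a MySQL hex string literal tagged as utf8mb4."""
    return "_utf8mb4 X'" + text.encode("utf-8").hex().upper() + "'"


def insert_statement(row: Row, table: str = "translations") -> str:
    """Build one INSERT statement for a single record."""
    page_number, *lines = row
    values = ", ".join([str(int(page_number))] + [hex_literal(line) for line in lines])
    return f"INSERT INTO `{table}` VALUES ({values});"


def write_sample_dump(rows: Iterable[Row], path: str, max_page: int = 3) -> int:
    """Write INSERTs for pages 1..max_page to path; return the number written."""
    written = 0
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("SET NAMES utf8mb4;\n")
        for row in rows:
            if row[0] <= max_page:
                handle.write(insert_statement(row) + "\n")
                written += 1
    return written


if __name__ == "__main__":
    demo_rows = [(1, "A", "B", "C", "D"), (4, "E", "F", "G", "H")]
    print(write_sample_dump(demo_rows, "sample_dump.sql"))
```

A complete dump file would normally also start with the CREATE TABLE statement for the table, which this sketch leaves out.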

Skills: Data Processing, PHP, Project Management, XML

About the Employer:
(1 review) Stafford, United States

Project ID: #163406