Cancelled

Perl Dataminer - HTML parsing, HTML POST/GET queries, MySQL databases, PDF creation

This is a data mining application, some of which is already finished.

The data mining is for information regarding address records in the USA.

Bonuses will be given for useful/pretty reporting, and for data mining a realistic rent from a source I have been unable to find.

Skills required include HTML parsing, HTML POST/GET queries, MySQL databases, and PDF creation.

The acceptable error rate is 1 in every 500 records.

That is to say, 1 in every 500 records is allowed to have data import errors.

The other 499 records are required to have 100% of their data correctly mined. This program will be used to data mine 500 new records each day.
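To make the error rate concrete, here is a minimal sketch of how a batch could be checked against it. The helper name `within_error_budget` is hypothetical, and it assumes a record counts as "bad" if any of its fields was imported incorrectly.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the number of bad records stays within
# the allowed 1-in-500 rate for a batch of $total records.
sub within_error_budget {
    my ($total, $bad) = @_;
    return $bad * 500 <= $total ? 1 : 0;
}
```

Under this reading, a daily batch of 500 records passes with one bad record and fails with two.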

Must work with Strawberry Perl on Windows 7.

Any additional modules/libs used must be installable by the ppm install command

OR must be installable by you providing them in a ZIP/RAR file.

No Build or Make Commands allowed in the install document.

## Deliverables

--- Please see attached zip file ---

Command line syntax:

perl foredownloader

-- This option causes the program to download the data since the last time it was run.

perl foredownloader 10-23-2010 10-26-2010

-- This option causes the program to download the date range 10-23-2010 to 10-26-2010.

perl foredownloader 10-23-2010 10-26-2010 always

-- This option causes the program to download without the confirmation prompt.
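The argument handling above could be sketched roughly like this. The helper name `parse_args` and the option keys are hypothetical, the `fixgeo` word is the one described further down for the geocoding retry, and allowing `always` without a date range is an assumption on my part.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the command-line words into a small options hash.
# Dates are expected as MM-DD-YYYY, as in the examples above.
sub parse_args {
    my @args = @_;
    my %opt = (mode => 'since_last_run', confirm => 1);

    # A trailing "always" skips the confirmation prompt.
    if (@args && lc $args[-1] eq 'always') {
        pop @args;
        $opt{confirm} = 0;
    }

    # No arguments left: download everything since the last run.
    return \%opt if @args == 0;

    if (@args == 1 && lc $args[0] eq 'fixgeo') {
        $opt{mode} = 'fixgeo';
        return \%opt;
    }

    if (@args == 2) {
        for my $d (@args) {
            die "Bad date '$d', expected MM-DD-YYYY\n"
                unless $d =~ /^\d{2}-\d{2}-\d{4}$/;
        }
        @opt{qw(mode from to)} = ('range', @args);
        return \%opt;
    }

    die "Usage: perl foredownloader [FROM TO] [always] | fixgeo\n";
}
```

The script itself would then call `parse_args(@ARGV)` once at startup and branch on the `mode` key.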

The program downloads the list from:

-- Search By 'Document Type' - Left hand side

[url removed, login to view]

-- Use types "HL,L,LISP,DEF,B,NTS,TSD,TXDUE,DETS,BETS" / Foreclosure Documents

[url removed, login to view]

-- Click "Create Export File"

The exported file will provide the following data points

push(@headers, 'DocumentID');

push(@headers, 'CrossPartyName');

push(@headers, 'Consideration');

push(@headers, 'Comments');

push(@headers, 'DocTypeKey');

push(@headers, 'FullName');

push(@headers, 'RecordDate');

push(@headers, 'ClerkFileNumber');

push(@headers, 'DOR1ParcelID');

push(@headers, 'Comments2');

-- If there is an error that the date range is too large (i.e. that we have tried to download over 10,000 records), the program should automatically divide the date range

until it is successful. It should download all such sections and reassemble them.
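A minimal sketch of that splitting, assuming the actual export is wrapped in a callback. `$fetch` is hypothetical: it takes two MM-DD-YYYY strings and returns an array ref of records, or undef when the site reports that the range is too large. Only core modules are used.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(floor);

# Convert MM-DD-YYYY to a day number (days since the epoch, UTC) and back.
sub to_day {
    my ($m, $d, $y) = split /-/, shift;
    return floor(timegm(0, 0, 0, $d, $m - 1, $y) / 86400);
}

sub from_day {
    my @t = gmtime(shift() * 86400);
    return sprintf '%02d-%02d-%04d', $t[4] + 1, $t[3], $t[5] + 1900;
}

# Try the whole range first; if it is too large, halve it and retry each half.
sub download_range {
    my ($from, $to, $fetch) = @_;
    my $records = $fetch->($from, $to);
    return @$records if defined $records;

    my ($lo, $hi) = (to_day($from), to_day($to));
    die "A single day ($from) still exceeds the limit\n" if $lo >= $hi;

    # Split in half and reassemble the two parts in date order.
    my $mid = $lo + int(($hi - $lo) / 2);
    return (
        download_range($from, from_day($mid), $fetch),
        download_range(from_day($mid + 1), $to, $fetch),
    );
}
```

Because the two halves never share a day, the reassembled list should not contain duplicates from the split itself.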

After downloading the list from [url removed, login to view],

it should show the total number of records about to be downloaded, and request confirmation before it starts downloading.
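The confirmation step could look something like this. The helper name `confirm` is hypothetical; the filehandle arguments are optional and only there so the prompt can be tested without a keyboard.

```perl
use strict;
use warnings;

# Hypothetical helper: ask a yes/no question and return 1 only for y/yes.
# Anything else, including end of input, counts as "no".
sub confirm {
    my ($question, $in, $out) = @_;
    $in  ||= \*STDIN;
    $out ||= \*STDOUT;
    print {$out} "$question [y/N] ";
    my $answer = <$in>;
    return defined $answer && $answer =~ /^\s*y(es)?\s*$/i ? 1 : 0;
}
```

A call such as `confirm("About to download $count records. Continue?")` would be skipped when the `always` argument is given. The same prompt could be reused by the script that deletes the database tables.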

The program should take the Parcel ID information from the exported Excel file and download the information from the Assessor website.

The program should also add data fields for any URL referenced from the Assessor website, and also a data field for the Assessor website URL itself.

Example URLs

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]:05188

[url removed, login to view]

[url removed, login to view]

and the following data points

# GENERAL INFORMATION

push(@headers, 'Assessor URL');

push(@headers, 'Parcel NO.');

push(@headers, 'OWNER AND MAILING ADDRESS');

push(@headers, 'LOCATION ADDRESS CITY/UNINCORPORATED TOWN');

push(@headers, 'ASSESSOR DESCRIPTION');

push(@headers, 'ASSESSOR DESCRIPTION URL');

push(@headers, 'RECORDED DOCUMENT NO.');

push(@headers, 'RECORDED DOCUMENT NO. URL');

push(@headers, 'RECORDED DATE');

push(@headers, 'VESTING');

# ASSESSMENT INFORMATION AND SUPPLEMENTAL VALUE

push(@headers, 'TAX DISTRICT');

push(@headers, 'APPRAISAL YEAR');

push(@headers, 'FISCAL YEAR');

push(@headers, 'SUPPLEMENTAL IMPROVEMENT VALUE');

push(@headers, 'SUPPLEMENTAL IMPROVEMENT ACCOUNT NUMBER');

#REAL PROPERTY ASSESSED VALUE 1

push(@headers, 'FISCAL YEAR 1');

push(@headers, 'LAND 1');

push(@headers, 'IMPROVEMENTS 1');

push(@headers, 'PERSONAL PROPERTY 1');

push(@headers, 'EXEMPT 1');

push(@headers, 'GROSS ASSESSED (SUBTOTAL) 1');

push(@headers, 'TAXABLE LAND+IMP (SUBTOTAL) 1');

push(@headers, 'COMMON ELEMENT ALLOCATION ASSD 1');

push(@headers, 'TOTAL ASSESSED VALUE 1');

push(@headers, 'TOTAL TAXABLE VALUE 1');

#REAL PROPERTY ASSESSED VALUE 2

push(@headers, 'FISCAL YEAR 2');

push(@headers, 'LAND 2');

push(@headers, 'IMPROVEMENTS 2');

push(@headers, 'PERSONAL PROPERTY 2');

push(@headers, 'EXEMPT 2');

push(@headers, 'GROSS ASSESSED (SUBTOTAL) 2');

push(@headers, 'TAXABLE LAND+IMP (SUBTOTAL) 2');

push(@headers, 'COMMON ELEMENT ALLOCATION ASSD 2');

push(@headers, 'TOTAL ASSESSED VALUE 2');

push(@headers, 'TOTAL TAXABLE VALUE 2');

push(@headers, 'Treasurer Property Taxes URL');

#ESTIMATED LOT SIZE AND APPRAISAL INFORMATION

push(@headers, 'ESTIMATED SIZE');

push(@headers, 'ORIGINAL CONST. YEAR');

push(@headers, 'LAST SALE PRICE MONTH/YEAR');

push(@headers, 'LAND USE');

push(@headers, 'DWELLING UNITS');

#PRIMARY RESIDENTIAL STRUCTURE

push(@headers, 'TOTAL LIVING SQ. FT.');

push(@headers, '1ST FLOOR SQ. FT.');

push(@headers, '2ND FLOOR SQ. FT.');

push(@headers, 'BASEMENT SQ. FT.');

push(@headers, 'GARAGE SQ. FT.');

push(@headers, 'CARPORT SQ. FT.');

push(@headers, 'STORIES');

push(@headers, 'BEDROOMS');

push(@headers, 'BATHROOMS');

push(@headers, 'FIREPLACE');

push(@headers, 'ADDN/CONV');

push(@headers, 'POOL');

push(@headers, 'SPA');

push(@headers, 'TYPE OF CONSTRUCTION');

push(@headers, 'ROOF TYPE');

#ASSESSORMAP VIEWING GUIDELINES

push(@headers, 'MAP');

push(@headers, 'MAP URL');

The program should then download all the data points from the Treasurer Property Taxes URL. Example: [url removed, login to view]

## The data points from the Treasurer website are not listed here, but download them all

The program should then use a FREE geocoding service which allows at least 10,000 records to be geocoded per day.

Any records not able to be geocoded during that day should be able to be geocoded later by running the command

perl foredownloader fixgeo

The geocoding should provide at least the following data points

#from geocode

push(@headers, 'Geo_Number'); # Street number

push(@headers, 'Geo_Street'); # Street name

push(@headers, 'Geo_Type'); # Street type (Circle, Ave, Blvd, St., etc.)

push(@headers, 'Geo_City');

push(@headers, 'Geo_State');

push(@headers, 'Geo_Zip');

push(@headers, 'Geo_Suffix');

push(@headers, 'Geo_Prefix'); # Such as North, S., E.

push(@headers, 'Geo_Lat');

push(@headers, 'Geo_Long');

Should include a data field for URL of Google Maps for each Address

Example: [url removed, login to view]+Pepper+Tree+Cir+Henderson,+NV+89014&sll=[url removed, login to view],-115.172816&sspn=[url removed, login to view],1.244202&ie=UTF8&hq=&hnear=635+Pepper+Tree+Cir,+Henderson,+Clark,+Nevada+89014&z=16
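Building that field could be sketched as below. The helper name `maps_url` is hypothetical, and `$prefix` stands for whatever the map URL looks like up to the start of the address (the `http://maps.example/?q=` used in the comment is only a placeholder, not the real address). The sketch assumes plain ASCII addresses.

```perl
use strict;
use warnings;

# Hypothetical helper: append an address to a URL prefix, using "+" for
# spaces and percent-escapes for anything that is not URL-safe.
# Commas are left alone, as in the example above.
sub maps_url {
    my ($prefix, $address) = @_;
    (my $q = $address) =~ s/([^A-Za-z0-9 ,._~-])/sprintf '%%%02X', ord $1/ge;
    $q =~ tr/ /+/;
    return $prefix . $q;
}

# e.g. maps_url('http://maps.example/?q=', '635 Pepper Tree Cir Henderson, NV 89014')
```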

The program needs to download the following from [url removed, login to view] for each address.

# From Eppraisal

push(@headers, 'Eppraisal');

push(@headers, 'Zillow_apprasial');

Should also download the data for Recently Sold Homes (all 5 of them)

Address,Sales Price,Sale Date,Bed/Bath,Sq. Ft.

#### DATABASE WORK ####

All the data should go into MySQL, with a timestamp for the query which imported it.

There should be a [url removed, login to view] file to hold the configuration values.

Records need to be imported multiple times, each time with a different importID and timestamp.

When a record for, let's say, Parcel=191-24-111-040 is imported on Oct 10th,

it should not overwrite the record imported earlier on Oct 2nd.
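One way to get that behaviour is to only ever INSERT, never UPDATE, so every import adds its own row. The sketch below just builds the statement and its bind values. The table name `parcel_import` and the column names are hypothetical, and the result is meant to be handed to DBI, e.g. `$dbh->do($sql, undef, @bind)`.

```perl
use strict;
use warnings;

# Hypothetical helper: build an INSERT for one import of one parcel.
# Rows are only ever added, so an import on Oct 10th sits next to the
# one from Oct 2nd instead of replacing it.
sub build_import_insert {
    my ($import_id, $imported_at, $record) = @_;
    my @cols = sort keys %$record;
    my $sql = sprintf
        'INSERT INTO parcel_import (import_id, imported_at, %s) VALUES (%s)',
        join(', ', map { "`$_`" } @cols),
        join(', ', ('?') x (@cols + 2));
    return ($sql, $import_id, $imported_at, @{$record}{@cols});
}
```

With a key on (import_id, parcel) rather than on the parcel alone, the same parcel can appear once per import.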

The database work also needs to have a [url removed, login to view] file which will create all the needed database tables.

It also needs to have a [url removed, login to view] file which will prompt the user to confirm they really wish to delete the database tables, and then delete them.

### Bonus ###

A bonus of up to $20 USD will be given for useful reporting,

such as looking to see which multi-family homes with 4 units were built between 1998 and 2010, with an Eppraisal between 65,000 and 200,000.
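As a rough sketch, that example report is just a filter over the imported records. The helper name `four_unit_report` is hypothetical. It assumes each record is a hash keyed by the header names listed earlier, that the values have already been cleaned into plain numbers (no "$" or commas), and that "multi-family with 4 units" can be read from DWELLING UNITS alone.

```perl
use strict;
use warnings;

# Hypothetical helper: keep 4-unit properties built 1998-2010 with an
# Eppraisal between 65,000 and 200,000.
sub four_unit_report {
    my @records = @_;
    return grep {
           $_->{'DWELLING UNITS'} == 4
        && $_->{'ORIGINAL CONST. YEAR'} >= 1998
        && $_->{'ORIGINAL CONST. YEAR'} <= 2010
        && $_->{'Eppraisal'} >= 65_000
        && $_->{'Eppraisal'} <= 200_000
    } @records;
}
```

In the real program the same conditions could go into a SQL WHERE clause instead. The list that comes back is what would be laid out one address per PDF page.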

The better looking the reports, the better.

Using a background image (same background for each page) and then creating a multipage PDF file, one page per address, is perfect.

### Additional BONUS ###

A bonus of up to $20 USD will be given if a realistic suggested rental price can be generated/data mined for each address.

I need to get a realistic rent I can charge if I were to buy a property. It should take into account things such as property type (House, Condo, Apartment), # of bedrooms

and bathrooms, sq. feet, etc.

The more realistic the suggested rental price is, the closer to $20 USD you will get.

Skills: Microsoft, MySQL, Script Install, Shell Script, Windows Desktop

Project ID: #2965786