Completed

Scrape court records

I need to scrape public record data. This particular court should be pretty easy, as the case ID numbers are incremented and displayed in the URL. I am open to suggestions on the best way to do this. Ideally it would be done in code that is not so obscure that I can't work on it myself after you deliver. I previously used the iOpus iMacros Firefox extension for another project on another court. It is handy because it works in Firefox, which allows me to use an IP-changing plugin while surfing, as that previous website blacklisted my IP pretty quickly. I am open to a Firefox JavaScript plugin as well. I am also open to a server script that could run on my host [url removed, login to view]. Perl, PHP, or another scripting language is fine. If it runs on a server, though, I would hope it is written by someone who knows how to defeat measures such as IP blocking, so that the script doesn't quit running after one night. A script that can be run automatically with Windows Scheduler or a cron job every night would be ideal.

It should be noted that I will be posting another project hand in hand with this one. It will use the Google Spreadsheets API to grab the defendant address, use the Zillow API to get the Zestimate for the property address, and insert it back into the spreadsheet.

## Deliverables


Example [url removed, login to view]

You can see the case ID number right in the URL. Here are the steps for scraping.

1) Load the URL.

2) Check whether the case is a foreclosure: the page will say "Type Of Action: Foreclosures-CV". If it is a foreclosure, go to the next step. If not, increment to the next case and repeat this step.

3) Scrape the following information and put it in data columns:

a) date filed

b) case number

c) amount - for this court this will be an empty field, but I want the column included in the saved file because I will be merging info from multiple courts

d) plaintiff's name

e) plaintiff's address

f) plaintiff's attorney

g) name of the first listed defendant

h) full address of first listed defendant

i) Type - this won't be scraped; simply input "Lor Clerk" in this field

j) names of all defendants, up to 10 of them (there can be a lot of defendants because of the due diligence process required). The first 10 defendant names need to be scraped, concatenated into one string, and put in a single field.

4) Increment

a) As you increment the case ID into very recent cases, the cases will be live but the court staff will not yet have entered the names and addresses of the parties. The script needs to recognize this and flag the case so it comes back to it on every run, for up to two weeks after the filing date. If the data still has not been entered after two weeks, the script should give up on that case and stop checking it. The script must continue to check and increment after seeing the first case with missing data, however, because a lack of data in an older case doesn't mean newer cases won't have their data entered.

b) When you increment past the most recent case you will get an error: "An exception has occured: [url removed, login to view]: There is no row at position 0. at [url removed, login to view]`[url removed, login to view](Int32 userIndex) at [url removed, login to view]`1.get_Item(Int32 index) at System.Data.DataRowCollection.get_Item(Int32 index) at [url removed, login to view]()/[url removed, login to view]"

That would be a cue that you need to stop. However, this same error can also happen as you increment through the cases: occasionally case numbers are skipped or cases are aborted, so there is a gap, and incrementing through it produces this error even though there are still future cases. The proper way to handle the error is to increment 10 or 20 cases past the first error to see whether any immediately following case numbers exist. As with 4a, it could also be that in the next few days these errored-out case numbers get information filled in, so a flag should be set when an error is received to come back to that case for up to 2 weeks before giving up.

I will attach PDFs showing examples of errors, of missing data entry, and of full data entry.
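The field scraping in step 3 might look like the sketch below. Only the "Type Of Action: Foreclosures-CV" marker string is quoted from this spec; every label and regex pattern is an assumption about the page layout and would need to be matched against the court's real HTML.

```python
import re

# Hypothetical extraction for step 3; the labels and patterns below are
# guesses at the page layout, not the court's actual markup.
FIELDS = {
    "date_filed":     r"Date Filed:\s*([^\n<]+)",
    "case_number":    r"Case Number:\s*([^\n<]+)",
    "plaintiff_name": r"Plaintiff:\s*([^\n<]+)",
}

def is_foreclosure(page_text):
    # Step 2: only foreclosure cases are scraped.
    return "Type Of Action: Foreclosures-CV" in page_text

def parse_case(page_text):
    row = {}
    for name, pattern in FIELDS.items():
        m = re.search(pattern, page_text)
        row[name] = m.group(1).strip() if m else None
    row["amount"] = ""          # step 3c: always empty for this court
    row["type"] = "Lor Clerk"   # step 3i: constant, not scraped
    return row

def join_defendants(names):
    # Step 3j: first 10 defendant names, concatenated into one string.
    return "; ".join(names[:10])
```

Cases where `parse_case` comes back with `None` fields are exactly the ones step 4a says to flag and revisit.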

Here are the acceptable outputs, in order of preference (most wanted first):

First choice: use the Google Spreadsheets API to insert into the last row of a spreadsheet that I identify

Second choice: CSV emailed to me

Third choice: CSV saved to my hard drive

It should be noted that I will be posting another project hand in hand with this one. That other project is a bit more stringent: it must use the Google Spreadsheets API to grab the defendant address, use the Zillow API to get the Zestimate for the property address, and insert it back into the spreadsheet.

Skills: Engineering, PHP, Project Management, Script Install, Shell Script, Software Architecture, Software Testing, Web Hosting, Website Management, Website Testing


About the Employer:
(34 reviews) Sheffield Lake, United States

Project ID: #3767964

Awarded to:

quaintek

See private message.

$51 USD in 5 days
(18 Reviews)
4.3

7 freelancers are bidding on average $72 for this job

tzo

See private message.

$63.75 USD in 5 days
(254 Reviews)
6.4

devmstech

See private message.

$58.65 USD in 5 days
(35 Reviews)
5.1

sumon4work

See private message.

$84.15 USD in 5 days
(42 Reviews)
4.7

glyph

See private message.

$85 USD in 5 days
(12 Reviews)
4.0

patrunjelu

See private message.

$76.5 USD in 5 days
(17 Reviews)
3.1

carlosguerrac

See private message.

$85 USD in 5 days
(2 Reviews)
2.5