In Progress

8 simple scrapers needed

MUST follow the coding instructions laid out below (no deviations or substitutions).

I have attached sample data and details for the 8 sites to scrape. The scraper definition is also attached so you can see proper formatting for JSON.

Note: I will have many more of these for developers who do a good job in a timely and cost-effective manner.

Thanks,

Scott

Scraping Specs

- Written in Ruby, NO TABS (2 spaces instead).

- Run from the command line with two arguments: the first is an integer scrape ID, the second is the URL of the VENUE page where the scrape starts:

./[url removed, login to view] <ID:integer> <URL:string>

./[url removed, login to view] 111 [url removed, login to view]
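The argument handling above could be sketched as follows. This is only an illustration: `parse_args` is a hypothetical helper name, and the check that the URL starts with http:// or https:// is an assumption, since the spec only says the second argument is a URL string.

```ruby
# Sketch: validate the two command-line arguments described above.
# parse_args is a hypothetical helper name, not part of the spec.
def parse_args(argv)
  unless argv.length == 2
    raise ArgumentError, 'usage: <ID:integer> <URL:string>'
  end

  # Integer() raises ArgumentError on non-numeric input such as "abc";
  # base 10 is forced so "010" is read as 10 rather than as octal.
  id = Integer(argv[0], 10)
  url = argv[1]

  # Assumption: venue URLs are plain http or https addresses.
  unless url =~ %r{\Ahttps?://\S+\z}
    raise ArgumentError, "not an http(s) URL: #{url}"
  end

  [id, url]
end
```

In a real script the first call would be `id, url = parse_args(ARGV)`, with any ArgumentError reported through the JSON error format rather than a stack trace.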

- Must use Curl for GET-ing URLs

GEM: curb
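A minimal sketch of a GET through curb might look like this. `FetchError`, `CURB_GET` and `fetch_page` are names invented for the example, and the 30-second timeout, redirect following, and treating any non-200 response as a failure are assumptions rather than requirements from the spec.

```ruby
# Sketch: GET a URL through curb (Ruby bindings for libcurl).
# FetchError, CURB_GET and fetch_page are hypothetical names.
class FetchError < StandardError; end

# Default getter. curb is required lazily so the wrapper below can also be
# exercised with a stub on machines where the gem is not installed.
CURB_GET = lambda do |url|
  require 'curb'
  easy = Curl::Easy.new(url)
  easy.follow_location = true # follow HTTP redirects
  easy.timeout = 30           # seconds; an arbitrary choice
  easy.perform
  unless easy.response_code == 200
    raise FetchError, "HTTP #{easy.response_code} for #{url}"
  end
  easy.body_str
end

# Returns the response body, or raises FetchError for any failure so the
# caller can report it under error code 20.
def fetch_page(url, getter = CURB_GET)
  getter.call(url)
rescue FetchError
  raise
rescue StandardError, LoadError => e
  raise FetchError, "GET #{url} failed: #{e.class}: #{e.message}"
end
```

Funnelling every failure into a single exception class keeps the mapping to the "URL GET error" code in one place.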

- Must only use standard Ruby regex for parsing, OR hpricot OR nokogiri as an alternative

GEM: hpricot

GEM: nokogiri
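As an illustration of the regex option, the sketch below pulls titles and dates out of a small HTML fragment. The markup in `SAMPLE_HTML` is invented for the example, and `EVENT_RE` and `parse_events` are hypothetical names; each real venue page would need its own patterns (or hpricot/nokogiri selectors).

```ruby
# Sketch: extract event titles and dates with plain Ruby regex.
# SAMPLE_HTML is made-up markup, not taken from any of the 8 sites.
SAMPLE_HTML = <<~HTML
  <div class="event">
    <h3>Rock Your Mom's House</h3>
    <span class="date">01/10/2010</span>
  </div>
  <div class="event">
    <h3>2$ off Lone Star!</h3>
    <span class="date">01/01/2010</span>
  </div>
HTML

# x flag: whitespace in the pattern is ignored; m flag: "." matches newlines.
EVENT_RE = %r{
  <div\s+class="event">\s*
  <h3>(?<title>.*?)</h3>\s*
  <span\s+class="date">(?<date>\d{2}/\d{2}/\d{4})</span>
}mx

# Returns one hash per matched event block, using the spec's key names.
def parse_events(html)
  html.scan(EVENT_RE).map do |title, date|
    { 'title' => title.strip, 'start_date' => date }
  end
end
```

A page that matches nothing yields an empty array, which a scraper would probably want to treat as a PARSE error rather than as a successful scrape with zero events; that policy is a judgment call the spec does not settle.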

- Must output JSON as a finished product, sample data included below

GEM: json

- Must *NOT* use any other GEMS outside of these four: curb, hpricot, nokogiri, json

- The script should output exactly one of two things, formatted as JSON: an ERROR, or the actual data if everything works.

- If there is any kind of error, it needs to output JSON as defined below with a specific error code and message, or at least the generic error code and a message:

{"scrape": {
  "id": <SCRAPE_ID_FROM_INITIAL_ARGUMENT_1>,
  "url": "<URL_FROM_INITIAL_ARGUMENT_2>",
  "success": <BOOLEAN: true/false>,
  "error": {
    "code": <VALID_ERROR_CODE>,
    "description": "<TEXT_WITH_WHATEVER_ERROR_MESSAGE_YOU_WANT>"
  }
}}

VALID ERROR CODES ARE:

10: (Generic error of any kind)

20: (URL GET error - any error involving GET-ing a URL)

30: (PARSE error - any error involving parsing the data)

SAMPLE ERROR RETURN:

{"scrape": {
  "id": 111,
  "url": "http://foo.com/calendar",
  "success": false,
  "error": {
    "code": 10,
    "description": "Problem doing something in the foo function."
  }
}}
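One way to guarantee that an error document is always produced is to run the whole scrape inside a single wrapper. The sketch below assumes the scraper raises two hypothetical exception classes, `FetchError` and `ParseError`, which are mapped to codes 20 and 30; anything else falls back to the generic code 10.

```ruby
require 'json'

# Error codes copied from the list above.
ERROR_GENERIC = 10
ERROR_GET     = 20
ERROR_PARSE   = 30

# Hypothetical exception classes a scraper could raise internally.
class FetchError < StandardError; end
class ParseError < StandardError; end

# Builds the error document in the shape defined above.
def error_payload(id, url, code, description)
  { 'scrape' => {
    'id' => id, 'url' => url, 'success' => false,
    'error' => { 'code' => code, 'description' => description }
  } }
end

# Runs the block and returns a JSON string: the block's payload on
# success, or an error document if the block raises.
def run_scrape(id, url)
  JSON.generate(yield)
rescue FetchError => e
  JSON.generate(error_payload(id, url, ERROR_GET, e.message))
rescue ParseError => e
  JSON.generate(error_payload(id, url, ERROR_PARSE, e.message))
rescue StandardError => e
  JSON.generate(error_payload(id, url, ERROR_GENERIC, e.message))
end
```

The script's last line could then be `puts run_scrape(id, url) { ... }`, so that only one JSON document ever reaches standard output.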

- If it succeeds, it needs to output JSON as defined below, with at least the fields marked REQUIRED, in the proper format:

{"scrape": {
  "id": <SCRAPE_ID_FROM_INITIAL_ARGUMENT_1>,
  "url": "<URL_FROM_INITIAL_ARGUMENT_2>",
  "success": <BOOLEAN: true/false>,
  "events": [
    {
      "title": "<STRING: Name of the event REQUIRED>",
      "start_date": "<DATE: date of the event, or date the event starts (MM/DD/YYYY) REQUIRED>",
      "start_time": "<DATETIME: date/time the event starts in *24 HOUR LOCAL TIME* (MM/DD/YYYY HH:MM) OPTIONAL>",
      "end_date": "<DATE: date the event ends (MM/DD/YYYY) OPTIONAL>",
      "end_time": "<DATETIME: date/time the event ends in *24 HOUR LOCAL TIME* (MM/DD/YYYY HH:MM) OPTIONAL>",
      "repeating": <INTEGER: 0 if the event happens once, 1 if the event repeats weekly REQUIRED>,
      "repeats_on": "<STRING: *full* name of the day of week the event repeats on (Thursday, Friday, etc.) OPTIONAL>",
      "repeats_until": "<DATE: date the event repeats until (MM/DD/YYYY) OPTIONAL>",
      "image_url": "<STRING: url for an image associated with this event OPTIONAL>",
      "ticket_url": "<STRING: url to buy tickets for this event OPTIONAL>",
      "ticket_prices": "<STRING: descriptive text about the ticket price OPTIONAL>",
      "description": "<STRING: any freeform descriptive text about the event OPTIONAL>",
      "bands": [
        { "name": "<STRING: band name>" },
        { "name": "<STRING: band name>" }
      ]
    }
  ]
}}
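The success document above can be assembled from plain hashes. The sketch below is one possible approach, not a required design: `clean_event`, `success_payload` and `success_json` are hypothetical names, and dropping empty optional fields (rather than emitting them as null) is an assumption based on how the sample data leaves unused keys out.

```ruby
require 'json'

# Fields the template marks as REQUIRED.
REQUIRED_KEYS = %w[title start_date repeating].freeze

# Drops optional fields that are nil or empty, then checks that every
# required field is still present.
def clean_event(event)
  cleaned = event.reject { |_key, value| value.nil? || value == '' || value == [] }
  missing = REQUIRED_KEYS - cleaned.keys
  unless missing.empty?
    raise ArgumentError, "missing required field(s): #{missing.join(', ')}"
  end
  cleaned
end

# Builds the success document in the shape defined above.
def success_payload(id, url, events)
  { 'scrape' => {
    'id' => id, 'url' => url, 'success' => true,
    'events' => events.map { |event| clean_event(event) }
  } }
end

# Serializes the document with the json gem.
def success_json(id, url, events)
  JSON.generate(success_payload(id, url, events))
end
```

Because "repeating" is an integer, a value of 0 survives the cleanup; only nil, empty strings and empty arrays are removed.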

SAMPLE DATA:

{"scrape": {
  "id": 111,
  "url": "http://foo.com/calendar",
  "success": true,
  "events": [
    {
      "title": "2$ off Lone Star!",
      "start_date": "01/01/2010",
      "repeating": 1,
      "repeats_on": "Tuesday",
      "repeats_until": "01/01/2011",
      "image_url": "http://pictures.com/of/lone_star.jpg"
    },
    {
      "title": "Rock Your Mom's House",
      "start_date": "01/10/2010",
      "start_time": "01/10/2010 19:00",
      "end_time": "01/10/2010 22:00",
      "repeating": 0,
      "image_url": "http://yourmoms.com/house.gif",
      "ticket_url": "http://buytix.to/yourmoms",
      "ticket_prices": "$8.00 all ages",
      "description": "These people really know how to stick it to you.",
      "bands": [
        { "name": "Buttcheeck Falcons" },
        { "name": "Foo Fighters" }
      ]
    }
  ]
}}

NOTES:

- All TIMES / DATETIMES should be in the LOCAL TIME of whatever VENUE is being scraped. Usually this is simply the time shown on the page you're scraping, but BE SURE.

- ALWAYS return a valid error code if anything goes wrong, even if it's just the generic error code.
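For the date and time formats, a sketch along these lines keeps the venue's wall-clock time untouched by parsing the page text and re-emitting it with no time zone conversion. `format_start` and `weekday_name` are hypothetical helpers, and the input format string in the usage note is only an example of what a venue page might show.

```ruby
require 'date'

DATE_FORMAT     = '%m/%d/%Y'        # MM/DD/YYYY
DATETIME_FORMAT = '%m/%d/%Y %H:%M'  # MM/DD/YYYY HH:MM, 24-hour clock

# Parses the venue's own date/time text and re-emits it in the spec's
# formats. No zone conversion happens, so the result stays in whatever
# local time the page displayed.
def format_start(text, source_format)
  stamp = DateTime.strptime(text, source_format)
  { 'start_date' => stamp.strftime(DATE_FORMAT),
    'start_time' => stamp.strftime(DATETIME_FORMAT) }
end

# Full English weekday name for a MM/DD/YYYY date, as "repeats_on" expects.
def weekday_name(date_text)
  Date::DAYNAMES[Date.strptime(date_text, DATE_FORMAT).wday]
end
```

For example, `format_start('Jan 10, 2010 7:00 PM', '%b %d, %Y %I:%M %p')` yields a "start_time" of "01/10/2010 19:00". Text that does not match the given format raises an error from `strptime`, which a scraper could report as a PARSE error.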

Skills: Ruby on Rails, Web Scraping


About the Employer:
(4 reviews) Austin, United States

Project ID: #580399

Awarded to:

JoeCannatti

I can do this with no problem.

$200 USD in 5 days
(1 review)
2.6

7 freelancers are bidding an average of $276 for this job

taro

Can implement

$500 USD in 8 days
(34 reviews)
6.7
srinichal

I have handled many scraping projects successfully and can complete this project to your satisfaction.

$234 USD in 6 days
(30 reviews)
6.2
shreesoftech

Dear Sir, I have studied that site, and I can do this. Please go through [url removed, login to view] to find my contact information. Thanks, and happy new year!

$300 USD in 10 days
(5 reviews)
3.2
sumeet00

Hi, Check PM. Thanks, Sumeet.

$200 USD in 5 days
(1 review)
2.6
jcfreelance

Can be done. See my PMB!

$250 USD in 5 days
(1 review)
1.9
XIGAsource

Dear Sir, We can do it perfectly for you. Thanks!

$250 USD in 4 days
(0 reviews)
0.0