In Progress

RSS Feed Parsing using Regular Expressions

We have a list of about 2,000 RSS feed URLs gathered from the Internet. We are looking to engage a qualified candidate to create a set of instructions in a predefined format (the format will be provided later) that will allow us to extract key pieces of data from the articles in each RSS feed.

For example, for each article in an RSS feed that lists events in New York City, we would like to know when the event starts and ends and where it takes place. Different RSS feeds present this information in different layouts, so a separate set of instructions must be created for every RSS feed in our list. The exact fields we would like to extract are:

• title

• summary text

• full text

• start date

• start time

• end date

• end time

• venue_name

• picture_url
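To illustrate the target data, the nine fields above could be stored as one record per RSS article. The following is a minimal Ruby sketch under stated assumptions and is not part of the required output format. It assumes RSS 2.0 input and reads that input with REXML, which ships with Ruby. It also maps "summary text" to each item's `<description>` element, which is an assumption. `EventRecord` and `read_items` are hypothetical names, and the feed content is made up.

```ruby
require 'rexml/document'

# Hypothetical record type: one instance per RSS article (<item>),
# with one member for each of the nine fields listed above.
EventRecord = Struct.new(
  :title, :summary_text, :full_text,
  :start_date, :start_time, :end_date, :end_time,
  :venue_name, :picture_url,
  keyword_init: true
)

# Reads every <item> of an RSS 2.0 document and fills only the fields that
# come straight from the feed (assumption: summary text = <description>).
# All other members stay nil until per-feed harvest rules fill them in.
def read_items(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '/rss/channel/item').map do |item|
    EventRecord.new(
      title: item.elements['title']&.text,
      summary_text: item.elements['description']&.text
    )
  end
end

# Made-up sample feed for illustration.
sample = <<~XML
  <rss version="2.0">
    <channel>
      <title>NYC Events</title>
      <item>
        <title>Jazz Night - Fri, Mar 15</title>
        <description>Live jazz downtown.</description>
      </item>
    </channel>
  </rss>
XML

read_items(sample).each { |record| p record.to_h }
```

In this sketch, fields other than the title and summary stay `nil`. Filling them is the job of the per-feed instructions.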

Output:

A set of instructions in the predefined format for each processed RSS feed. Here is a sample of the final output we are looking for:

harvest_instructions: {
  title: {feed: :title, regex: '(.+)\s-\s.{3},'},
  description_clean: {feed: :summary, regex: '(?:\w+, \w+ \d\d?,? \d\d\d\d,?.+- \d.+\d{5}\.)(.+)'},
  start_date: {text: '.eventWhen', regex: 'When: (\w+, \w+ \d\d?, \d{2,4})'},
  start_time: {text: '.eventWhen', regex: 'When: (?:\w+, \w+ \d\d?, \d{2,4}) (\d\d?:\d\d [AP]M)'},
  end_date: {text: '.eventWhen', regex: 'When:.+- (\w+, \w+ \d\d?, \d{2,4})'},
  end_time: {text: '.eventWhen', regex: 'When:.+- (?:\w+, \w+ \d\d?, \d{2,4}) (\d\d?:\d\d [AP]M)'},
  venue_name: {text: '.eventWhere', regex: 'Where:\s(.+)\d*.+\d{5}'},
  picture_url: {src: '.eventPic img'}
}
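The following is a minimal sketch of how instructions in this shape might be applied. It is written in Ruby because the sample uses Ruby hash syntax. It assumes the feed fields and the text behind each CSS selector have already been extracted, since looking up `text:` selectors would need an HTML parser. `RULES`, `harvest`, and `harvest_field` are hypothetical names, and the input strings are made up. The AM/PM part is written `[AP]M` because inside square brackets `|` would match a literal pipe.

```ruby
# Hypothetical harvest rules in the same shape as the sample output.
RULES = {
  title:      { feed: :title, regex: '(.+)\s-\s.{3},' },
  start_date: { text: '.eventWhen', regex: 'When: (\w+, \w+ \d\d?, \d{2,4})' },
  start_time: { text: '.eventWhen', regex: 'When: (?:\w+, \w+ \d\d?, \d{2,4}) (\d\d?:\d\d [AP]M)' },
  end_date:   { text: '.eventWhen', regex: 'When:.+- (\w+, \w+ \d\d?, \d{2,4})' },
  end_time:   { text: '.eventWhen', regex: 'When:.+- (?:\w+, \w+ \d\d?, \d{2,4}) (\d\d?:\d\d [AP]M)' }
}.freeze

# Picks the source string named by the rule (a feed field or the text of a
# CSS selector) and returns capture group 1, or nil if nothing matches.
def harvest_field(rule, feed_fields, selector_text)
  source = rule.key?(:feed) ? feed_fields[rule[:feed]] : selector_text[rule[:text]]
  return nil if source.nil?

  match = Regexp.new(rule[:regex]).match(source)
  match && match[1]
end

# Applies every rule and returns a hash of field name => extracted value.
def harvest(rules, feed_fields, selector_text)
  rules.transform_values { |rule| harvest_field(rule, feed_fields, selector_text) }
end

# Made-up inputs: one feed field and the text found under '.eventWhen'.
feed_fields = { title: 'Jazz Night - Fri, Mar 15' }
selector_text = {
  '.eventWhen' => 'When: Friday, March 15, 2013 7:30 PM - Friday, March 15, 2013 10:00 PM'
}

harvest(RULES, feed_fields, selector_text).each do |field, value|
  puts "#{field}: #{value.inspect}"
end
```

Running the sketch prints one line per field, such as `start_time: "7:30 PM"`. A field whose source text is missing or does not match comes back as `nil`, which makes gaps easy to spot.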

Verification:

We will manually test each set of instructions and evaluate the results.
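Before manual testing, a quick automated pre-check could catch rules that cannot work at all. The following is a minimal Ruby sketch. It assumes each rule is a regex string that should have exactly one capture group, as in the sample output. `capture_group_count` and `check_instruction` are hypothetical helpers, and the character-class check is only a heuristic.

```ruby
# Counts capture groups: the trailing empty alternative guarantees a match
# on "", and MatchData#size is always (number of groups + 1).
def capture_group_count(regex)
  Regexp.new("(?:#{regex.source})|").match('').size - 1
end

# Returns a list of problems found in one rule (empty list = looks fine).
def check_instruction(name, pattern)
  begin
    regex = Regexp.new(pattern)
  rescue RegexpError => e
    return ["#{name}: does not compile (#{e.message})"]
  end

  problems = []
  groups = capture_group_count(regex)
  problems << "#{name}: expected 1 capture group, found #{groups}" unless groups == 1
  # Heuristic only: a '|' inside [...] is a literal pipe, not alternation.
  if pattern.match?(/\[[^\]]*\|[^\]]*\]/)
    problems << "#{name}: '|' inside a character class matches a literal pipe"
  end
  problems
end

# Example patterns (the last one is deliberately broken).
rules = {
  start_time: 'When: (?:\w+, \w+ \d\d?, \d{2,4}) (\d\d?:\d\d [A|P]M)',
  end_date:   'When:.+- (\w+, \w+ \d\d?, \d{2,4})',
  broken:     'When: (\d+'
}
rules.each do |name, pattern|
  check_instruction(name, pattern).each { |problem| puts problem }
end
```

A check like this only flags mechanical problems. Whether a rule extracts the right value still has to be judged by testing it against real feed articles.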

Skill Requirements:

Intermediate knowledge of regular expressions (regex)

Basic knowledge of HTML and CSS

Understanding of RSS and general web concepts

Experience using Firefox's Firebug and/or the developer tools in Chrome or Safari to view web page source for the purpose of data scraping

Payment:

We plan to release the RSS feeds for parsing in batches of 100. Payment will be a fixed amount per parsed feed, with a 20% bonus once each batch of 100 RSS feeds has been parsed.
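The following sketch makes the payment arithmetic concrete, using a hypothetical rate of 50 cents per feed. It assumes the 20% bonus is calculated on the base amount of each completed batch of 100, which the posting does not state explicitly. At that rate, a full batch would pay 100 × $0.50 = $50 plus a $10 bonus, for a total of $60. All 2,000 feeds (20 batches) would then pay $1,200. `batch_payout_cents` and `total_payout_cents` are hypothetical names.

```ruby
# Hypothetical payout sketch; amounts are integer cents, and integer
# division simply truncates any fractional cent of the bonus.
# Assumption: the 20% bonus applies to the base amount of a completed batch.
BATCH_SIZE = 100
BONUS_PERCENT = 20

# Pay for one batch: base rate per feed, plus the bonus once the batch is complete.
def batch_payout_cents(rate_cents, feeds_parsed)
  base = rate_cents * feeds_parsed
  bonus = feeds_parsed >= BATCH_SIZE ? base * BONUS_PERCENT / 100 : 0
  base + bonus
end

# Pay for a whole list: full batches earn the bonus, a trailing partial batch does not.
def total_payout_cents(rate_cents, total_feeds)
  full_batches, remainder = total_feeds.divmod(BATCH_SIZE)
  full_batches * batch_payout_cents(rate_cents, BATCH_SIZE) +
    batch_payout_cents(rate_cents, remainder)
end

rate = 50 # hypothetical: 50 cents per feed
puts batch_payout_cents(rate, 100)  # → 6000
puts batch_payout_cents(rate, 40)   # → 2000
puts total_payout_cents(rate, 2000) # → 120000
```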

We are looking for serious candidates who can work on this task full-time over the next 2-3 weeks.

Skills: CSS, Data Processing, HTML, Web Scraping


About the Employer:
(0 reviews) Toronto, Canada

Project ID: #5122051

Awarded to:

soevering

Hi, I have over 10 years of experience in web scraping, working for an online travel agency. I scraped the calendars of many well-known providers. I think I can complete the project in 18 days in whatever structure you prefer...

$1333 CAD in 18 days
(1 review)
2.4

9 freelancers are bidding an average of $1224 for this job

SigmaVisual

Dear Client, I can help with your project. We already have experience working on similar projects. Please see below to get an idea of our experience: Amazon/Ebay Bots: http://sigma-dns.sigmavirtual.com/PDemo1/Am ...

$773 CAD in 5 days
(72 reviews)
6.8
srinichal

Thanks for the invite. I would like to deliver the project to your specifications and needs...

$1184 CAD in 6 days
(47 reviews)
6.5
Brsoft1

Hello, greetings! Please provide me with complete information about this project. OUR PAST WORK: http://bidding.theschooltime.com/UploadFiles/UploadFiles/php.pdf Please message us; we need to discuss abou...

$1402 CAD in 30 days
(18 reviews)
5.8
creatorul

Hello, Do you want a desktop tool or web application? Regards, Daniel

$1526 CAD in 12 days
(11 reviews)
5.5
linuxstudios

Hi, I am a real person, not a bot. I am located in California and my normal working hours are 9:00a - 6:00p PST M-S. I currently have availability to take on a project such as yours. I specialize in web scrapi...

$1500 CAD in 14 days
(4 reviews)
3.4
mtavartkiladze

Hello, I have written many scraping scripts and I can certainly help you with your project. Please contact me if you're interested.

$1444 CAD in 3 days
(5 reviews)
3.1
nikhiltechnology

Hi, I understand XML RSS feeds and have built various RSS feed extraction tools (both web- and desktop-based). I will develop this software in VB6.0 and MySQL (database to store the results). You can send me a few RSS ...

$773 CAD in 2 days
(3 reviews)
1.8
LeonSwinkels

Dear Sir/Madam, my name is Léon Swinkels and I am a senior developer skilled in multiple technologies. The project you have described seems very interesting to me. I would propose building a system in either PHP ...

$1368 CAD in 21 days
(0 reviews)
0.0
gj20021546

I have mainly been working on analyzing and testing web pages, and regular expressions are one of my main tools. I am also good at searching for data on the internet using code. Please choose me; I will offer you prof...

$1111 CAD in 7 days
(0 reviews)
0.0