In Progress

scrape financial user forum

A Python program and/or web page(s) that scrapes a financial user forum website.

Using: MySQL / Python / PHP / Open to suggestions

The user forum is unique, not php-nuke or vBulletin.

The qualified bidder will have experience parsing html / scraping values from an http response.

Create a program or website (Python preferred, but open to suggestions) that scrapes a list of web pages. The list is predefined in a database table.

The process will work through a database table that contains one URL in each row.

The process will make the HTTP request with a customizable User-Agent string.

Next, determine whether the HTTP response is a valid response, an HTTP error, or a redirect to another page.
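A minimal sketch of the request and classification step, using only the Python standard library. The names `fetch`, `classify_status`, `NoRedirect` and the default User-Agent value are illustrative assumptions, not part of the posting. Redirects are deliberately not followed, so that they can be reported as a separate outcome.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so they can be reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not follow"; urllib then raises HTTPError.
        return None


def classify_status(status):
    """Map an HTTP status code to 'ok', 'redirect', or 'error'."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    return "error"


def fetch(url, user_agent="ForumScraper/0.1", timeout=30):
    """Return (kind, status, body, redirect_target) for one URL.

    The default user_agent is a placeholder; pass whatever string is needed.
    """
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=timeout) as response:
            body = response.read()
            return classify_status(response.status), response.status, body, None
    except urllib.error.HTTPError as exc:
        # Unfollowed 3xx responses, plus 4xx and 5xx, all arrive here.
        return classify_status(exc.code), exc.code, b"", exc.headers.get("Location")
    except OSError:
        # Network-level failures: DNS errors, refused connections, timeouts.
        return "error", None, b"", None
```

A caller would branch on the first element of the returned tuple: scrape the body when it is `"ok"`, record the `Location` target when it is `"redirect"`, and log the status otherwise.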

If the response is the desired response, scrape 10 different elements off of the page. These elements include:

• Topic Description

• Topic (stock) symbol

• The updated topic URL

• The number of followers for the topic

• The Topic Category ID

• The Topic HTML title

• The date and time (DTTM) of the last post for the topic

• The number of posts for the topic

• The moderators for the topic. There could be multiple moderators, and each one needs to be recorded. Each moderator has a user_id, a username, and a URL; all three need to be recorded.
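Since the forum's markup is not shown in the posting, the following is only a sketch of how the extraction could look with the standard-library `html.parser`. It assumes, hypothetically, that each moderator appears as `<a class="moderator" href="/profile/123">name</a>` and that the user ID can be read from the profile URL; the real selectors would have to come from the actual pages. Only the HTML title and the moderators are shown; the other elements would follow the same pattern.

```python
import re
from html.parser import HTMLParser


class TopicPageParser(HTMLParser):
    """Collect the HTML title and moderator links from one topic page.

    Assumed (hypothetical) markup for a moderator:
    <a class="moderator" href="/profile/123">name</a>
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.moderators = []    # dicts with user_id, username, url
        self._in_title = False
        self._current = None    # moderator link currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and "moderator" in (attrs.get("class") or "").split():
            url = attrs.get("href") or ""
            match = re.search(r"/profile/(\d+)", url)
            if match:
                self._current = {
                    "user_id": int(match.group(1)),
                    "username": "",
                    "url": url,
                }

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._current is not None:
            self._current["username"] += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._current is not None:
            self._current["username"] = self._current["username"].strip()
            self.moderators.append(self._current)
            self._current = None


def parse_topic_page(html):
    """Return the scraped values for one page as a plain dict."""
    parser = TopicPageParser()
    parser.feed(html)
    parser.close()
    return {"html_title": parser.title.strip(), "moderators": parser.moderators}
```

Returning a plain dict keeps the parsing step independent of the database step, so the selectors can be adjusted without touching the SQL.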

After the page elements have been scraped, perform the following action:

Update the original table with the new values (listed above)
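The update could be a single parameterized statement per row. In the sketch below the table name `TOPIC_TBL` and every column name are hypothetical, since the posting does not name them, and `sqlite3` stands in for MySQL so the example runs without a server. With a MySQL driver such as PyMySQL or MySQL Connector/Python the placeholders would be written `%(name)s` instead of `:name`.

```python
import sqlite3

# Hypothetical table and column names; the posting does not specify them.
UPDATE_SQL = """
UPDATE TOPIC_TBL
   SET TOPIC_DESCR    = :descr,
       TOPIC_SYMBOL   = :symbol,
       TOPIC_URL      = :url,
       FOLLOWER_CNT   = :followers,
       POST_CNT       = :posts,
       LAST_POST_DTTM = :last_post
 WHERE TOPIC_NBR = :topic_nbr
"""


def update_topic(conn, topic_nbr, values):
    """Write the scraped values back to the row for topic_nbr.

    Returns the number of rows updated (0 if the topic is unknown).
    """
    params = dict(values, topic_nbr=topic_nbr)
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(UPDATE_SQL, params)
    return cursor.rowcount
```

Binding values as parameters, rather than formatting them into the SQL string, avoids quoting problems when a topic description contains apostrophes.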

For “moderators” elements, the following should be performed:

1. Determine whether the USER_ID is already in the user table (FORUM_USER_TBL in the query below). If the user already exists in the table, take no action. Otherwise insert a row into the user table with the values USER_ID_NBR, USER_NAME, USER_URL and SYS_ADDED_DTTM.

SELECT 'X' FROM FORUM_USER_TBL WHERE USER_ID_NBR = :1

2. Each forum topic has a number. Take the TOPIC_NBR and the USER_ID_NBR and check whether the pair exists in MODERATOR_TBL. If it exists, take no action; otherwise insert a row into MODERATOR_TBL with the values TOPIC_NBR, USER_ID_NBR, SYS_ADDED_DTTM.

SELECT 'X' FROM MODERATOR_TBL WHERE TOPIC_NBR = :1 AND USER_ID_NBR = :2

3. Additional logic: after the moderators have been scraped from the web page, if a moderator exists in MODERATOR_TBL for the TOPIC_NBR but is no longer listed on the web page as a moderator, remove that row from MODERATOR_TBL.
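The three moderator steps can be sketched as one function. Table and column names are taken from the posting (using FORUM_USER_TBL, as in the query); the function name `sync_moderators`, the shape of the `scraped` list, and the timestamp format are assumptions. `sqlite3` stands in for MySQL here, so placeholders are `?` rather than the `%s` used by common MySQL drivers.

```python
import sqlite3
from datetime import datetime, timezone


def sync_moderators(conn, topic_nbr, scraped):
    """Bring the user and moderator tables in line with one scraped page.

    scraped: list of dicts with keys user_id, username, url (assumed shape).
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    scraped_ids = {m["user_id"] for m in scraped}

    with conn:  # one transaction: commit on success, roll back on error
        for m in scraped:
            # Step 1: insert the user only if USER_ID_NBR is not present yet.
            found = conn.execute(
                "SELECT 'X' FROM FORUM_USER_TBL WHERE USER_ID_NBR = ?",
                (m["user_id"],),
            ).fetchone()
            if found is None:
                conn.execute(
                    "INSERT INTO FORUM_USER_TBL "
                    "(USER_ID_NBR, USER_NAME, USER_URL, SYS_ADDED_DTTM) "
                    "VALUES (?, ?, ?, ?)",
                    (m["user_id"], m["username"], m["url"], now),
                )

            # Step 2: insert the (topic, user) pair only if it is missing.
            found = conn.execute(
                "SELECT 'X' FROM MODERATOR_TBL "
                "WHERE TOPIC_NBR = ? AND USER_ID_NBR = ?",
                (topic_nbr, m["user_id"]),
            ).fetchone()
            if found is None:
                conn.execute(
                    "INSERT INTO MODERATOR_TBL "
                    "(TOPIC_NBR, USER_ID_NBR, SYS_ADDED_DTTM) VALUES (?, ?, ?)",
                    (topic_nbr, m["user_id"], now),
                )

        # Step 3: remove moderators stored for this topic but absent from the page.
        stored = [
            row[0]
            for row in conn.execute(
                "SELECT USER_ID_NBR FROM MODERATOR_TBL WHERE TOPIC_NBR = ?",
                (topic_nbr,),
            )
        ]
        for user_id in stored:
            if user_id not in scraped_ids:
                conn.execute(
                    "DELETE FROM MODERATOR_TBL "
                    "WHERE TOPIC_NBR = ? AND USER_ID_NBR = ?",
                    (topic_nbr, user_id),
                )
```

Because step 3 deletes every stored moderator that is missing from `scraped`, this function should only be called after a valid page response; calling it with an empty list after a failed request would clear the topic's moderators.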

This project has the following requirements and EACH BIDDER MUST ADDRESS EACH ITEM:

(1) The bidder is fluent in English without 3rd party assistance.

(2) Bidder is familiar with and can use [url removed, login to view], [url removed, login to view], GOTOMEETING, SKYPE and google talk/hangouts to work out solution.

(3) The solution MUST BE HIGHLY PORTABLE, meaning it can be moved from one instance to another.

(4) Bidder will include platform requirements in the bid. Please state what is required. A simple solution that runs on a hosted VPS account or a VMware instance is preferred, but I am open to detailed suggestions.

(5) Testing and final delivery to be performed on BUYER supplied instance.

(6) Bidder should have experience with full text indexing for future phases.

An additional document with screenshots of the forum is available to bidders after the above items are addressed.

Please do not make a final bid until reading the additional document.

Skills: MySQL, PHP, Python, Web Scraping


About the Employer:
(17 reviews) United States

Project ID: #5113776

Awarded to:

Abhishek92Kumar

Sir, I have a program that can be used to scrape the forums, and it can scrape the exact data you want. Tell me more about the website and I can help you with this. The program is really fast and you can use many...

$111 USD in 3 days
(5 reviews)
3.2

8 freelancers are bidding an average of $204 for this job

mantislin

Hi sir, I am a scraping expert. I have done many similar projects; please check my feedback and you will see. Can you tell me more details? Then I will provide demo data for you. Thanks, Kimi

$253 USD in 6 days
(188 reviews)
6.9
aoefmpes

Hello Sir, I have experience in web scraping. I can write the script in Java or PHP. Please ping me and give me details. Thanks!

$283 USD in 10 days
(58 reviews)
5.5
tikumishra

I have done quite a few flawless scraping projects. I can develop this for you in PHP and MySQL using PHP cURL and the Simple HTML DOM API. I have a Skype account and can communicate over Skype. And yes, I can deliver thi...

$222 USD in 3 days
(37 reviews)
5.2
ils7

Hi. I will use Scrapy (a Python framework) to scrape the data. I hope the URLs point to the same domain. I prefer to use the Linux platform for Scrapy projects. I have no experience with LOGMEIN.COM, JOIN.ME, GOTOMEETIN...

$421 USD in 14 days
(12 reviews)
3.9
Bence4hire

Hi, scraping expert here. I can write a scraper script for you in Python, using the powerful Scrapy framework. It doesn't have any extraordinary dependencies and runs on most platforms and server setups, but if you...

$150 USD in 2 days
(5 reviews)
2.8
gopalv84

I have experience with scraping using Perl (LWP::UserAgent). I can scrape the data and deliver the output in Excel format for loading into the database. If needed, I can load it directly into the database.

$83 USD in 3 days
(1 review)
0.6
KennyLedet

Sir, I am interested in taking this project. I write crawling/scraping code on a daily basis and know the field like the back of my hand. I am sure I can handle any forum you throw at me! Please send me more detail...

$105 USD in 6 days
(0 reviews)
0.0