I am looking for a data extractor for a website ([login to view URL]). The portal provides odds information for different sports; only soccer is within the scope of this project. Please refer to [login to view URL] for initial information.
My preference is to have it done in Python (2.7.x); my platform is Windows 7. No GUI is needed: the execution of the script should be driven by command-line parameters and a simple configuration file. I believe the task is not difficult for someone with experience in web scraping, and I know exactly what I am looking for. Optional requirements: the script should be able to process several pages in parallel and should work both behind a proxy and when connected to the Internet directly. The site allows free registration, and registered users can see more data (odds from more bookmakers), so the script should log in to the site as well.
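To make the command-line and proxy requirements concrete, here is a minimal sketch using only the standard library. All option names (`--proxy`, `--workers`, `--username`, `--password`) and function names are hypothetical and would be agreed during the project. The mapping returned by `proxy_settings` has the scheme-to-URL shape that HTTP libraries such as Requests accept for their `proxies` argument.

```python
import argparse


def build_parser():
    """Build the command-line interface (all option names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Soccer odds extractor (sketch)")
    parser.add_argument("config", help="text file with one league path per line")
    parser.add_argument("--proxy", default=None,
                        help="proxy URL; omit for a direct connection")
    parser.add_argument("--workers", type=int, default=1,
                        help="number of pages to process in parallel")
    parser.add_argument("--username", default=None, help="site login (optional)")
    parser.add_argument("--password", default=None, help="site password (optional)")
    return parser


def proxy_settings(proxy_url):
    """Return a scheme-to-proxy mapping, or an empty dict for a direct connection."""
    if not proxy_url:
        return {}
    return {"http": proxy_url, "https": proxy_url}


def read_config(path):
    """Return the non-empty lines of the config file, stripped of whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]
```

A call such as `build_parser().parse_args()` would then read the options from the command line, for example `python extractor.py leagues.txt --workers 4` (the script and file names are placeholders).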
As for the processing logic:
- there will be an input config file (text file, one string per line)
- there will be two output files (matches/odds) for each processed line from the config file
- for each line (assume the line="soccer/australia/a-league-2013-2014/") in the config file the script should go to the corresponding page [login to view URL]
- extract one line per match into the matches output file: for the first match, something like "soccer/australia/a-league-2013-2014,04 May 2014,Play Offs,08:00,Brisbane Roar,WS Wanderers,tII15doo,2,1,ET,2.60,3.40,3.55,52"
- extract one line for each set of odds in the 1X2 and Over/Under +2.5 markets for each match; for this you need to go to:
- [login to view URL];2 -> for each set of odds, a line like this will be output: "tII15doo,1X2,2,10Bet,2.10,3.25,3.20"
- [login to view URL];2 -> for each set of odds, a line like this will be output: "tII15doo,over-under,2,2.5,10Bet,2.20,1.61"
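The output side of the logic above could be sketched as follows. This is only an illustration under stated assumptions: the file-naming scheme and the function names are hypothetical, the syntax targets Python 3 (Python 2.7 would need small changes around `io.StringIO`), and the field values are passed as strings exactly as scraped so that odds like "2.60" keep their trailing zero.

```python
import csv
import io


def output_names(config_line):
    """Derive the two output file names from a config line (naming is an assumption)."""
    stem = config_line.strip("/").replace("/", "_")
    return stem + "_matches.csv", stem + "_odds.csv"


def to_csv_line(fields):
    """Join fields into one CSV line, quoting only the fields that need it."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(fields)
    # The writer appends a line terminator; drop it so the caller controls newlines.
    return buffer.getvalue().rstrip("\r\n")


# Sample rows built from the values quoted in the requirements above.
odds_1x2 = to_csv_line(["tII15doo", "1X2", "2", "10Bet", "2.10", "3.25", "3.20"])
odds_ou = to_csv_line(["tII15doo", "over-under", "2", "2.5", "10Bet", "2.20", "1.61"])
```

Using the `csv` module rather than a plain comma join means that a team or bookmaker name containing a comma is quoted instead of silently shifting the columns.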
Hope this all makes sense.
A few things for fine tweaking of the process to be discussed during the project but high level these are the most important tasks.
Hi.
It seems I did something similar a few weeks ago. Let me know if you are interested and I will send a sample output of my work (an Excel file).
Waiting for your response.
I have a bachelor's degree in Computer Science from the American University in Cairo and a minor in Mathematics, with 10+ years of hands-on programming experience. I have worked for the past year in Microsoft's Advanced Technology Lab in Cairo (ATLC). I have 2+ years of experience in web scraping with Python using BeautifulSoup, Requests and Selenium WebDriver. Check my previous projects for past feedback.
Hello. More than 20 years of programming experience.
Do you insist on Python? As an alternative I can suggest Perl, though Python is also possible.
Regards.
Hello,
I have been developing web scrapers in Python for the past 4 years using libraries such as BeautifulSoup or lxml, or using the Scrapy framework.
Most of the past scrapers were for e-commerce websites, in order to compare product features and prices.
I would build the scraper using the Scrapy framework for Python. The scraper would read each URL provided in the text file and scrape the required information. Because the website uses JavaScript calls to get the data, a method to circumvent this will have to be established. The output of the scrape will be 2 CSV files, although an output of 3 files is also possible: matches, odds_1x2 and odds_over_under.
Best regards,
Andrei Chelaru
Hi,
I am used to building scrapers (Yellow Pages, Yell, Yelp, ...).
I use BeautifulSoup, Selenium, or Scrapy for Python, but I also do the same with CasperJS and PhantomJS.
Please consider answering me and I will provide you with sample data in the next 24 hours.
Regards
Ali
I write all my web scrapers in Python 2.7 (for the Linux command line), but they work on Windows as well.
I can also write the program threaded if needed (for fast scraping).
I would like to talk to you about your requirements more in depth as well.
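As an illustration of the threaded approach mentioned in this bid, here is a minimal sketch using `multiprocessing.pool.ThreadPool` from the standard library, which is available in both Python 2.7 and Python 3. The function names are hypothetical, and `process_line` is only a placeholder: a real version would download and parse the pages for one config line.

```python
from multiprocessing.pool import ThreadPool


def process_line(config_line):
    """Placeholder for the per-league work; returns the cleaned line and its length."""
    return config_line.strip("/"), len(config_line)


def process_all(config_lines, workers=4):
    """Process config lines on a pool of threads, returning results in input order."""
    pool = ThreadPool(workers)
    try:
        return pool.map(process_line, config_lines)
    finally:
        # Always release the worker threads, even if a task raised an exception.
        pool.close()
        pool.join()
```

Threads suit this kind of job because the work is dominated by waiting on network responses; the number of workers would still need to be kept modest so as not to overload the site.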