I am looking for a simple data extractor for a web portal ([login to view URL]).
This portal provides information on various past sports events. Only soccer is within the scope of this project.
My preference is to have it done in Python (2.7.x); my platform is Windows 7. No GUI is needed: the execution of the script should be driven by command line parameters and a simple text configuration file. I believe the task is not difficult for someone with experience in web scraping, and I know exactly what I am looking for. The script should work both behind a proxy and when connected to the Internet directly.
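As an illustration of the command line and proxy requirements, here is a minimal sketch using only the standard library. The option names (`--config`, `--output`, `--proxy`) and the helper names are assumptions made for this example; the posting does not specify them.

```python
import argparse

try:  # Python 3
    from urllib.request import ProxyHandler, build_opener
except ImportError:  # Python 2.7, as requested in the posting
    from urllib2 import ProxyHandler, build_opener


def parse_args(argv=None):
    """Parse command line parameters (all option names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Soccer match event extractor")
    parser.add_argument("--config", required=True,
                        help="text file with one match key per line")
    parser.add_argument("--output", required=True,
                        help="file that receives one line per event")
    parser.add_argument("--proxy", default=None,
                        help="optional proxy URL, e.g. http://host:8080")
    return parser.parse_args(argv)


def make_opener(proxy=None):
    """Build a URL opener that goes through `proxy`, or connects directly."""
    if proxy:
        handler = ProxyHandler({"http": proxy, "https": proxy})
    else:
        # An empty mapping disables proxy auto-detection (direct connection).
        handler = ProxyHandler({})
    return build_opener(handler)
```

A call such as `make_opener(args.proxy).open(url)` would then fetch a page either way, so the rest of the script does not need to know whether a proxy is in use.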
As for the processing logic:
- there will be an input config file (a text file, one string per line); each line will contain a character key
- for each key, e.g. "S0L7YmZa", the following page will be fetched (all of the URL except the key is static)
- [login to view URL]
- and for each line in the table relating to key events (goals, yellow/red cards, substitutions, etc.), one line will be written to an output file
- exact fields to be specified, but all the required data is visible on this page
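The processing logic above could be sketched roughly as follows. The URL template, the `;` field separator, and all function names are assumptions for illustration only: the real static URL is not shown in this posting, and the exact output fields are still to be specified.

```python
# Hypothetical template: the real static part of the URL is not given here.
URL_TEMPLATE = "http://example.com/match/{key}/"


def parse_keys(lines):
    """Return the keys from config lines, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def build_url(key, template=URL_TEMPLATE):
    """Insert the key into the otherwise static URL."""
    return template.format(key=key)


def format_event(key, fields, sep=";"):
    """Turn one key event (a list of field strings) into one output line."""
    return sep.join([key] + list(fields))


def format_events(key, events):
    """One output line per event row extracted for this key."""
    return [format_event(key, fields) for fields in events]
```

For example, `parse_keys(open("keys.txt"))` would yield the keys, `build_url` would give the page to fetch for each one, and the lines returned by `format_events` would be appended to the output file.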
Hope this all makes sense.
A few details of the process remain to be fine-tuned during the project, but at a high level these are the most important tasks.
Hi,
I am an expert in Python and web scraping. I use libraries such as mechanize and urllib2. I have almost 6 years of experience in web scraping with Python. I have done projects that scrape information from many odds portals and similar sites. I can provide you with sample scripts as well.
I charge a little extra because of the quality of my work. I will provide a multi-threaded application that can also run through proxies. Please check my feedback; you will be impressed. Looking forward to talking to you. We can finalise the price once I know the full specifications.
Thanks,
Gopinadh.
Hello, my name is Dmitry.
To be fair, I should tell you that I am a fairly strong C++/C#/OpenGL developer but have almost no experience with the web. I am currently diving into Python and web development (I find it interesting, and quite relaxing after the math-heavy tasks of my on-site job).
In short, here is my proposal: I will do the best job I can (since I want to learn how to deliver good solutions in this area and explore the topic), but it may take slightly more time than it would for other freelancers (still, I am a very fast learner and believe all will be fine). Since it's Python 2.7, I will try to do this using Scrapy/BeautifulSoup.
The price is negotiable based on other freelancers' bids, though I guess it cannot go much lower for non-Asian bidders.
So if, after reading this, you consider me a candidate, please contact me.
Hi,
I'm an expert with experience in Python web scraping. I've done numerous similar projects. I went through your project description and the website. I'm confident I can do this and deliver a professional solution.
If you can provide me with a sample set of keys and the required fields, I can provide a sample result before you award the project.
Please contact me for a more detailed discussion. Looking forward to hearing from you.
Thanks.
I have extensive experience developing web extraction and parsing utilities in Python.
I believe I can complete the requested tool within days.
However, I would like more information about what should be stored in the log file (is it only the information that appears on the website for each event, such as a goal or a yellow card?).
Also, please elaborate on the need to work with a proxy (do you want the requests to be routed through a proxy?).
Regards,
Hi again.
I am posting this bid in response to your last message on the "Web Scrapping of Soccer Data" project.
Do you accept Perl? I use it for scraping more often than Python, though Python is also possible if you prefer.
I assume you will give more exact instructions if and when we start the project, in particular the exact fields you want.
Regards.
I have prior experience with projects of a similar nature; much of the existing code can be reused to speed up development.
Feel free to ask me any questions before you decide.
Proposed Milestones:
Create a program that scrapes a sports website and stores the data.
Note:
The milestone payment will need to be made before work commences and the payment released upon delivery of the program.
Hi There,
I am a Python dev who's worked with web and API scraping a lot - I am currently working on a similar sports application myself, so I should be able to put something together that suits what you're after quite quickly.
Honestly, I'm new to this site, hence the lack of feedback. I'd be more than happy to have a conversation with you, and am sure you'll be pleased with results.
Look forward to hearing from you!
Hello,
I'm a student of computer engineering and I'm currently working as a freelancer.
I love the scraping world, and have just finished another scraping project. Also for sports :)
Message me, maybe we can chat a bit.
I'd love to help you with this one. I know I'm pretty new on the website, but you can check the comments of my satisfied customers!
I won't finish my work until you're completely satisfied.
Greetings!
If I understood properly, you will provide the config file with one key per line.
So I will read them line by line and retrieve the corresponding data from the portal.
The question: do you want one output file for all keys, or a separate file per key?
I can do this for you. Tomorrow I will try to send you a test version.
Regards
Hello,
I'm a senior software engineer who is trying to start a career as a freelancer.
I've already done this kind of job in the past, so it would be a piece of cake for me. I also had to support proxies.
In order to finalize the requirements, we can also have a Skype call.
Kind regards,
Laurent Carlier
Hi,
I am new to this platform and this would be my first project here.
The task seems approachable with standard modules and should not take long to deliver.
Looking forward to a successful collaboration!
Cheers,
Vitalie
I am a new grad developer and have experience working with Python and scraping web content.
For this project, my proposed strategy is:
1. Use Python Scrapy to programmatically fetch the HTML page from the portal for each key.
2. Scrapy also supports logging in to a website easily, although this requires per-website tuning.
3. The HTML content is then parsed, and the relevant fields (goals, yellow/red cards, substitutions, etc.) are extracted and added to the output CSV file.
4. The Python script will accept the input file and the desired output filename as command line arguments.
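To make step 3 more concrete, here is a small sketch of extracting table rows and writing them as CSV. It swaps in the standard library's HTML parser instead of Scrapy so that it runs without extra packages, and it assumes the events sit in plain `<tr>`/`<td>` table cells; the portal's actual markup is not known here, so the class and function names are hypothetical.

```python
import csv
import io

try:  # Python 3
    from html.parser import HTMLParser
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without any <td> cells
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows as CSV text (Python 3; Python 2.7's csv wants bytes)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

With Scrapy or BeautifulSoup the row extraction would be shorter, but the overall shape (fetch, extract rows, write one CSV line per event) would stay the same.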
I really like doing web scraping work. You won't regret choosing me.
I am only beginning my freelancer life. I would like the opportunity to deliver the software you need and to build some momentum.
Cheers!