I have a list of 240 million websites that I'd like to crawl in the quickest, most cost-effective way.
This would likely be done with a web crawler that you build. It should sort the websites it crawls into categories and extract key information such as business name, emails, phone numbers, and addresses. I would also like to track the technologies each website uses, for example Shopify.
Here is how I would like websites to be processed:
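To make the extraction part concrete, here is a minimal sketch of pulling emails and phone numbers out of raw HTML and flagging technologies such as Shopify. Everything in it is an assumption for illustration: the regular expressions are deliberately loose, the `TECH_FINGERPRINTS` table and the function names are hypothetical, and a real crawler would likely need stricter validation and a much larger fingerprint list.

```python
import re

# Loose patterns for illustration only; they will produce some false positives.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical fingerprint table: technology name -> substrings that
# commonly appear in the page source of sites using it.
TECH_FINGERPRINTS = {
    "Shopify": ("cdn.shopify.com", "myshopify.com"),
    "WordPress": ("/wp-content/", "/wp-includes/"),
}


def extract_contacts(html: str) -> dict:
    """Return de-duplicated emails and phone-like strings found in the HTML."""
    emails = sorted(set(EMAIL_RE.findall(html)))
    phones = sorted({match.strip() for match in PHONE_RE.findall(html)})
    return {"emails": emails, "phones": phones}


def detect_technologies(html: str) -> list:
    """Return the names of technologies whose fingerprints appear in the HTML."""
    lowered = html.lower()
    return sorted(
        name
        for name, markers in TECH_FINGERPRINTS.items()
        if any(marker in lowered for marker in markers)
    )
```

Business names and postal addresses are harder to capture with patterns alone, which is where the NLP step in the processing list would come in.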
1. Determine if the website is in English.
2. Determine the category of the website.
3. Determine if the website is for a business.
5. Determine the category of business.
6. If the website is for a business, scrape its information using NLP (get as many pre-determined fields as possible).
7. Schedule to update information in 3 months.
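The steps above could be wired together roughly as follows. This is only a sketch under stated assumptions: the classifier and extractor functions are passed in as placeholders (they are not real library calls), non-English sites are assumed to stop after the first step, every processed site is assumed to be rescheduled, and "3 months" is approximated as 90 days.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Optional

RECRAWL_INTERVAL = timedelta(days=90)  # assumption: "3 months" ~ 90 days


@dataclass
class SiteRecord:
    url: str
    is_english: bool = False
    site_category: Optional[str] = None
    is_business: bool = False
    business_category: Optional[str] = None
    fields: dict = field(default_factory=dict)
    next_crawl: Optional[datetime] = None


def process_site(
    url: str,
    text: str,
    now: datetime,
    *,
    detect_english: Callable[[str], bool],
    classify_site: Callable[[str], str],
    detect_business: Callable[[str], bool],
    classify_business: Callable[[str], str],
    extract_fields: Callable[[str], dict],
) -> SiteRecord:
    """Run one site through the processing steps, in the order listed."""
    # Schedule the next update up front so every site is revisited.
    record = SiteRecord(url=url, next_crawl=now + RECRAWL_INTERVAL)

    # Language check; assumption: non-English sites are not processed further.
    record.is_english = detect_english(text)
    if not record.is_english:
        return record

    # Website category, then the business check.
    record.site_category = classify_site(text)
    record.is_business = detect_business(text)

    # Business category and field extraction only apply to business sites.
    if record.is_business:
        record.business_category = classify_business(text)
        record.fields = extract_fields(text)
    return record
```

Passing the models in as plain callables keeps the pipeline independent of whichever language detector, classifier, or NLP extractor is eventually chosen, and makes each step easy to test in isolation.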
Please let me know roughly how you'd approach the project in your proposal.
31 freelancers are bidding an average of $3204 on this job
I am an expert in web searching. I have read the details and understand how to execute this task. Please contact me to discuss further; I look forward to working with you. Thanks.
Hi, I have 5+ years of experience with machine learning algorithms and have worked on multiple projects in this field. Please contact me to discuss further. Have a nice day.