Programmatically searching a huge database and sorting results
The goal is to find all of the most common strings that precede each word or phrase in domain names. This is NOT a manual data entry job!
We have a very large database of ALL .com domain names and another list containing thousands of unique words or phrases. We need to take each word or phrase from the second list and search the whole domain database to find every instance of that word or phrase ONLY IF it starts within the first 7 characters of the domain name. The goal is to create a list of the most common strings that precede each term, ordered from most popular to least popular, and show the quantity of each.
To give you an example, if the original term is “REBATE” (term number 1753) you would search the database of nearly 100 million domain names, finding any domain name that contains the term REBATE starting within the first 7 characters of the name. Then list all of the strings that come before the term, sorted from most frequent to least. So the outcome should look something like this (JUST AN EXAMPLE!)
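For anyone answering the questions below, the search for a single term can be sketched in a few lines of Python. This is only a minimal illustration, not the required method: the sample domains are made up, "within the first 7 characters" is read as the term starting at character positions 1 through 7, only the first occurrence of the term in each domain is counted, and a domain that begins with the term is recorded with an empty preceding string.

```python
from collections import Counter

MAX_START = 7  # assumption: the term must begin at one of the first 7 characters


def prefix_counts(term, domains):
    """Count the strings that precede `term`, most frequent first."""
    term = term.lower()
    counts = Counter()
    for domain in domains:
        name = domain.strip().lower()
        if name.endswith(".com"):
            name = name[:-4]           # drop the TLD before searching
        pos = name.find(term)          # first occurrence only
        if 0 <= pos < MAX_START:
            counts[name[:pos]] += 1    # everything before the term
    return counts.most_common()


# Made-up sample data, not taken from the real database.
sample = [
    "getrebate.com",
    "myrebate.com",
    "getrebates.com",
    "rebate.com",
    "superbigrebate.com",  # term starts too late, so it is skipped
]

for prefix, n in prefix_counts("REBATE", sample):
    print(f"{prefix or '(none)'}\t{n}")
```

Running this once per term would mean thousands of passes over nearly 100 million rows, so it shows the logic rather than a workable approach at full scale.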
You also have to be able to open a rar file.
The output should be single text file listing multiple terms, one after another like the above.
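One possible way to produce that single file efficiently is to read the domain list once and check every term at each allowed starting position, instead of scanning the database once per term. The sketch below assumes one domain per line, the same 7-character window as above, and a simple layout of the term followed by tab-separated "preceding string, count" lines. That layout and the function names `count_all` and `write_report` are illustrative assumptions, not the required format.

```python
import sys
from collections import Counter, defaultdict

MAX_START = 7  # assumption: the term must begin at one of the first 7 characters


def count_all(terms, domains):
    """Single pass over the domains, tallying preceding strings for every term."""
    wanted = {t.lower() for t in terms}
    lengths = sorted({len(t) for t in wanted})
    results = defaultdict(Counter)
    for line in domains:
        name = line.strip().lower()
        if name.endswith(".com"):
            name = name[:-4]
        seen = set()  # count each term at most once per domain (first occurrence)
        for start in range(min(MAX_START, len(name))):
            for n in lengths:
                chunk = name[start:start + n]
                if len(chunk) == n and chunk in wanted and chunk not in seen:
                    seen.add(chunk)
                    results[chunk][name[:start]] += 1
    return results


def write_report(results, terms, out):
    """Write one block per term: the term, then 'preceding string<TAB>count' lines."""
    for term in terms:
        out.write(f"{term}\n")
        for prefix, n in results[term.lower()].most_common():
            out.write(f"{prefix}\t{n}\n")
        out.write("\n")


# Made-up sample data; in practice `domains` could be an open file object
# iterated line by line so the full list never has to fit in memory.
demo_terms = ["REBATE", "SHOP"]
demo_domains = ["getrebate.com", "myshop.com", "getshop.com", "getrebates.com"]
write_report(count_all(demo_terms, demo_domains), demo_terms, sys.stdout)
```

The set lookup keeps the work per domain proportional to the number of distinct term lengths rather than the number of terms, which matters when there are thousands of terms.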
We have thousands of terms to search. Our goal is to find who can do this the most efficiently, so we will award this to multiple freelancers asking them each to put in ONE HOUR worth of time. Then we will select whoever completes the most terms (and does it properly) to continue. We may work with more than one or may find one freelancer stands out and give them all the work.
I have a good track record as an employer with Freelancer and an even bigger positive record with Upwork, which I’ve used for several years. I’m doing it this way because it’s impossible to tell how efficient and accurate a worker is just from their profile or from talking to them. This will be a lot of work for whoever is best at it. In the worst case, if someone else is faster and gets more done, you’ll still get a positive review and a completed job.
The first step is to answer some questions.
***No automatic proposals. If you don’t answer these questions fully, you won’t be considered.
What tools would you use to do this?
Have you worked with a database of 100 million records before?
What’s the biggest database you’ve worked with before?
What would be your goal or expectation in: Terms Searched and Sorted / hour?
29 freelancers are bidding on average $25/hour for this job
Hello, how are you? Database manager is here. Your project is clear for me and can be started right now. Please feel free to contact me. Best regards. Burak.
Hello! I provide data collection services, through web scraping and text mining, for data interpretation, comparison, composition, distribution and relationship. Feel free to contact me!