Process 100 million lines of domains and output the UNIQUE lines.

We have a list of about 100,000,000 domains in total, and I estimate that about 50% are duplicates. We need to process the entire list and remove the duplicates.

The final output should be a list of UNIQUE domains.

I have been processing it with EmEditor for the past 48 hours on an i7 PC with 16 GB of RAM, and it's still nowhere near finished.

We need some massive power to process this.

Please do not bid unless you have worked with data of this size before.


Skills: Data Processing, Excel, Microsoft SQL Server, MySQL, PHP


About the Employer:
(138 reviews) Lilydale, Australia

Project ID: #6841247

44 freelancers are bidding an average of $147 for this job


I can remove the duplicates from your 100-million-line list of domains and output the unique lines in SQL or text format. I will complete this work in 2 days. I look forward to your reply so I can start immediately.

$79 USD in 2 days
(794 reviews)

Hi, I'd like to help you with this. I have accomplished similar goals before and am ready to provide you with a custom, repeatable process in the form of a bespoke application that will run utilising all the power y…

$389 USD in 3 days
(87 reviews)

Hello, this is Vaishali from "Hire WordPress Experts", and I am here to help you with "Process 100 million lines of domains and output the UNIQUE lines". We have gone through the information you provided, and I…

$99 USD in 25 days
(108 reviews)

Hello, removing the duplicates won't take more than 48 hours. Regards.

$200 USD in 3 days
(80 reviews)

Hello, I can do it. Please provide me with the domain list. Best regards, Bill Lee

$100 USD in 3 days
(57 reviews)

Hello sir, I have an 8-member team with good experience, and we can start the work right now. All communication and work will be high quality. All work will be done in my office without any…

$39 USD in 1 day
(117 reviews)

Hello sir, I can create this list for you, but it will need a different approach from the one you were using. Check my profile to see that people who have worked with me are extremely satisfied with the results and speed. I have 100% comp…

$99 USD in 3 days
(25 reviews)

Hi there, I'm an expert in database management. I can do this. Please PM me to discuss further. Thank you, FARZANA PINKY.

$222 USD in 2 days
(75 reviews)

Hi, your project seems very interesting. I am ready to start immediately. A few similar projects I have accomplished recently: 1. More than 1200 nutrition entries obtained, entered and properly forma…

$277 USD in 15 days
(15 reviews)

Hi, I'm an expert in databases and data processing, with very good feedback and a high completion rate. I'm very interested in your project and willing to do it for $100. I have processed big databases of up to 24 million re…

$100 USD in 3 days
(45 reviews)

What is the file format of the data? I propose a custom 3-step approach: 1. filter the data into separate files, one for each letter; 2. sort each file alphabetically; 3. process each file and keep the unique records. Filtering is f…
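The three steps in the bid above could look roughly like this. It is a minimal Python sketch, not the bidder's actual tool. It assumes a UTF-8 text file with one domain per line, ignores surrounding whitespace, compares lines case-sensitively, and uses hypothetical function names (bucket_key, dedupe_by_first_letter). Because each bucket is sorted in memory, it also assumes no single first-letter bucket is too large for RAM.

```python
import os


def bucket_key(domain: str) -> str:
    """Return a file-name-safe bucket label based on the first character."""
    first = domain[0].lower() if domain else "_"
    # ASCII letters and digits get their own bucket; everything else shares "_".
    return first if first.isascii() and first.isalnum() else "_"


def dedupe_by_first_letter(in_path: str, out_path: str, work_dir: str) -> None:
    # Step 1: split the input into one file per first character.
    handles = {}
    try:
        with open(in_path, encoding="utf-8") as src:
            for line in src:
                domain = line.strip()
                if not domain:
                    continue  # skip blank lines
                key = bucket_key(domain)
                if key not in handles:
                    path = os.path.join(work_dir, f"bucket_{key}.txt")
                    handles[key] = open(path, "w", encoding="utf-8")
                handles[key].write(domain + "\n")
    finally:
        for fh in handles.values():
            fh.close()

    # Steps 2 and 3: sort each bucket, then keep only the first line of
    # every run of identical lines (duplicates are adjacent after sorting).
    with open(out_path, "w", encoding="utf-8") as out:
        for key in sorted(handles):
            path = os.path.join(work_dir, f"bucket_{key}.txt")
            with open(path, encoding="utf-8") as fh:
                lines = sorted(fh.read().splitlines())
            previous = None
            for domain in lines:
                if domain != previous:
                    out.write(domain + "\n")
                    previous = domain
```

Buckets keyed by first letter can be very uneven in size. Splitting by a hash of the line or by a fixed line count avoids that.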

$149 USD in 5 days
(208 reviews)

A proposal has not yet been provided

$200 USD in 3 days
(30 reviews)

Hi there, I've actually worked with nearly a billion records. What kind of file do you have? If it's a CSV file, that will be great: I will import it into an Oracle database and use a special script to r…
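The bid does not describe its script, but the general database approach can be illustrated with SQLite in place of Oracle. SQLite is swapped in here only because sqlite3 ships with Python. A PRIMARY KEY on the domain column plus INSERT OR IGNORE lets the database discard repeats as rows are loaded. The function, table and batch-size names are hypothetical. The sketch assumes one domain per line in a UTF-8 text file and a db_path on a disk with enough free space.

```python
import sqlite3
from itertools import islice
from typing import Iterator


def read_domains(path: str) -> Iterator[str]:
    """Yield stripped, non-empty lines from a UTF-8 text file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            domain = line.strip()
            if domain:
                yield domain


def dedupe_with_sqlite(in_path: str, out_path: str, db_path: str,
                       batch_size: int = 100_000) -> int:
    """Load domains into a keyed table, then export each distinct one once."""
    conn = sqlite3.connect(db_path)
    try:
        # PRIMARY KEY makes repeats violate a constraint; WITHOUT ROWID
        # stores the table as its own index on the domain column.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS domains (name TEXT PRIMARY KEY) WITHOUT ROWID"
        )
        domains = read_domains(in_path)
        while True:
            batch = list(islice(domains, batch_size))
            if not batch:
                break
            # INSERT OR IGNORE silently skips rows that would violate the key.
            conn.executemany(
                "INSERT OR IGNORE INTO domains (name) VALUES (?)",
                ((d,) for d in batch),
            )
            conn.commit()

        count = 0
        with open(out_path, "w", encoding="utf-8") as out:
            for (name,) in conn.execute("SELECT name FROM domains ORDER BY name"):
                out.write(name + "\n")
                count += 1
        return count
    finally:
        conn.close()
```

Committing once per batch rather than once per row keeps the load from being dominated by transaction overhead.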

$250 USD in 3 days
(27 reviews)

Hello, my name is Jay. I can see why the process is taking so long. EmEditor, like other offline text editors, must compare lines character by character. We can greatly improve the speed of the search by using an inde…
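The bid is cut off at "an inde…", so the exact proposal is unknown. One common way to avoid comparing every line against the others is a hash-based lookup. It is sketched below in Python under the same assumptions as elsewhere: one domain per line, UTF-8, case-sensitive comparison, and hypothetical function names. Each line is checked against a set in roughly constant average time, and the first occurrence of each domain is kept in its original order.

```python
from typing import Iterable, Iterator


def unique_in_order(lines: Iterable[str]) -> Iterator[str]:
    """Yield each stripped, non-empty line the first time it is seen."""
    seen = set()  # average O(1) membership test, no re-scan of earlier lines
    for line in lines:
        domain = line.strip()
        if domain and domain not in seen:
            seen.add(domain)
            yield domain


def dedupe_file(in_path: str, out_path: str) -> None:
    """Stream the input once and write each distinct domain once."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as out:
        for domain in unique_in_order(src):
            out.write(domain + "\n")
```

The trade-off is memory. The set holds every unique domain at once, which for tens of millions of entries can run to several gigabytes in Python. Whether it fits on a 16 GB machine depends on the data.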

$170 USD in 3 days
(11 reviews)

Hi. You don't need massive power to process this; you just need a good technique. The first step is to split the list into smaller parts and then process them using database tools, not editors. I think one day will…
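One way to split the list into smaller parts that can each be processed independently is to route every line by a hash of its content. That guarantees all copies of a domain land in the same part. The Python sketch below illustrates that idea and is not the bidder's method. CRC32 as the hash, the part count, the part_NNN.txt file names and the function names are all assumptions. Each part is assumed to fit in memory as a set.

```python
import os
import zlib
from typing import List


def split_by_hash(in_path: str, work_dir: str, parts: int) -> List[str]:
    """Send every copy of a domain to the same part file, chosen by CRC32."""
    paths = [os.path.join(work_dir, f"part_{i:03d}.txt") for i in range(parts)]
    handles = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        with open(in_path, encoding="utf-8") as src:
            for line in src:
                domain = line.strip()
                if domain:
                    # zlib.crc32 gives the same value on every run, unlike the
                    # built-in hash() for str, which is randomized per process.
                    index = zlib.crc32(domain.encode("utf-8")) % parts
                    handles[index].write(domain + "\n")
    finally:
        for fh in handles:
            fh.close()
    return paths


def dedupe_parts(part_paths: List[str], out_path: str) -> int:
    """Deduplicate each part on its own; equal domains never span two parts."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in part_paths:
            seen = set()  # only one part's domains are in memory at a time
            with open(path, encoding="utf-8") as fh:
                for line in fh:
                    domain = line.rstrip("\n")
                    if domain not in seen:
                        seen.add(domain)
                        out.write(domain + "\n")
                        written += 1
    return written
```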

$55 USD in 2 days
(24 reviews)

Hello, I think it can be done by splitting the file into 10-100 parts, then sorting each part and making it unique with Unix tools. Then join the parts together and process the result. A better way is to pre-process by continuously reading line by line…
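The split, sort-unique and join idea in this bid is essentially an external merge sort. Below is a minimal Python sketch rather than the Unix pipeline the bidder had in mind. It assumes one domain per line in UTF-8 and a chunk size small enough that one chunk fits comfortably in RAM. The function names and the chunk_NNNN.txt file names are hypothetical. Each chunk is de-duplicated and sorted on its own. heapq.merge then streams all chunks together in sorted order, and a line is written only when it differs from the previous one.

```python
import heapq
import os
from typing import List


def write_sorted_chunks(in_path: str, work_dir: str, chunk_lines: int) -> List[str]:
    """Split the input into sorted, de-duplicated chunk files."""
    paths: List[str] = []
    chunk: List[str] = []

    def flush() -> None:
        path = os.path.join(work_dir, f"chunk_{len(paths):04d}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            for domain in sorted(set(chunk)):  # sort and drop repeats within one chunk
                fh.write(domain + "\n")
        paths.append(path)
        chunk.clear()

    with open(in_path, encoding="utf-8") as src:
        for line in src:
            domain = line.strip()
            if domain:
                chunk.append(domain)
                if len(chunk) >= chunk_lines:
                    flush()
    if chunk:
        flush()
    return paths


def merge_unique(chunk_paths: List[str], out_path: str) -> None:
    """k-way merge of sorted chunks, writing each domain exactly once."""
    files = [open(p, encoding="utf-8") for p in chunk_paths]
    try:
        streams = [(line.rstrip("\n") for line in fh) for fh in files]
        previous = None
        with open(out_path, "w", encoding="utf-8") as out:
            for domain in heapq.merge(*streams):
                # Duplicates from different chunks arrive next to each other.
                if domain != previous:
                    out.write(domain + "\n")
                    previous = domain
    finally:
        for fh in files:
            fh.close()
```

During the merge, roughly one line per chunk (plus file buffers) is held in memory. Memory use is therefore set by the chunk size rather than by the size of the whole list.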

$30 USD in 3 days
(19 reviews)

Hi, I am an experienced systems administrator. I worked for a company processing large amounts of similar data (traffic logs for a telco), and I performed analysis and reporting of that data in both databases and flat files. W…

$80 USD in 3 days
(8 reviews)

Hi, I have a server with 2 x Xeon processors and 128 GB of RAM. The server runs Linux, which is much more efficient. I have worked with very large data sets before, and this won't be a problem. Below is a little bit a…

$200 USD in 2 days
(9 reviews)

I have experience working with 20 million rows. I have a dedicated server located in Texas, and I can process your data within 24 hours. I have many alternative ways to process data. I will not demand a penny if I fail to…

$50 USD in 1 day
(8 reviews)

Just tell me where these "100 million lines of domains" reside: in a text file or some other format (please specify). I will make a program/macro that will read this file and remove the duplica…

$250 USD in 3 days
(17 reviews)