I need to do some data mining and need a couple of scripts written. I have about 250 .CSV files that I need compiled into one database, and I need to mix and match the information so as not to lose or duplicate any data.
I do not want to lose a single piece of data, and I want to use these specific fields for the mining process. For instance, if one of those files has “name”, “email” and “address” but the master database has more data, I want the excess data transferred to the new database (Stage One). However, if the email matches but the “name” and “address” are different, I want that record put into a third database (Compare) so I can mine the data further. I also want the Stage One database to have a field at the end called “Conflicting Data”, populated with the name or names of the conflicting fields. Later I will use the third database (Compare) to compare the conflicting data and keep the most current data. I feel this would be a good way to do it, but if you have suggestions I would like to hear them.
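To make the rules above concrete, here is a rough Python sketch of the merge logic as I picture it. It is only an illustration under my own assumptions, not a finished script: it assumes “email” is the field records are matched on, that column headers are spelled the same in every file, that blank cells never count as a conflict, and that rows with no email go to Compare so nothing is dropped. The names `merge_rows`, `load_rows` and `write_csv` are made up for this example.

```python
import csv
import glob

CONFLICT_FIELD = "Conflicting Data"


def load_rows(pattern):
    """Yield every row of every CSV file matching the glob pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8-sig") as fh:
            yield from csv.DictReader(fh)


def merge_rows(rows, key="email"):
    """Return (stage_one, compare) built from an iterable of dict rows."""
    stage_one = {}  # normalized key value -> merged record
    compare = []    # incoming rows that disagree with Stage One
    for row in rows:
        # Drop empty cells so a blank never counts as a conflict.
        row = {f: v.strip() for f, v in row.items() if f and v and v.strip()}
        k = row.get(key, "").lower()
        if not k:
            # Assumption: rows without an email are kept for manual review.
            compare.append(row)
            continue
        master = stage_one.setdefault(k, {CONFLICT_FIELD: ""})
        conflicts = [
            f for f in row
            if f not in (key, CONFLICT_FIELD)
            and f in master and master[f] != row[f]
        ]
        # Fill in fields the master lacks; never overwrite existing values.
        for f, v in row.items():
            master.setdefault(f, v)
        if conflicts:
            compare.append(row)
            known = set(filter(None, master[CONFLICT_FIELD].split(";")))
            master[CONFLICT_FIELD] = ";".join(sorted(known | set(conflicts)))
    return list(stage_one.values()), compare


def write_csv(path, records):
    """Write records to a CSV file with the conflict column last."""
    fields = []
    for rec in records:
        for f in rec:
            if f != CONFLICT_FIELD and f not in fields:
                fields.append(f)
    fields.append(CONFLICT_FIELD)
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, restval="")
        writer.writeheader()
        writer.writerows(records)
```

Something like `stage_one, compare = merge_rows(load_rows("data/*.csv"))` followed by two `write_csv` calls would produce the two outputs, where `data/*.csv` is just a placeholder path. The first value seen for a field stays in Stage One and any differing later value goes to Compare, so no value is thrown away.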
Thank you