This is NOT as easy as it sounds. Don't be fooled - if you don't know what you're doing, don't apply.
I want a small application written that can examine two (LARGE) text files and find the text that appears in both.
This is NOT a 'text difference' application; it is a text SIMILARITY tool.
For instance, if I have a 1 GB text file (File A) and I want to compare it against a 500 MB text file (File B), I need this application to list the parts of each file that are identical.
1) You will need to be able to select the line length, e.g. 20 chars, 15 chars etc. With a length of 20, it must then try to find ANY 20 consecutive characters in File A that are also present in File B.
2) Options for case sensitivity, whitespace removal, carriage returns etc. I don't want an identical line to be missed simply because it was split by a carriage return or had an extra formatting tab or space in it.
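To illustrate what 1) and 2) mean in practice, here is a minimal Python sketch. It is only an illustration of the expected behaviour, not the required design: the names normalize and shared_windows are made up for this example, and it holds both texts in memory, which a real solution for 1 GB files would probably have to avoid (for example by hashing the windows or processing the files in chunks).

```python
import re


def normalize(text, ignore_case=True, strip_whitespace=True):
    """Apply the optional clean-up steps before comparing (hypothetical helper)."""
    if ignore_case:
        text = text.lower()
    if strip_whitespace:
        # Removes spaces, tabs, carriage returns and newlines, so a line
        # split by a line break or padded with a tab still matches.
        text = re.sub(r"\s+", "", text)
    return text


def shared_windows(a, b, length):
    """Return every `length`-character substring that occurs in both texts."""
    windows_b = {b[i:i + length] for i in range(len(b) - length + 1)}
    windows_a = {a[i:i + length] for i in range(len(a) - length + 1)}
    return windows_a & windows_b


file_a = normalize("Hello,\tWorld! foo")
file_b = normalize("xx hello,\r\nworld!yy")
common = shared_windows(file_a, file_b, 5)
print(sorted(common))
```

Here "Hello,\tWorld!" and "hello,\r\nworld!" count as the same text once case and whitespace are ignored, so every 5-character window inside "hello,world!" is reported as common to both inputs.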
1) I don't care what language you write this in.
2) The application MUST BE FAST!
3) If you can use multiple cores and multiple threads, even better.
4) Results must be legible.
5) It must work properly. No repeated results listing the same string 1000 times because your algorithm doesn't work. Similarly, no missing results either, please.
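As a rough idea of what "no repeated results" could look like, the Python sketch below merges overlapping matching windows in File A into single spans, so a long identical passage is listed once instead of once per character offset. The function name matching_runs is invented for this example, and the approach is an assumption about one way to do it. Note that a merged span only says that every window inside it occurs somewhere in File B; this sketch does not check that the whole span occurs in B as one contiguous piece.

```python
def matching_runs(a, b, length):
    """Return (start, end) spans of `a` covered by windows that also occur in `b`.

    `end` is exclusive. Overlapping or touching windows are merged into one span.
    """
    windows_b = {b[i:i + length] for i in range(len(b) - length + 1)}
    runs = []
    for i in range(len(a) - length + 1):
        if a[i:i + length] in windows_b:
            if runs and i <= runs[-1][1]:
                # This window overlaps or touches the previous span: extend it.
                runs[-1][1] = i + length
            else:
                runs.append([i, i + length])
    return [(start, end) for start, end in runs]


text_a = "abcdefgh__abcdefgh"
text_b = "xxabcdefghxx"
for start, end in matching_runs(text_a, text_b, 4):
    print(start, end, text_a[start:end])
```

With a window length of 4, "abcdefgh" contains five matching windows, but each occurrence in text_a is reported as one span rather than five separate hits.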
Sounds easy, huh? :-)
Best of luck.