Objective of program:
to scour the web one domain at a time and detect whether a classifieds site is hosted there
It must be able to run from a console on a Linux machine, so options include C, C++, Perl, Python, and a few others. I am not sure which would be the best and fastest choice. I DO NOT want a Windows version of this. I want to run it on a Linux server somewhere with a fast internet connection so the program can work as quickly as possible.
To compile a complete list of all buying-and-selling classified sites, message boards, forums, and bulletin boards on the internet. NO blogs. If you go to [url removed, login to view] and search for "miami classifieds" or "free classifieds", those are good examples of the sites I want to gather with this program.
The program will do the following:
Increment alphabetically and numerically, one letter and number at a time, to search for the presence of a [url removed, login to view] [url removed, login to view] [url removed, login to view] or similar, to detect whether the website is a classified site or message board. It will use the characters a-z, 0-9, and - when trying to find domains. It will start with [url removed, login to view] [url removed, login to view] [url removed, login to view], [url removed, login to view], [url removed, login to view] ... [url removed, login to view] [url removed, login to view] etc. It will attempt to identify the PHP, ASP, or CGI software and version the classified or message board site is running, if possible. It will create a CSV file with the output. It will also append to a log file the current time, the action it is currently performing, which domain was the last or current one, any errors, and anything else important. If it can automatically resume later, that would be great.
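A minimal sketch of the incrementing domain generator, in Python (one of the languages listed above). It assumes the enumeration order is: all 1-character names, then all 2-character names, and so on, over the characters a-z, 0-9, and -; since DNS labels cannot start or end with a hyphen, those candidates are skipped. The function name and exact ordering are illustrative, not fixed by the spec:

```python
import itertools
import string

ALPHABET = string.ascii_lowercase + string.digits + "-"

def domain_names(max_len, tld="com"):
    """Yield candidate domains in alphabetic/numeric order:
    a.com, b.com, ..., z.com, 0.com, ..., 9.com, aa.com, ab.com, ...
    up to max_len characters. Labels may not begin or end with '-',
    so those candidates are skipped."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=length):
            name = "".join(chars)
            if name.startswith("-") or name.endswith("-"):
                continue
            yield f"{name}.{tld}"
```

Because this is a generator, the scanner can restart from any point simply by skipping forward to the last domain recorded in the log, which supports the automatic-resume requirement.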
field #1 - type of site - classified or forum
field #2 - URL of classified site
field #3 - page title from the header of the page
field #4 - software type of classified or message board
field #5 - time and date searched
field #6 - Alexa page ranking
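The six fields above can be written with Python's standard csv module. This is a sketch only; the column names are my own labels for the fields listed, and the header row is an assumption (the spec does not say whether one is wanted):

```python
import csv
import os
from datetime import datetime, timezone

# Illustrative column names for the six fields in the spec.
FIELDS = ["site_type", "url", "page_title", "software", "searched_at", "alexa_rank"]

def append_result(path, site_type, url, title, software, alexa_rank):
    """Append one result row to the CSV output file, writing a header
    row only when the file is first created, so repeated runs can
    keep appending to the same output file."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(FIELDS)
        writer.writerow([site_type, url, title, software,
                         datetime.now(timezone.utc).isoformat(), alexa_rank])
```

Appending row by row, rather than buffering results in memory, also fits the resume requirement: everything found before an interruption is already on disk.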
the user input for the program will include:
start point - ie [url removed, login to view] [url removed, login to view] so I can resume where it left off
type of site to search - classified, message board or both
timeout period (default 10 seconds) - how long to wait for a response
concurrent connections - how many queries at once
domain length - max number of characters to search for in a domain name
output file name - (default current dir/$DATE$STARTTIME) output CSV file location and name
output log name - (default current dir/log) output log file; it will just append to it
top level domain - ie .com .net .us .org etc
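The option list above maps directly onto a standard command-line parser. A sketch using Python's argparse follows; every flag name and default not stated in the spec (for example --start and the max-length default) is illustrative:

```python
import argparse
from datetime import datetime

def build_parser():
    """Command-line options mirroring the user-input list in the spec.
    Flag names are illustrative; defaults follow the spec where given."""
    p = argparse.ArgumentParser(description="Classified/forum site scanner")
    p.add_argument("--start", default="a",
                   help="domain name to resume from")
    p.add_argument("--site-type", choices=["classified", "forum", "both"],
                   default="both", help="type of site to search for")
    p.add_argument("--timeout", type=float, default=10.0,
                   help="seconds to wait for a response (spec default: 10)")
    p.add_argument("--connections", type=int, default=20,
                   help="how many queries to run at once")
    p.add_argument("--max-length", type=int, default=3,
                   help="max number of characters in a domain name")
    p.add_argument("--output",
                   default=datetime.now().strftime("%Y%m%d%H%M%S") + ".csv",
                   help="output CSV file (spec default: current dir/$DATE$STARTTIME)")
    p.add_argument("--log", default="log",
                   help="log file to append to (spec default: current dir/log)")
    p.add_argument("--tld", default="com",
                   help="top-level domain to scan, e.g. com, net, us, org")
    return p
```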
Most importantly, the program will let me adjust the timeout and concurrent connection count so I can set it at a rate which will almost max out the computer but not crash it.
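One simple way to get tunable concurrency with a per-request timeout is a bounded thread pool, sketched here with Python's standard library. The probe is a bare HTTP GET that only reports reachability; the real detection logic (checking for the classified/forum pages and software fingerprints) would replace it, and all names here are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen
from urllib.error import URLError

def probe(domain, timeout):
    """Return (domain, reachable): one HTTP request with a hard timeout.
    Any DNS failure, connection error, or timeout counts as unreachable."""
    try:
        with urlopen(f"http://{domain}/", timeout=timeout) as resp:
            return domain, resp.status == 200
    except (URLError, OSError, ValueError):
        return domain, False

def scan(domains, connections, timeout):
    """Probe domains through a pool of at most `connections` workers,
    so both knobs from the spec (concurrency and timeout) can be set
    high enough to saturate the link without overwhelming the machine."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        futures = [pool.submit(probe, d, timeout) for d in domains]
        for fut in as_completed(futures):
            yield fut.result()
```

Because the pool size and timeout are plain parameters, they can be raised or lowered between runs (or exposed through the command-line options above) until the machine sits just under full load.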