I need a web spider that does the following:
1- User enters a url
2- Spider goes to that url and follows every link on it (1 deep only).
3- Spider searches the subpage source for a word
4- For each occurrence of the word, the spider extracts the word + 5 preceding characters + 5 trailing characters. Ignore whitespace.
5- The extracted text is written to file, 1 entry per line.
6- Spider repeats steps 3 to 5 for each subpage.
7- Spider finds a link at the bottom of the original url's page with text "Click for more" and follows it.
8- Process loops back to step 2
9- Process ends when the "Click for more" link is not found.
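To make the steps above concrete, here is a rough sketch in Python using only the standard library. It is an illustration under stated assumptions, not a finished program: the names `LinkCollector`, `extract_snippets`, `fetch_page` and `crawl` are made up for this example, "ignore whitespace" is read as "strip all whitespace from the page source before counting characters", the "Click for more" link is treated as the next page rather than as a subpage, and a `seen` set is added so the loop cannot revisit a page.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_snippets(source, word, before=5, after=5):
    """Step 4: return the word plus surrounding characters for each occurrence."""
    # Assumption: whitespace is removed before characters are counted.
    compact = re.sub(r"\s+", "", source)
    snippets = []
    start = compact.find(word)
    while start != -1:
        lo = max(0, start - before)
        hi = start + len(word) + after
        snippets.append(compact[lo:hi])
        start = compact.find(word, start + 1)
    return snippets


def fetch_page(url):
    """Download a page and return its source as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, word, before, after, more_text, out_path, fetch=fetch_page):
    """Steps 2 to 9: scan subpages one level deep, then follow the 'more' link."""
    url = start_url
    seen = set()  # guard against a "more" link that loops back (an addition)
    with open(out_path, "w", encoding="utf-8") as out:
        while url and url not in seen:
            seen.add(url)
            parser = LinkCollector()
            parser.feed(fetch(url))
            next_url = None
            for href, text in parser.links:
                target = urljoin(url, href)
                if text == more_text:
                    next_url = target  # step 7: remember the next page
                    continue
                if not target.startswith(("http://", "https://")):
                    continue  # skip mailto:, javascript: and similar links
                for snippet in extract_snippets(fetch(target), word, before, after):
                    out.write(snippet + "\n")  # step 5: one entry per line
            url = next_url  # step 9: None ends the loop
```

A call might look like `crawl("http://example.com/", "word", 5, 5, "Click for more", "results.txt")`, where the URL and output file name are placeholders. The `fetch` parameter exists only so the page download can be swapped out, for example when trying the logic against saved pages.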
*A simple configuration file called "[url removed, login to view]" should store the following variables.
- The word to search for on subpages (step 3)
- The # of preceding characters to extract (step 4)
- The # of trailing characters to extract (step 4)
- The text of the link to be followed in steps 7 and 9.
This is for personal use - no need for fancy options. Configuration file can be manually opened and altered.
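As an illustration of how small the configuration file could be, here is a sketch that reads the four variables with Python's standard `configparser` module. The INI layout, the `[spider]` section name, the key names and the `load_settings` function are all assumptions made for this example; the actual file name and format are up to the developer.

```python
import configparser

# Hypothetical file contents; key names are placeholders, not a required format.
SAMPLE_CONFIG = """
[spider]
word = example
before = 5
after = 5
more_text = Click for more
"""


def load_settings(text):
    """Parse the configuration text and return the four spider settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["spider"]
    return {
        "word": section.get("word"),
        "before": section.getint("before", fallback=5),
        "after": section.getint("after", fallback=5),
        "more_text": section.get("more_text", fallback="Click for more"),
    }


settings = load_settings(SAMPLE_CONFIG)
```

A plain text file like this can be opened and edited by hand in any editor, which matches the "no fancy options" requirement.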
Please let me know your expected time to completion. I would prefer someone who is ready to work on this as soon as possible.
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Installation package that will install the software (in ready-to-run condition) on the platform(s) specified in this bid request.
3) Complete ownership and distribution copyrights to all work purchased.