Website crawler and file rename program 2
I need software with the following functionality:
1. I will start at a specific URL, such as:
[login to view URL]
2. This URL will lead to a new page with links.
3. Each one of these links will lead to a new page with links that have a .TPL extension, and have the word “ecfrbrowse” in the link URL. Only follow these links.
These links may either lead to another page with similar links, or lead to a page with links that look like this (or both):
<a href="/cgi/t/text/text-idx?c=ecfr&sid=013d13d2abbddecf9041b42964e8307a&rgn=div8&view=text&node=20:1.0.1.1.1.0.1.1&idno=20">
§1.1</a>
When you find links like the one above, you will create a new link that looks like this:
[login to view URL];c=ecfr;xc=1;rgn=div8;view=text;node=20:1.0.1.1.1.0.1.1
This generates a new page. That newly generated page needs to be saved as a file, with a user-selectable extension.
4. The file name that is created is made of parts of the link, and in the above example, looks like this: [login to view URL]
5. The file name is made of the “idno=xx” value, user-selectable text, and the number that follows the link tag (the number after the §, if one is present).
6. The following are user entered or controlled:
1. starting url
2. text string in the link, such as div6head, div7head, or div8head; only links containing this text string are followed
3. the extension
4. the text between the two found values in the link, such as “cfr” in the above example.
5. the order of the file name creation (in the above example, it could be [login to view URL], or [login to view URL], or 31.1cfr38.htm).
or
the user will have the option of saving all of the links on the page under names built from the first xx (a number) characters of the starting URL plus an incrementing number, with a file extension of the user's choice.
For example:
If the file came from [login to view URL], and the user selected "6" for the number and "htm" for the file extension, the first link would create a file name of "[login to view URL]" in this example.
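The link handling and file naming described in steps 3 to 6 could be sketched roughly as follows. This is only an illustration under stated assumptions, not a finished design: all function names are hypothetical, "has a .TPL extension" is read as "the URL ends in .tpl", and because the base of the generated URL is hidden in this posting, the sketch builds only the part after it (the ";c=ecfr;xc=1;..." suffix).

```python
from urllib.parse import urlsplit, parse_qs


def is_browse_link(href, keyword="ecfrbrowse"):
    """True if the URL ends in .tpl (any case) and contains the keyword.

    Assumption: ".TPL extension" means the URL, minus any #fragment, ends in .tpl.
    """
    url = href.split("#", 1)[0].lower()
    return url.endswith(".tpl") and keyword.lower() in url


def section_link_params(href):
    """Return the query parameters of a section link, or None if it is not one."""
    query = parse_qs(urlsplit(href).query)
    if "node" not in query or "idno" not in query:
        return None
    return {key: values[0] for key, values in query.items()}


def rewritten_suffix(params):
    """Build the ';c=...;xc=1;rgn=...;view=...;node=...' part of the new link."""
    return ";c={c};xc=1;rgn={rgn};view={view};node={node}".format(**params)


def section_number(link_text):
    """Strip whitespace and a leading section sign from the link text."""
    return link_text.strip().lstrip("§").strip()


def build_filename(idno, middle, section, extension,
                   order=("idno", "text", "section")):
    """Join the three name parts in a user-chosen order and add the extension.

    The default order is an assumption; the posting leaves it to the user.
    """
    parts = {"idno": idno, "text": middle, "section": section}
    return "".join(parts[name] for name in order) + "." + extension.lstrip(".")


href = ("/cgi/t/text/text-idx?c=ecfr&sid=013d13d2abbddecf9041b42964e8307a"
        "&rgn=div8&view=text&node=20:1.0.1.1.1.0.1.1&idno=20")
params = section_link_params(href)
suffix = rewritten_suffix(params)
name = build_filename("38", "cfr", "31.1", "htm",
                      order=("section", "text", "idno"))
```

With the sample link from step 3, `suffix` matches the tail of the generated link shown there, and the last call reproduces the "31.1cfr38.htm" ordering mentioned in item 5 of the user-controlled settings.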
Use the [login to view URL] as your guide. We need all of the same input fields and actions.
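As a rough illustration of how the user-controlled inputs from item 6 might drive the crawl, the sketch below groups them in one settings object and walks the site breadth-first, following only links that contain the chosen text string. The names `CrawlSettings`, `crawl`, and `fetch_links` are hypothetical, the default values are assumptions, and page fetching is passed in as a function so the traversal can be tried without network access.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CrawlSettings:
    start_url: str                                   # 1. starting URL
    link_filter: str                                 # 2. text a link must contain, e.g. "div8head"
    extension: str = "htm"                           # 3. extension for saved files
    middle_text: str = "cfr"                         # 4. text between the two values from the link
    name_order: tuple = ("idno", "text", "section")  # 5. order of file name parts (assumed default)


def crawl(settings, fetch_links, max_pages=1000):
    """Breadth-first walk from start_url; returns the pages visited, in order.

    fetch_links(url) must return the link URLs found on that page.
    Only links containing settings.link_filter are followed, each at most once.
    """
    seen = {settings.start_url}
    queue = deque([settings.start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if settings.link_filter in link and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A tiny in-memory "site" standing in for real pages.
site = {
    "start": ["a-div8head", "skip", "b-div8head"],
    "a-div8head": ["b-div8head", "start"],
    "b-div8head": [],
}
settings = CrawlSettings(start_url="start", link_filter="div8head")
pages = crawl(settings, lambda url: site.get(url, []))
```

The `seen` set keeps the crawler from looping when pages link back to each other, and `max_pages` is a simple safety limit added for the sketch.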
Post any questions or clarifications you may have.
Hello
I am an experienced developer with 7 years of experience creating applications that access the web and collect information. The type of crawler you are asking for sounds fairly simple in concept, though complex in its details, and I believe I have the skills to complete your task in a short time.
I look forward to hearing from you.
Sincerely
Mark