Hello,
I would like someone to develop a script or tool to scrape content from a particular website. The site has approximately 17,000 pages of content that I would like scraped. Because each page holds a lot of content (an estimated 10,000 words), Visual Web Ripper times out and misses content when I try to use it.
Once all existing content is scraped, I would like an automated script to check the website for updates and scrape any new content. Ideally this would run daily.
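As a rough illustration of the update check I have in mind (a sketch only, not a requirement on how you build it): compare the transcript URLs currently listed on the site against the ones already scraped, and fetch only the difference. The names find_new_urls, run_daily_check, list_transcript_urls and scrape_page are hypothetical placeholders, and the example.com addresses are made up.

```python
from typing import Callable, Iterable, List, Set


def find_new_urls(listed_urls: Iterable[str], seen_urls: Set[str]) -> List[str]:
    """Return listed URLs that have not been scraped yet, in listing order."""
    new_urls: List[str] = []
    queued: Set[str] = set()  # guards against duplicates within one listing
    for url in listed_urls:
        if url not in seen_urls and url not in queued:
            new_urls.append(url)
            queued.add(url)
    return new_urls


def run_daily_check(
    list_transcript_urls: Callable[[], Iterable[str]],  # hypothetical: reads the site's listing
    scrape_page: Callable[[str], None],                 # hypothetical: fetches and stores one page
    seen_urls: Set[str],
) -> List[str]:
    """Scrape only the pages that are new since the last run."""
    new_urls = find_new_urls(list_transcript_urls(), seen_urls)
    for url in new_urls:
        scrape_page(url)
        seen_urls.add(url)  # mark as done only after a successful scrape
    return new_urls


if __name__ == "__main__":
    already_scraped = {"https://example.com/transcript/1"}
    listing = ["https://example.com/transcript/1", "https://example.com/transcript/2"]
    run_daily_check(lambda: listing, lambda u: print("scraping", u), already_scraped)
```

A scheduler such as cron could call run_daily_check once a day. Because a URL is only marked as seen after scrape_page returns, a page that fails is retried on the next run.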
The content I want can be found on the following website: [login to view URL]
Please use the date search function to search as far back as you can (I believe the earliest date is 1/1/2003). The results should then list 17,000+ transcripts.
An opened transcript should look like this: [login to view URL]
The opened transcript doesn't show everything I want. Only the 'Print' view of the transcript (a link near the top) displays everything I need. An example is here: [login to view URL]
I would like the scraped data stored in a database format. Each record should also include the page title, date and page URL as fields.
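To make the output I am asking for concrete, here is a minimal sketch of one possible layout, assuming SQLite is acceptable (any database format would do). The table name, column names, and the open_store and save_transcript helpers are my own placeholders, not a fixed specification.

```python
import sqlite3

# One row per transcript; the page URL is the key, so re-scraping a page
# overwrites its row instead of creating a duplicate.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcripts (
    url     TEXT PRIMARY KEY,
    title   TEXT NOT NULL,
    date    TEXT NOT NULL,
    content TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_transcript(conn: sqlite3.Connection, url: str, title: str,
                    date: str, content: str) -> None:
    """Insert a transcript, replacing any earlier copy with the same URL."""
    conn.execute(
        "INSERT OR REPLACE INTO transcripts (url, title, date, content) "
        "VALUES (?, ?, ?, ?)",
        (url, title, date, content),
    )
    conn.commit()


if __name__ == "__main__":
    store = open_store()  # in-memory database, for demonstration only
    save_transcript(store, "https://example.com/transcript/1",
                    "Example title", "2003-01-01", "Full 'Print' view text...")
    print(store.execute("SELECT url, title, date FROM transcripts").fetchall())
```

Storing the date as ISO text (YYYY-MM-DD) keeps rows sortable by date without extra conversion.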
I need this done ASAP - it should not take long for an experienced scraper.
Please let me know if you have any questions about the project.
Thanks.