I have 12 million URLs spread across 12 text files, and I need each one analyzed to see whether I can post a comment on that WordPress blog. I need a script that loads all 12 text files, then scans the page source of every URL for a specific value that indicates whether the blog accepts comments.
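As a rough illustration of the loading step, the sketch below streams URLs from several text files one line at a time instead of reading 12 million entries into memory at once. Python is used here purely for illustration, and the function name `iter_urls` and the file names in the usage note are hypothetical.

```python
from typing import Iterable, Iterator


def iter_urls(paths: Iterable[str]) -> Iterator[str]:
    """Yield each non-empty line (one URL per line) from the given files, in order."""
    for path in paths:
        # errors="replace" keeps one badly encoded line from aborting the whole run.
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                url = line.strip()
                if url:  # skip blank lines
                    yield url
```

It could be called as `iter_urls(["urls_01.txt", "urls_02.txt"])`, where the file names are placeholders for the 12 input files.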
## Deliverables
I have 12 text files with about 1 million URLs in each, one URL per line. Each URL points to a WordPress blog. I need a script that scans all 12 million URLs and detects whether a comment can be posted on each blog.

For example, the script should load the page source for each URL and look for value="Submit Comment" or value="Add Reply". If it finds either one, it should treat the URL as good and save it to a separate text file, one URL per line, with a maximum of 1 million URLs per text file.

The script should be able to run with up to 300 threads, and I should be able to adjust the timeout for loading each page. The script should also retry any failed attempts to load a page.

Once the script is finished, it should report how many good and how many bad blogs were found, and how many loads failed (timeouts, 404 errors, 500 errors, etc.). It should also create a text file of all blogs with read errors (timeouts, 404 errors, 500 errors, etc.) so that I can go over those blogs again. I do NOT need a list of blogs that were scanned successfully but did not contain the above values (meaning the blog is closed for commenting).

I would like to run it either on Windows 2008 Server or on my Linux web hosting account using PHP. Your choice. I need this completed ASAP.

1) All deliverables will be considered "work made for hire" under U.S. Copyright law. Employer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the employer on the site per the worker's Worker Legal Agreement).
2) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
3) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables): a) For web sites or other server-side deliverables intended to only ever exist in one place in the Employer's environment--Deliverables must be installed by the Worker in ready-to-run condition in the Employer's environment. b) For all others including desktop software or software the employer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this project.
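The scanning requirements in this section could be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not a finished deliverable and not the PHP or Windows program the posting asks for. All names (`fetch`, `classify`, `ChunkedWriter`, `scan`), the default timeout, the retry count, the batch size, and the output file names are assumptions made for the example. The marker check is a literal substring match on the two values given above.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# The two literal values named in the requirements.
MARKERS = ('value="Submit Comment"', 'value="Add Reply"')


def fetch(url, timeout):
    """Return the page source; urlopen raises on timeouts and on 404/500 responses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def classify(url, fetcher=fetch, timeout=10.0, retries=2):
    """Return 'good', 'bad', or 'failed' for one URL, retrying failed loads."""
    for _ in range(retries + 1):
        try:
            html = fetcher(url, timeout)
        except Exception:
            continue  # load failed; try again if attempts remain
        return "good" if any(marker in html for marker in MARKERS) else "bad"
    return "failed"


class ChunkedWriter:
    """Write lines to prefix_001.txt, prefix_002.txt, ... with max_lines per file."""

    def __init__(self, prefix, max_lines=1_000_000):
        self.prefix, self.max_lines = prefix, max_lines
        self.count, self.handle = 0, None

    def write(self, line):
        if self.count % self.max_lines == 0:  # start a new file when one fills up
            self.close()
            index = self.count // self.max_lines + 1
            self.handle = open(f"{self.prefix}_{index:03d}.txt", "w", encoding="utf-8")
        self.handle.write(line + "\n")
        self.count += 1

    def close(self):
        if self.handle:
            self.handle.close()
            self.handle = None


def scan(urls, out_prefix="good", failed_path="failed.txt", threads=300,
         timeout=10.0, retries=2, fetcher=fetch, max_per_file=1_000_000,
         batch_size=10_000):
    """Classify every URL and return counts of good, bad, and failed blogs."""
    counts = {"good": 0, "bad": 0, "failed": 0}
    good = ChunkedWriter(out_prefix, max_per_file)
    urls = iter(urls)
    with ThreadPoolExecutor(max_workers=threads) as pool, \
            open(failed_path, "w", encoding="utf-8") as failed:
        while True:
            # Work in batches so millions of pending tasks are never queued at once.
            batch = list(islice(urls, batch_size))
            if not batch:
                break
            statuses = pool.map(lambda u: classify(u, fetcher, timeout, retries), batch)
            # Files are written only from this thread, so no locking is needed.
            for url, status in zip(batch, statuses):
                counts[status] += 1
                if status == "good":
                    good.write(url)
                elif status == "failed":
                    failed.write(url + "\n")
    good.close()
    return counts
```

In this sketch, `scan(urls, threads=300, timeout=15.0)` would return a dictionary such as `{"good": ..., "bad": ..., "failed": ...}` for the final report. URLs classified as bad are counted but not written anywhere, matching the request above.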
* * *

This broadcast message was sent to all bidders on Monday Dec 13, 2010 12:52:56 AM:
I have created a new project with new details. Please take a look and place your bid ASAP if you wish to work with me. This is an URGENT project. Check my rating on RAC for references. [login to view URL]
## Platform
Windows 2008 Server or PHP