I need a C++ class that will extract information from a web page. The inputs will be (1) HTML code and (2) a list of commands defining the text to be extracted. The output will be the extracted text in a string.

In addition to the class, you will need to create a small wrapper program to test and demonstrate that the class works. It will just read in the two input files and create one output file with the extracted text.

I have created documentation for the commands your program will need to accept. The attached document fully defines these commands. They do things like:
- Find the next occurrence of some string, either in the text, the HTML code, or either
- Find the previous occurrence of some string
- Move forward or back in the HTML
- Save some text found at a particular place in the HTML
- Parse a table found in the HTML code
- Store the next HREF link

To work correctly, the program will need to parse the HTML. The coder can use an existing HTML parser in the program as long as it is available under an open source license. If you intend to do so, please identify the parser you plan to use and what license it uses.

UPDATE: I really want proposals to use an existing, open-source, C/C++ HTML parser.
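
To make the request more concrete, the following is a minimal sketch of the kind of cursor-based class described above. It is only an illustration under stated assumptions: the class name `HtmlExtractor` and every method name are hypothetical, the real command names and syntax are defined in the attached document and are not reproduced here, and the sketch searches the raw HTML string instead of using a real open-source parser, so it does not distinguish text from markup and does not handle tables.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch only: names and behaviour are assumptions made for
// illustration, not the commands defined in the attached document.
class HtmlExtractor {
public:
    explicit HtmlExtractor(const std::string& html) : html_(html), pos_(0) {}

    // Move the cursor to the next occurrence of needle at or after the cursor.
    // Returns false and leaves the cursor unchanged if there is none.
    bool findNext(const std::string& needle) {
        std::string::size_type hit = html_.find(needle, pos_);
        if (hit == std::string::npos) return false;
        pos_ = hit;
        return true;
    }

    // Move the cursor to the last occurrence of needle that starts strictly
    // before the cursor. Returns false if there is none.
    bool findPrevious(const std::string& needle) {
        if (pos_ == 0) return false;
        std::string::size_type hit = html_.rfind(needle, pos_ - 1);
        if (hit == std::string::npos) return false;
        pos_ = hit;
        return true;
    }

    // Move the cursor forward (positive) or back (negative) by offset
    // characters, clamped to the bounds of the HTML.
    void move(long offset) {
        long target = static_cast<long>(pos_) + offset;
        long size = static_cast<long>(html_.size());
        if (target < 0) target = 0;
        if (target > size) target = size;
        pos_ = static_cast<std::string::size_type>(target);
    }

    // Append everything from the cursor up to (not including) terminator to
    // the output, and leave the cursor at the terminator.
    bool saveUntil(const std::string& terminator) {
        std::string::size_type end = html_.find(terminator, pos_);
        if (end == std::string::npos) return false;
        output_.append(html_, pos_, end - pos_);
        pos_ = end;
        return true;
    }

    // Append the value of the next href="..." attribute to the output.
    // Assumption: lowercase attribute name and double quotes only.
    bool saveNextHref() {
        const std::string marker = "href=\"";
        std::string::size_type start = html_.find(marker, pos_);
        if (start == std::string::npos) return false;
        start += marker.size();
        std::string::size_type end = html_.find('"', start);
        if (end == std::string::npos) return false;
        output_.append(html_, start, end - start);
        pos_ = end;
        return true;
    }

    const std::string& output() const { return output_; }
    std::size_t position() const { return pos_; }

private:
    std::string html_;            // the page being searched
    std::string::size_type pos_;  // current cursor position in html_
    std::string output_;          // extracted text accumulated so far
};
```

In this sketch, a wrapper program would read the HTML file into a string, construct an `HtmlExtractor` from it, translate each line of the command file into one of these calls, and write `output()` to the output file. The actual deliverable should replace the raw string searches with calls into the chosen open-source parser.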
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) All files must load, compile, and run under Microsoft Visual C++ 6.0.
b) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased, including any 3rd party components. (Any GPL, GNU, 3rd party components, etc. must be listed AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
Code to compile in Microsoft Visual C++ 6.0.
Program to run under Windows 2000, XP and Vista.