I have a directory containing a downloaded version of an HTML website. I need a program to recursively go through this directory, find all HTML files, and convert each one to PDF -- similar to HTMLDOC, but recursive. All text and images in the source HTML *must* be preserved. CSS features such as fonts would be nice, but are not necessary.
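For the per-file conversion, I'm imagining something on the order of this minimal sketch. It assumes the iText pdfHTML add-on (`com.itextpdf:html2pdf`); any equivalent renderer that keeps text and images would be fine, and the file names here are just placeholders:

```java
import com.itextpdf.html2pdf.HtmlConverter;

import java.io.File;
import java.io.IOException;

public class SingleFileDemo {
    public static void main(String[] args) throws IOException {
        // Convert one HTML file to PDF. With the File overload, relative
        // image paths resolve against the HTML file's own directory.
        HtmlConverter.convertToPdf(
                new File("page.html"),   // hypothetical input path
                new File("page.pdf"));   // hypothetical output path
    }
}
```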
The program should prompt the user for the source directory (which contains the website) and the output directory (where to put the PDFs). The output directory can be flat -- it does NOT need to mirror the source structure; it just needs to hold all of the PDFs.
There are tens of thousands of individual HTML files in the site -- somewhere around 50,000 in total -- so processing speed is important; please use the fastest conversion method you can find. You can develop in your choice of Java or C#.NET; I can run either on my local box to process the files.
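To make the requirements concrete, here is a rough sketch of the overall shape I'm after: prompt for the two directories, walk the source tree, and convert in parallel for throughput. It again assumes pdfHTML; the double-underscore name-flattening scheme (which keeps the many `index.html` files from colliding in a flat output directory) is just an illustration, not a requirement:

```java
import com.itextpdf.html2pdf.HtmlConverter;

import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SiteToPdf {
    public static void main(String[] args) throws IOException {
        Scanner in = new Scanner(System.in);
        System.out.print("Source directory: ");
        Path src = Paths.get(in.nextLine().trim());
        System.out.print("Output directory: ");
        Path out = Paths.get(in.nextLine().trim());
        Files.createDirectories(out);

        // Collect every .html/.htm file anywhere under the source tree.
        List<Path> pages;
        try (Stream<Path> walk = Files.walk(src)) {
            pages = walk.filter(Files::isRegularFile)
                        .filter(p -> {
                            String n = p.getFileName().toString().toLowerCase();
                            return n.endsWith(".html") || n.endsWith(".htm");
                        })
                        .collect(Collectors.toList());
        }

        // Convert in parallel; each worker handles one file at a time.
        pages.parallelStream().forEach(page -> {
            // Flatten the relative path into a unique flat file name so
            // duplicate base names (e.g. many index.html files) can't collide.
            String flat = src.relativize(page).toString()
                             .replace(FileSystems.getDefault().getSeparator(), "__");
            Path pdf = out.resolve(flat.replaceAll("(?i)\\.html?$", ".pdf"));
            try {
                HtmlConverter.convertToPdf(page.toFile(), pdf.toFile());
            } catch (Exception e) {
                // Log and keep going; one bad page shouldn't stop the batch.
                System.err.println("Failed: " + page + " (" + e.getMessage() + ")");
            }
        });
    }
}
```

Something along these lines (or the C#.NET equivalent) is fine; I care about the behavior, not the exact structure.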
Please let me know if you have any questions. Thank you!