Need to build a conversion tool that extracts the data from PDF files as it is and converts it into well-formatted HTML files, in bulk, with a single button click.
During conversion, the following must be considered:
-100% automation required
-Emphasis to be maintained as it is.
-Indentation/block-quoting to be maintained in its hierarchy
-Multiple entity extraction and placement in the right tags
-Standard HTML formatting based on Rules
-Removal of all junk characters and unwanted data
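To make a few of the points above concrete, here is a minimal sketch of how emphasis, indentation hierarchy, and junk removal could be carried into HTML. It assumes the PDF extraction step has already produced text spans with bold/italic flags and an indent level per block; the `Span` and `Block` types, the `JUNK` character set, and the function names are all hypothetical and not part of any particular PDF library.

```python
import html
import re
from dataclasses import dataclass

# Assumption: control characters (except tab/newline), the soft hyphen and
# U+FFFD count as "junk". The real rule set would come from the project.
JUNK = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\u00ad\ufffd]")


@dataclass
class Span:
    """A run of text with the emphasis flags found in the PDF (hypothetical)."""
    text: str
    bold: bool = False
    italic: bool = False


@dataclass
class Block:
    """A paragraph with its nesting depth; 0 means not indented (hypothetical)."""
    spans: list
    indent: int = 0


def clean(text):
    """Drop junk characters and collapse runs of whitespace to one space."""
    return re.sub(r"\s+", " ", JUNK.sub("", text))


def span_to_html(span):
    """Escape the text and wrap it in the tags matching its emphasis."""
    out = html.escape(clean(span.text))
    if span.italic:
        out = f"<em>{out}</em>"
    if span.bold:
        out = f"<strong>{out}</strong>"
    return out


def blocks_to_html(blocks):
    """Emit paragraphs, opening and closing blockquotes as the depth changes."""
    parts = []
    depth = 0
    for block in blocks:
        while depth < block.indent:
            parts.append("<blockquote>")
            depth += 1
        while depth > block.indent:
            parts.append("</blockquote>")
            depth -= 1
        inner = "".join(span_to_html(s) for s in block.spans).strip()
        parts.append(f"<p>{inner}</p>")
    # Close any blockquotes still open at the end of the document.
    parts.extend("</blockquote>" for _ in range(depth))
    return "\n".join(parts)
```

Tracking the current depth, rather than wrapping each block separately, keeps consecutive blocks at the same indent inside one shared `<blockquote>`, which is one way to preserve the hierarchy.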
Note: Patterns in the files may differ from each other, so the program must be trained well enough to identify the patterns and act accordingly.
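One possible reading of this note, sketched under assumptions: each family of files gets a "profile" of regex rules, and the tool picks the profile that matches the most lines before converting. This swaps plain rule scoring in for whatever training approach is eventually chosen; the `PROFILES` table, its two example layouts, and the function names are hypothetical.

```python
import html
import re

# Hypothetical layout profiles: each maps an HTML tag to the regex that
# marks a line of that kind in one family of source files.
PROFILES = {
    "numbered": {
        "h2": re.compile(r"^\d+\.\s+\S"),     # e.g. "1. Scope"
        "h3": re.compile(r"^\d+\.\d+\s+\S"),  # e.g. "1.1 Details"
    },
    "upper": {
        "h2": re.compile(r"^[A-Z][A-Z ]{3,}$"),  # e.g. "SCOPE OF WORK"
        "h3": re.compile(r"^[A-Z][a-z]+:$"),     # e.g. "Details:"
    },
}


def pick_profile(lines):
    """Return the name of the profile whose rules match the most lines."""
    def score(rules):
        return sum(1 for line in lines for rx in rules.values() if rx.search(line))
    return max(PROFILES, key=lambda name: score(PROFILES[name]))


def convert(lines):
    """Tag each line using the best-matching profile; unmatched lines become <p>."""
    rules = PROFILES[pick_profile(lines)]
    out = []
    for line in lines:
        tag = next((t for t, rx in rules.items() if rx.search(line)), "p")
        out.append(f"<{tag}>{html.escape(line)}</{tag}>")
    return out


def convert_all(documents):
    """Bulk step: convert every document in a {name: lines} mapping."""
    return {name: convert(lines) for name, lines in documents.items()}
```

A button handler in the final tool would then only need to call something like `convert_all` on the extracted text of every selected PDF.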