There are 5 fixed letter formats (say template1 to template5). All of them have about 5-6 fields such as sender name, receiver name, address, date, telephone number, etc. The position of each field differs depending on the template. The requirement is to extract these fields and store them in a database, so that the scanned letters become searchable.
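One way to picture "same fields, different positions per template" is a lookup table from template to field regions. The sketch below is only an illustration: the `Region` class, the `TEMPLATE_LAYOUTS` table, the field names, and every coordinate are made-up assumptions, not measurements from the real letters. Regions are stored as fractions of the page so they do not depend on scan resolution.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Region:
    """A rectangle in page-relative coordinates (0.0 to 1.0). Hypothetical helper."""
    left: float
    top: float
    right: float
    bottom: float

    def to_pixels(self, width: int, height: int) -> Tuple[int, int, int, int]:
        # Scale the relative rectangle to the pixel size of the scanned page.
        return (round(self.left * width), round(self.top * height),
                round(self.right * width), round(self.bottom * height))


# Assumed layouts: only two templates and two fields are shown,
# and all coordinates are invented for illustration.
TEMPLATE_LAYOUTS: Dict[str, Dict[str, Region]] = {
    "template1": {
        "sender_name": Region(0.05, 0.05, 0.50, 0.10),
        "date": Region(0.70, 0.05, 0.95, 0.10),
    },
    "template2": {
        "sender_name": Region(0.50, 0.05, 0.95, 0.10),
        "date": Region(0.05, 0.05, 0.30, 0.10),
    },
}


def field_boxes(template_id: str, width: int, height: int) -> Dict[str, Tuple[int, int, int, int]]:
    """Return the pixel box (left, top, right, bottom) of each field for one template."""
    layout = TEMPLATE_LAYOUTS[template_id]
    return {name: region.to_pixels(width, height) for name, region in layout.items()}
```

Once the template of a scan is known, each returned box could be cropped and passed to whatever OCR engine is chosen.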
1> The input is a scanned image of a letter (possibly more than a single page), but it definitely belongs to 1 of the 5 layouts.
2> The problem is broken down into two steps: first identify the template using a neural/neuro-fuzzy network, then apply template-specific rules to extract the information (a commercially available OCR may be used).
3> The end result should allow the user to enter a query based on the date of the letter, a name, or other fields, and should return the related letters in ranked order.
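The ranked query in point 3 could be sketched roughly as below. This is only one possible approach, under stated assumptions: the extracted letters are shown as an in-memory list of dicts instead of database rows, the field names are hypothetical, and fuzzy string similarity from Python's standard `difflib` is swapped in as the ranking measure so that small OCR mistakes still match.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def field_score(query_value: str, stored_value: str) -> float:
    """Similarity between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, query_value.lower(), stored_value.lower()).ratio()


def search(letters: List[Dict[str, str]], query: Dict[str, str],
           min_score: float = 0.5) -> List[Tuple[float, Dict[str, str]]]:
    """Rank letters by the average similarity over the queried fields."""
    if not query:
        return []
    ranked = []
    for letter in letters:
        # A field missing from a letter is compared against an empty string.
        scores = [field_score(value, letter.get(field, ""))
                  for field, value in query.items()]
        score = sum(scores) / len(scores)
        if score >= min_score:  # min_score is an assumed cut-off
            ranked.append((score, letter))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# Made-up records standing in for rows of the letter database.
letters = [
    {"id": "a", "sender_name": "Mary Jones", "date": "2004-03-01"},
    {"id": "b", "sender_name": "Jon Smith", "date": "2004-05-12"},  # OCR dropped a letter
    {"id": "c", "sender_name": "John Smith", "date": "2004-05-12"},
]
results = search(letters, {"sender_name": "John Smith"})
for score, letter in results:
    print(f"{score:.2f}", letter["id"], letter["sender_name"])
```

The exact match comes first and the near match with the OCR error follows it. In a real system the candidate rows would first be narrowed down by a database query, with this kind of scoring applied only to the candidates.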