I need one or more experienced programmers to develop an OCR module that processes a scanned document (TIFF) based on a defined template and delivers the extracted data as XML. The OCR engine from ScanSoft will be used (or, if you are an expert on ABBYY FineReader, we can use that instead). The main area of use will be reading scanned invoices.
Functionality and phases will be discussed later, but here are some keywords:
- defining templates
- identifying the template to use at runtime
- handling different field types and formats (date, money, telephone, etc.)
- data lookup (data from a field must be validated against a database)
- pre-adjustment of the image before processing (rotate, convert to grayscale, etc.)
- handle/deliver second- and third-best matches (lower-ranked recognition candidates)
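As a rough illustration of the kind of XML output I have in mind, the result for one invoice might look something like the fragment below. All element names, attribute names, and values are placeholders for discussion only, not an agreed schema.

```xml
<!-- Illustrative sketch only: every name and value here is a hypothetical placeholder -->
<document source="invoice.tif" template="example-template">
  <!-- A typed field with the best match plus lower-ranked alternatives -->
  <field name="invoiceDate" type="date">
    <match rank="1">2004-03-15</match>
    <match rank="2">2004-08-15</match>
  </field>
  <!-- A money field with a single candidate -->
  <field name="totalAmount" type="money">
    <match rank="1">1250.00</match>
  </field>
  <!-- A field whose value was checked against the database -->
  <field name="supplierNumber" type="text" validated="true">
    <match rank="1">10042</match>
  </field>
</document>
```

The actual structure will be settled when functionality and phases are discussed.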
Phase 2 will extend the module with dictionary functionality (no template defined, just keywords such as amount, supplier, delivery date, etc.).
The required programming language is C#, and unit tests are a very big advantage.
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Installation package that will install the software (in ready-to-run condition) on the platform(s) specified in this bid request.
3) Exclusive and complete copyrights to all work purchased. (No GPL, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site).
The module will run in a Windows Server environment (all versions supported by Microsoft). The design module for defining templates will run on Win 2000 or Win XP.