1. The project should be developed in the Go language.
2. The application input is a list of words and/or regular expressions, plus a file to search in.
3. The file can be any of the following types:
3.1 *.txt, *.rtf, *.xml, *.html
3.2 Any MS Office format: Word, Excel, PowerPoint (note that Word and PowerPoint files can include both text and images)
3.3 PDF
3.4 Any image (images can contain text, so OCR is needed to convert the image to text)
4. The output of the execution reports which of the words were found in the file and which of the regular expressions matched.
5. One possible approach is to use a library to convert the input file into text and then run the search on the plain text.
Example:
The first input is two words and one regular expression (the regular expression describes a purchase order number), separated by commas: "hello, world, (\W|^)po[#\-]{0,1}\s{0,1}\d{2}[\s-]{0,1}\d{4}(\W|$)"
The second input is a .doc file.
If any of the words or the purchase order number are in the document, the output should state which word or regular expression was found.
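A minimal sketch of the search step from item 5, assuming the file has already been converted to plain text by some other library. The function name findTerms is hypothetical, and matching words as case-insensitive substrings is an assumption, since the requirements do not say how words should be compared. The sketch takes words and patterns as two separate lists, because the sample regular expression itself contains commas (in {0,1}) and so cannot be split safely out of a comma-separated string.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// findTerms reports which words and which regular expressions occur in text.
// Assumption: words are matched as case-insensitive substrings;
// patterns are compiled and matched exactly as given.
func findTerms(text string, words, patterns []string) ([]string, error) {
	var found []string
	lower := strings.ToLower(text)

	for _, w := range words {
		if strings.Contains(lower, strings.ToLower(w)) {
			found = append(found, w)
		}
	}

	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		if re.MatchString(text) {
			found = append(found, p)
		}
	}
	return found, nil
}

func main() {
	// Stand-in for the text extracted from the input file.
	text := "Hello team, please ship po#12-3456 today."

	words := []string{"hello", "world"}
	patterns := []string{`(\W|^)po[#\-]{0,1}\s{0,1}\d{2}[\s-]{0,1}\d{4}(\W|$)`}

	found, err := findTerms(text, words, patterns)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range found {
		fmt.Println("found:", f)
	}
}
```

With the sample text above, "hello" and the purchase order pattern are reported, and "world" is not. The pattern is case-sensitive as written, so "PO#12-3456" would not match unless the pattern is prefixed with (?i).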
Hello,
I can develop a program that parses txt, rtf, xml, html, odt, docx, and pptx files and reads images with OCR.
The program will be written in Go.
It will also be able to search multiple files.
I look forward to hearing from you.