This role requires a strong understanding of LDA (Latent Dirichlet Allocation) and of how to train a system so that it can perform NER (named entity recognition).
The current system is built with Java and Mahout, and we will stick with that stack. The source code is open source and available on GitHub (NLPWithMahout).
You will therefore have strong skills in:
1. NER
2. LDA
3. TF-IDF (term frequency-inverse document frequency)
You will understand word modelling and the methods used in language modelling (LM).
You will also have good skills in Java, Linux and Eclipse.
The role requires you to help build training data so that the system can identify the topics of test data.