Here is a nice article I found:
[login to view URL]
The idea is to filter out of a text corpus the sentences that are irrelevant to a given subject domain. For example, there is no need to keep sports commentary in medical texts.
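To make the idea concrete, here is a minimal sketch of scoring sentences against an in-domain sample and dropping the ones that score poorly. It is not the method from the linked article, which is not visible here. It assumes a toy add-one smoothed unigram model as a stand-in for a real language model, and the names `train_unigram`, `cross_entropy`, `filter_sentences` and the threshold value are all made up for illustration.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count words in the in-domain sentences (toy stand-in for a real LM)."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model):
    """Average negative log2 probability per word, with add-one smoothing."""
    counts, total = model
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    words = sentence.lower().split()
    if not words:
        return float("inf")  # treat empty lines as irrelevant
    log_prob = sum(math.log2((counts[w] + 1) / (total + vocab)) for w in words)
    return -log_prob / len(words)


def filter_sentences(sentences, model, threshold):
    """Keep sentences whose cross-entropy does not exceed the threshold."""
    return [s for s in sentences if cross_entropy(s, model) <= threshold]


# Hypothetical in-domain (medical) sample and a mixed corpus to filter.
in_domain = [
    "the patient received a dose of aspirin",
    "the doctor examined the patient",
]
model = train_unigram(in_domain)
corpus = ["the patient received aspirin", "the striker scored a late goal"]
print(filter_sentences(corpus, model, threshold=3.5))
```

A lower cross-entropy means the sentence looks more like the in-domain sample. The threshold would have to be tuned on held-out data, and a real setup would swap the unigram counts for scores from a proper LM toolkit.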
I need an implementation of such a tool. For LM training I would like to use the [login to view URL] tool.
In their solution they filter only monolingual data. I would also like the program to filter bilingual data. For example, if I want to filter the [login to view URL] file based on [login to view URL], I remove poor lines from somedata.en. But if I want to filter the [login to view URL] and [login to view URL] files based on [login to view URL], I remove poor lines from [login to view URL] and the corresponding lines in somedata.fr. We can assume that one line represents one sentence in these files.
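The bilingual requirement could be sketched roughly as follows: score only the source side, remember which line numbers survive, and keep the same line numbers on the target side so the two files stay aligned. This is only an illustration under assumptions. The `score` callable is a placeholder for whatever LM-based scorer ends up being used (lower meaning more in-domain), and the function names and the threshold are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def keep_indices(lines: Sequence[str],
                 score: Callable[[str], float],
                 threshold: float) -> List[int]:
    """Return the indices of lines whose score does not exceed the threshold."""
    return [i for i, line in enumerate(lines) if score(line) <= threshold]


def filter_parallel(src_lines: Sequence[str],
                    tgt_lines: Sequence[str],
                    score: Callable[[str], float],
                    threshold: float) -> Tuple[List[str], List[str]]:
    """Filter on the source side and drop the matching target lines too."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("parallel files must have the same number of lines")
    kept = keep_indices(src_lines, score, threshold)
    return [src_lines[i] for i in kept], [tgt_lines[i] for i in kept]


def read_lines(path: str) -> List[str]:
    """Read a one-sentence-per-line file as a list of strings."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()


def write_lines(path: str, lines: Sequence[str]) -> None:
    """Write sentences back out, one per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.writelines(line + "\n" for line in lines)
```

With a scorer in hand, usage might look like `en, fr = filter_parallel(read_lines("somedata.en"), read_lines("somedata.fr"), score, threshold)`, followed by two `write_lines` calls to output paths of your choosing. The monolingual case is the same thing with only `keep_indices` applied to one file.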
Hi,
I'm experienced in Natural Language Processing and Machine Learning, preferably using Python. I also participated in a national-level machine learning programming contest in India, where I placed second ([login to view URL]).
Please contact me so that we can discuss the details of this project further. Looking forward to hearing from you.
Thanks.
$250 USD in 10 days
5.0 (3 reviews)
2 freelancers are bidding an average of $225 USD for this job