Language Recognition

I am looking for a utility that guesses the language and encoding of plain-text documents.

It should work like the 'Auto-detect' function found in some browsers. I have heard of N-gram based methods, but other approaches may be available.

The utility has to accept a file or a string as an argument and return the language and encoding. If the document contains two or more languages, it should return the most heavily used one, such as 'Mostly English' or 'Mostly Russian'.

It has to be able to 'learn' new languages and encodings.

It must be written in Java and encapsulated as a separate class, so that it can be easily plugged into any Java program. Detailed JavaDoc is required.
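To make the requirements above concrete, here is a minimal sketch of one possible shape for such a class. It assumes a rank-order N-gram comparison over raw bytes, which is only one of several possible methods. The class name `LanguageGuesser`, the nested `Guess` type, the constants, and all method names are hypothetical and not part of the request. Because profiles are built from bytes, each learned sample carries both a language and an encoding label, and the closest profile is reported as the dominant one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a byte N-gram language and encoding guesser. */
public class LanguageGuesser {

    /** Result of a detection: the closest learned language/encoding pair. */
    public static final class Guess {
        public final String language;
        public final String encoding;

        Guess(String language, String encoding) {
            this.language = language;
            this.encoding = encoding;
        }
    }

    private static final int MAX_N = 3;          // use 1-, 2- and 3-grams (assumed value)
    private static final int PROFILE_SIZE = 300; // n-grams kept per profile (assumed value)

    // Parallel lists: labels.get(i) describes profiles.get(i).
    private final List<Guess> labels = new ArrayList<>();
    private final List<Map<String, Integer>> profiles = new ArrayList<>();

    /** Teaches the guesser a language/encoding pair from sample bytes in that encoding. */
    public void learn(String language, String encoding, byte[] sample) {
        labels.add(new Guess(language, encoding));
        profiles.add(rankedProfile(sample));
    }

    /** Returns the closest learned pair, or null if nothing has been learned yet. */
    public Guess detect(byte[] data) {
        Map<String, Integer> doc = rankedProfile(data);
        Guess best = null;
        long bestDistance = Long.MAX_VALUE;
        for (int i = 0; i < profiles.size(); i++) {
            long distance = 0;
            for (Map.Entry<String, Integer> gram : doc.entrySet()) {
                Integer rank = profiles.get(i).get(gram.getKey());
                // N-grams missing from the profile get the maximum penalty.
                distance += (rank == null) ? PROFILE_SIZE : Math.abs(rank - gram.getValue());
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = labels.get(i);
            }
        }
        return best;
    }

    /** Convenience overload that reads the whole file into memory. */
    public Guess detect(Path file) throws IOException {
        return detect(Files.readAllBytes(file));
    }

    /** Maps the most frequent byte n-grams to their frequency rank (0 = most frequent). */
    private static Map<String, Integer> rankedProfile(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so byte n-grams survive as substrings.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= MAX_N; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                counts.merge(text.substring(i, i + n), 1, Integer::sum);
            }
        }
        List<String> grams = new ArrayList<>(counts.keySet());
        // Sort by descending count; break ties alphabetically so ranks are deterministic.
        grams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        Map<String, Integer> ranks = new HashMap<>();
        for (int i = 0; i < Math.min(PROFILE_SIZE, grams.size()); i++) {
            ranks.put(grams.get(i), i);
        }
        return ranks;
    }
}
```

A caller would train it once per language/encoding pair, for example `guesser.learn("Russian", "KOI8-R", sampleBytes)`, and then call `guesser.detect(bytes)` or `guesser.detect(path)`. A Java `String` is already decoded, so this sketch takes bytes; a string overload would need to pick an encoding first. Real training samples would have to be far larger than a sentence or two for the ranking to be reliable.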

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Exclusive and complete copyrights to all work purchased. (No GPL, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site).

## Platform


Skills: Engineering, Java, MySQL, PHP, Software Architecture, Software Testing, Web Hosting, Website Management, Website Testing

About the Employer:
( 7 reviews ) Bulgaria

Project ID: #3012812

Awarded to:


See private message.

$170 USD in 30 days
(2 Reviews)

2 freelancers are bidding on average $138 for this job


See private message.

$106.25 USD in 30 days
(2 Reviews)