Duplicate/similar HTML file detector program

I need software that, given a group of HTML files, detects identical and very similar ones (with a configurable similarity percentage threshold).

Detail: the comparison engine must be able to operate on only the part of each HTML file selected by a regular expression, and only on the visible text (i.e., ignoring HTML tags).

You may integrate or use commercially available programs, but in that case you must tell me their price(s) via PMB before we start.

I pay in advance via escrow.

Skills: C Programming, System Administration


About the Employer:
(69 reviews) Milano, Italy

Project ID: #120200

4 freelancers are bidding an average of $75 for this job


An interesting project. It might be easier to write in Perl because of its extensive regex support and HTML parsing...

$50 USD in 5 days
(1 review)

Please see the PMB

$100 USD in 7 days
(0 reviews)

Hi, I have just reviewed the technical design of your project and it interests me. With over 10 years of C programming experience, I can fully guarantee your satisfaction. Please check your PMs.

$50 USD in 6 days
(0 reviews)

I have developed a similar type of program: given two folders as input, the application automatically detects similar files in both, by extension, size, and date. I would have to extend my project to compl…

$100 USD in 11 days
(0 reviews)