438545 Website Crawler and Software Identifier


In Progress
Posted over 13 years ago


Paid on delivery
I need a solution that will crawl the web and identify sites that use the open source packages Joomla, WordPress, Drupal, Mambo, Alfresco, and Plone. For each such site, it should create a row in a MySQL database with:

- the URL of the site
- what software it's using
- what version of the software
- the title tag
- the description meta tag
- the keywords meta tag

We would want to point the program at a directory (like [login to view URL]) to start, then have it find more URLs as it goes. So the solution needs to find websites on its own, just as a search engine does.

NOTE: YOU must be the one to define how the tool determines the software and the version that a website is using for these 6 packages: Joomla, WordPress, Drupal, Mambo, Alfresco, and Plone.

The solution must run on Linux or Mac OS X with MySQL. NO WINDOWS. We'd prefer something written in PHP 5.

The logic goes like this:

1. Go to the list of sites to be crawled at a future time and take the next one on the list.
2. Go to the URL.
3. Does this site run one of the 6 open source packages? If yes, go on to step 4. If not, go to step 6.
4. What version?
5. Write all info about the website to the database.
6. Are there any links from this website to other sites? If yes, go to step 7. If not, go to step 8.
7. Write the links to the list of sites to be crawled at a future time.
8. Go to step 1. If the list is empty, go to [login to view URL] and crawl around until you find some more links.

OR, offer me a better solution! I know that this is a solvable problem and that lots of people have created web crawlers before. I'm hoping there is an out-of-the-box open source solution somewhere that we can just tweak, so it wouldn't take very long to get this running.

Bonus feature: How much extra would it cost to also grab the contact information if it's easily available on the website? (Domain name registration information is NOT acceptable. It has to be the published contact-us phone number from the website in question.)
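
The crawl logic described above can be sketched roughly as follows. This is only an illustration, written in Python for brevity even though the posting prefers PHP 5. It assumes the software and version can be read from the page's generator meta tag, which is one common heuristic and not something the posting specifies; sites can remove or alter that tag, so a real tool would likely need additional signals. The names PageInfo, detect_software, analyze, and crawl are hypothetical, and fetch and save are placeholder callables standing in for the HTTP client and the MySQL insert.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Packages named in the posting. Matching them against the generator meta
# tag is an assumption of this sketch, not a method the client specified.
PACKAGES = ["Joomla", "WordPress", "Drupal", "Mambo", "Alfresco", "Plone"]


class PageInfo(HTMLParser):
    """Collects the title, named meta tags, and links to other hosts."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.title = ""
        self.meta = {}      # lowercased meta name -> content
        self.links = set()  # absolute http(s) URLs on other hosts
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            parsed = urlparse(urljoin(self.base_url, attrs["href"]))
            if parsed.scheme in ("http", "https") and parsed.netloc != self.base_host:
                self.links.add(parsed.geturl())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def detect_software(generator):
    """Guess (software, version) from a generator string, else (None, None)."""
    for name in PACKAGES:
        pattern = re.escape(name) + r"!?\s*(\d+(?:\.\d+)*)?"
        match = re.search(pattern, generator, re.IGNORECASE)
        if match:
            return name, match.group(1)  # version is None if not stated
    return None, None


def analyze(url, html):
    """Steps 3 to 6: build one record and list the links to other sites."""
    page = PageInfo(url)
    page.feed(html)
    software, version = detect_software(page.meta.get("generator", ""))
    record = {
        "url": url,
        "software": software,
        "version": version,
        "title": page.title.strip(),
        "description": page.meta.get("description", ""),
        "keywords": page.meta.get("keywords", ""),
    }
    return record, sorted(page.links)


def crawl(seeds, fetch, save, limit=100):
    """Steps 1 to 8: fetch(url) returns HTML or None; save(record) stores a row."""
    queue, seen = deque(seeds), set(seeds)
    processed = 0
    while queue and processed < limit:
        url = queue.popleft()          # step 1: next site on the list
        processed += 1
        html = fetch(url)              # step 2: go to the URL
        if html is None:
            continue
        record, links = analyze(url, html)
        if record["software"]:         # steps 3 to 5: store matching sites only
            save(record)
        for link in links:             # steps 6 and 7: queue unseen links
            if link not in seen:
                seen.add(link)
                queue.append(link)
```

In this sketch, save would wrap an INSERT into the MySQL table, and the seeds list would come from the starting directory page. The limit parameter is only a safety bound for experimentation; a production crawler would also need politeness delays, robots.txt handling, and a persistent queue.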
Project ID: 2184420

About the project

Remote project
Active 12 years ago

About the client

Austin, United States
5.0
32
Member since Jan 20, 2006
