Scrape website sitemaps and test all the internal links inside scraped pages for status 200 or status 302 response

Closed. Posted Jul 9, 2015. Paid on delivery.

Hi,

We are developing a deployment strategy, and part of it involves testing all the links inside our pre-production website for 404 errors and other error responses.

We would like to run a scraper that would help us test the following sequence:

1. Given the website test.mywebsite.com:

- Does it have a sitemap?

- Does it include multiple sitemaps?

- Open every sitemap file and scrape each link

2. For all the links and images from the sitemaps:

- Run an HTTP test on all the links inside each page to check their response status.
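As a rough illustration of step 1, here is a minimal sketch that reads a sitemap, detects whether it is a sitemap index pointing at further sitemaps, and collects every page URL. It assumes the sitemaps follow the sitemaps.org XML format and uses only the Python standard library. The names `parse_sitemap`, `collect_page_urls`, `fetch_text` and the `fetch` parameter are made up for this example and are not part of the request.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by the sitemaps.org protocol (assumed format).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch_text(url, timeout=10):
    """Hypothetical default fetcher: download a URL and decode it as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_sitemap(xml_text):
    """Return (kind, urls) where kind is 'index' or 'urlset'."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    # Every <loc> element holds either a child sitemap URL or a page URL.
    urls = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    return kind, urls


def collect_page_urls(sitemap_url, fetch=fetch_text):
    """Follow sitemap index files and return all page URLs in order."""
    seen, pages, queue = set(), [], [sitemap_url]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue  # avoid fetching the same sitemap twice
        seen.add(url)
        kind, urls = parse_sitemap(fetch(url))
        if kind == "index":
            queue.extend(urls)  # child sitemaps still need to be opened
        else:
            pages.extend(urls)  # plain sitemap: these are the pages to test
    return pages
```

Passing `fetch` as a parameter is only a design convenience: it lets the sitemap logic be tried against strings held in memory before pointing it at the real pre-production site.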

E.g.

$ python [url removed, login to view] [url removed, login to view]

Starting run for: [url removed, login to view]

Sitemap: [url removed, login to view]

3 Sitemaps Found:

- [url removed, login to view]

- [url removed, login to view]

- [url removed, login to view]

Testing:

/ -> OK

/contact-us -> OK

/our-team -> OK

/logon -> OK

/newpage -> ERRORS 404

STATUS 404: IMG : [url removed, login to view]

STATUS 404: LINK: [url removed, login to view]

/otherpage -> OK
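The per-page check behind a report like the one above could be sketched as follows, assuming that "internal" means "same host as the page" and that only 200 and 302 count as passing, as in the title. It uses only the Python standard library. The names `LinkCollector`, `internal_links`, `http_status` and `check_page` are hypothetical, and `http_status` deliberately does not follow redirects so that a 302 is reported as 302.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

OK_STATUSES = {200, 302}  # assumption: the two statuses named in the title


class LinkCollector(HTMLParser):
    """Collect (kind, absolute_url) pairs from <a href> and <img src> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.found.append(("LINK", urljoin(self.base_url, attrs["href"])))
        elif tag == "img" and attrs.get("src"):
            self.found.append(("IMG", urljoin(self.base_url, attrs["src"])))


def internal_links(page_url, html):
    """Return de-duplicated links and images on the same host as the page."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    host = urlparse(page_url).netloc
    seen, result = set(), []
    for kind, url in collector.found:
        if urlparse(url).netloc == host and url not in seen:
            seen.add(url)
            result.append((kind, url))
    return result


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # do not follow; the redirect surfaces as an HTTPError


def http_status(url, timeout=10):
    """Return the HTTP status code of a URL without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx responses all arrive here


def check_page(page_url, html, status_of=http_status):
    """Return report lines for one page, mimicking the sample output."""
    path = urlparse(page_url).path or "/"
    failures = []
    for kind, url in internal_links(page_url, html):
        status = status_of(url)
        if status not in OK_STATUSES:
            failures.append((status, kind, url))
    if not failures:
        return [f"{path} -> OK"]
    codes = " ".join(str(c) for c in sorted({s for s, _, _ in failures}))
    lines = [f"{path} -> ERRORS {codes}"]
    lines += [f"STATUS {s}: {k:<4}: {u}" for s, k, u in failures]
    return lines
```

Connection failures (DNS errors, timeouts) are not handled in this sketch and would raise `urllib.error.URLError`; a real script would need to decide how to report those.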

Can you please provide estimates?

Linux PHP Python Shell Script Software Development

Project ID: #8031449

About the project

7 proposals Remote project Active on Aug 15, 2015

7 freelancers are bidding an average of $57 for this job

mantislin

Hi sir, I am a scraping expert and have done many similar projects; please check my feedback and you will see. Can you tell me more details? Then I will provide demo data for you. Thanks, Kimi

$120 USD in 3 days
(148 Reviews)
6.8
ahmedbassiouny

I have a bachelor in Computer Science from the American University in Cairo and a minor in Mathematics, with 10+ years of experience with hands-on programming. I have worked for the past year in Microsoft's Advanced Te...

$111 USD in 3 days
(10 Reviews)
4.1
prog2u

Dear Sir/Madam, Kindly check my bid and project completion ratio before awarding. I'm really interested in working on this project. I can start work now and can provide the best services from my end. Please come on...

$50 USD in 0 days
(14 Reviews)
3.5
orioncx

OK. I can implement this using the lxml and requests libraries. Also, reading multiple sitemaps from a file and logging the results to a file may be a better approach. Contact me to start work.

$25 USD in 2 days
(0 Reviews)
0.0
Onlance

Thu, 09 Jul 2015 16:38:10 +0000 Hello, I can do a quick Perl script. I may need to install a few modules, though. It will follow all links starting from the main page. It will detect copies of links and avoid third party link...

$25 USD in 1 day
(0 Reviews)
0.0
frankmowen

We are a USA based firm that delivers high quality work the first time. There is no need to explain your requirements more than once.

$55 USD in 1 day
(0 Reviews)
0.0