Cancelled

Generate list of inlinks pointing to my competitors sorted by Pagerank in decreasing order

The high-level overview is that I would like to have a list of websites that point to my competitors, sorted by decreasing PageRank.

More specifically, I would like to have a script developed that takes as input a set of keywords, then sends these keywords to both Google and Yahoo, and retrieves for each query the top 10 URLs.

We then want to retrieve all inlinks to these URLs, sort them, and obtain their PageRank.

The final delivery is an Excel file with these links, as well as the script itself.

See complete details below.

My own site is:

[url removed, login to view]

## Deliverables

These are the query keywords (each line has one query):

vacation rentals germany

apartment rentals germany

germany vacation rentals

holiday rentals in germany

rental apartment germany

vacation rental germany

rent apartment in germany

germany lodging

germany rentals

holiday apartment germany

rental apartments germany

germany vacations

germany vacation

vacation in germany

Then the script needs to build the union of these candidate result URLs. We're only interested in the host part (the path can be discarded).

Example:

- if a result URL is [url removed, login to view] the list entry should be '[url removed, login to view]'

Now that list needs to be sorted, with duplicates removed.
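The host extraction and deduplication described above could be sketched roughly as follows in Python. The function names `host_of` and `unique_hosts` are made up for illustration, and the sketch assumes result URLs may arrive with or without a scheme.

```python
from urllib.parse import urlsplit

def host_of(url):
    """Return the lowercase host part of a URL, dropping path, query and port."""
    # urlsplit only recognises the host when a scheme or a leading '//' is
    # present, so bare entries such as 'www.example.com/page' get '//' first.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return (urlsplit(url).hostname or "").lower()

def unique_hosts(urls):
    """Union of the hosts of all result URLs, deduplicated and sorted."""
    hosts = {host_of(u) for u in urls}
    hosts.discard("")  # entries that had no recognisable host
    return sorted(hosts)
```

Feeding it the combined Google and Yahoo result URLs for all queries would yield the candidate host list.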

Then we go to Yahoo Site Explorer and obtain, for each of these URLs, the list of inlinks pointing to it (basically a linkdomain: query, excluding inlinks from the same domain).

linkdomain:[url removed, login to view]

show inlinks except from this domain

Example:

[url removed, login to view];_ylt=ArrV.UvMJUWyhO6xfNs4P_Pbl8kF?p=http%3A%2F%2Fwww.vacationrentals.com&bwm=i&bwmo=d&bwmf=s

You can manually download a TSV list of the inlink URLs for each of the candidate URLs. We basically need to download one such inlinks list for each candidate URL.

The next step is again to sort each list uniquely and throw away the file path of each URL, keeping only the host (this reduces the list further).
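A minimal sketch of reducing one downloaded TSV list to its set of hosts is shown here. The column layout of the export is not specified above, so `url_column` is a hypothetical parameter that would need to match the real file, and `inlink_hosts_from_tsv` is an illustrative name.

```python
import csv
from urllib.parse import urlsplit

def inlink_hosts_from_tsv(lines, url_column=0):
    """Collect the set of inlink hosts from one downloaded TSV list.

    `lines` is any iterable of text lines (for example an open file).
    `url_column` is an assumption; adjust it to the actual export format.
    """
    hosts = set()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) <= url_column:
            continue  # blank or short line
        host = urlsplit(row[url_column].strip()).hostname
        if host:  # header rows and non-URL cells yield no host
            hosts.add(host)
    return hosts
```

Returning a set per list means each host appears at most once per list, which is what the later counting step relies on.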

Then we do the union of all these lists, and dedup again by sorting uniquely.

Important: we would like to count the number of times a URL is a member of one of those lists.

For example, if an inlink URL A is a member of 10 lists, we keep a counter of 10 for A.

Now we should have a final list of URLs, each with a counter, sorted by decreasing counter:

[url removed, login to view] 20

[url removed, login to view] 20

[url removed, login to view] 19

...
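The counting and ordering just described might look something like this in Python. It assumes each inlinks list has already been reduced to hosts; the function names are illustrative only.

```python
from collections import Counter

def count_list_membership(inlink_lists):
    """Count, for each inlink host, how many of the lists contain it."""
    counts = Counter()
    for hosts in inlink_lists:
        counts.update(set(hosts))  # set() so a host counts once per list
    return counts

def by_decreasing_count(counts):
    """Return (host, counter) pairs, highest counter first."""
    # Ties are broken alphabetically so the output order is reproducible.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because the counter keys are unique by construction, this step also performs the union and deduplication of all lists.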

I'm attaching a file that contains a list of URLs that are already linking to my site [url removed, login to view]

We can take the host-part of these and remove them from the candidate set.

At this point let's look at the size of the list. We can probably prune those that have a counter less than k, where k could be below 3 (to be decided once we see the list size).
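The two filtering steps (dropping hosts that already link to the site, then pruning by counter) could be combined in a small helper such as the one below. This is a sketch: `prune_candidates` is a made-up name, the counter keys are assumed to be lowercase hosts, and `k` is left as a required argument because its value is still open.

```python
def prune_candidates(counts, already_linking, k):
    """Drop hosts that already link to the site or appear in fewer than k lists.

    counts          -- mapping of host -> number of lists containing it
    already_linking -- hosts taken from the attached file of existing inlinks
    k               -- minimum counter to keep (value still to be decided)
    """
    known = {h.lower() for h in already_linking}
    return {host: n for host, n in counts.items()
            if host not in known and n >= k}
```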

As a final step we would like to obtain the PageRank for the remaining sites.

There are APIs around to obtain the PageRank.

The final output is then a list with three columns:

URL counter PR

[url removed, login to view] 20

[url removed, login to view] 20

[url removed, login to view] 19

I suggest we sort first by PR column (in decreasing order) and then by counter in decreasing order.
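The suggested two-level ordering can be expressed as a single sort key. The sketch below assumes rows are `(url, counter, pr)` tuples and that a missing PageRank is represented as `None`, which it places last; both are assumptions for illustration.

```python
def sort_final(rows):
    """Sort (url, counter, pr) rows by PR, then counter, both decreasing."""
    def key(row):
        url, counter, pr = row
        pr_value = pr if pr is not None else -1  # unknown PR sorts last
        return (-pr_value, -counter, url)
    return sorted(rows, key=key)
```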

The final delivery is an Excel file, as well as the script(s) (along with documentation) that I can reuse for future purposes.

The script can be written in Perl, Python, PHP, or another scripting language (I don't have a strong preference, as long as it can run on a Unix box).
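One simple way to produce a file that Excel can open is to write the three columns as CSV with Python's standard library, as sketched here. A native .xlsx file would need a third-party library instead, and `write_report` is an illustrative name.

```python
import csv

def write_report(rows, out):
    """Write (url, counter, pr) rows as CSV to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["URL", "counter", "PR"])  # header as in the listing above
    writer.writerows(rows)
```

When writing to a real file, opening it with `open(path, "w", newline="")` is the mode the `csv` module documentation recommends.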

Skills: Engineering, Java, MySQL, Perl, PHP, Python, SEO, Software Architecture, Software Testing

About the Employer:
(7 reviews) United States

Project ID: #3020974

4 freelancers are bidding an average of $82 for this job

PawMar

See private message.

$59.5 USD in 14 days
(13 reviews)
3.6
codetech

See private message.

$127.5 USD in 14 days
(3 reviews)
2.7
darshitvw

See private message.

$57.8 USD in 14 days
(0 reviews)
0.0
onuridr

See private message.

$85 USD in 14 days
(0 reviews)
0.0