In Progress

Aggregator for Linked and Embedded Videos

A script to crawl a domain and return the target URLs of all linked or embedded videos sourced from other domains.

## Deliverables

I've been doing this manually for a while with wget and grep, but it has become tedious and I need a solid script to run instead. I would like to execute the script from a Linux environment (Perl is probably perfect) and simply pass a domain to it on the command line. The results should be written to a text file named for the crawled domain, with a timestamp of the completion time.
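As a rough illustration of the output naming requirement, here is a minimal sketch in Python (not the Perl deliverable itself). The function name `output_filename` and the `<domain>_<YYYYmmdd-HHMMSS>.txt` layout are assumptions; the brief only requires that the file name contain the crawled domain and the completion timestamp.

```python
from datetime import datetime

def output_filename(domain: str, finished_at: datetime) -> str:
    """Build '<domain>_<YYYYmmdd-HHMMSS>.txt' from the crawl's completion time."""
    # Assumed naming scheme: the brief only asks for the domain plus a
    # completion timestamp, so the separator and date format are illustrative.
    stamp = finished_at.strftime("%Y%m%d-%H%M%S")
    return f"{domain}_{stamp}.txt"
```

The timestamp would be taken once the crawl has finished, for example `output_filename(domain, datetime.now())`, with `domain` read from the first command-line argument.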

The results should be tab-delimited and include the following fields: crawled path, video URL and domain of the video. Below is a snippet of code from a site containing embedded videos from various sources. The domain is [url removed, login to view] and the path to this page is [url removed, login to view]

--------------------------------------------------------------

<p><embed src="[url removed, login to view]" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="718" height="550"></embed></p>

<p align="center">Wait for like <font color="#FF0000">30 seconds</font> for the video to load below.</p>

<p><iframe style='overflow: hidden; border: 0; width: 718px; height: 432px' src='[url removed, login to view];height=480&#038;v=xxah9pl5lklff' scrolling='no'></iframe></p>

<p><embed src="[url removed, login to view]" type="application/x-shockwave-flash" allowfullscreen="true" width="718" height="438"></embed></p>

--------------------------------------------------------------

Running the script on the domain [url removed, login to view] should return results like the ones below. Note that this tab-delimited example is partial, since it was pulled from only a single page of the domain in question:

--------------------------------------------------------------

[url removed, login to view]://[url removed, login to view] [url removed, login to view]

[url removed, login to view]://[url removed, login to view];height=480&#038;v=xxah9pl5lklff [url removed, login to view]

[url removed, login to view]://[url removed, login to view] [url removed, login to view]

--------------------------------------------------------------
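The step from page markup to result rows could be sketched roughly as follows, again in Python for illustration only. The class `VideoCandidateParser` and the function `external_rows` are hypothetical names, and the sketch treats any host that differs from the crawled host as "another domain", which is an assumption (a real script would probably need to treat `www.` and other subdomains as the same site). It only collects candidates; deciding which of them are actually videos is a separate step.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class VideoCandidateParser(HTMLParser):
    """Collect src/href values from <embed>, <iframe>, and <a> tags."""
    ATTR_FOR_TAG = {"embed": "src", "iframe": "src", "a": "href"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.ATTR_FOR_TAG.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            # HTMLParser already decodes entities such as &#038; in attribute values.
            if name == wanted and value:
                self.urls.append(value)

def external_rows(page_url, html):
    """Return tab-delimited rows: crawled path, video URL, video domain."""
    parser = VideoCandidateParser()
    parser.feed(html)
    page = urlparse(page_url)
    rows = []
    for raw in parser.urls:
        absolute = urljoin(page_url, raw)  # resolve relative links
        host = urlparse(absolute).netloc.lower()
        # Assumption: a different host means "another domain".
        if host and host != page.netloc.lower():
            rows.append("\t".join([page.path, absolute, host]))
    return rows
```

Each returned string is one line of the output file, with the three fields in the order requested above.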

Because video source URLs vary so widely, I expect that each source may need custom code or a custom regexp. That has been my experience so far: sometimes the URL ends in a unique video ID, sometimes in .mp4 or .flv. To keep the work bounded, we can limit the number of sources to about twenty or thirty specific ones. A complete list will be provided upon acceptance of the bid, but it will definitely include megavideo, megaupload, veoh, dailymotion, vimeo, videozer, facebook and myspace.
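One way to organize the per-source rules is a table of one regexp per source, with a generic fallback for direct .mp4 and .flv links. The sketch below is in Python for illustration, and every pattern in it is a guess at what such URLs might look like, not a verified rule for any of these sites; the names `SOURCE_PATTERNS`, `FILE_PATTERN` and `classify` are hypothetical. The real patterns would come from the final source list.

```python
import re

# Hypothetical patterns: one entry per supported source, to be replaced
# with rules derived from the agreed list of twenty or thirty sources.
SOURCE_PATTERNS = {
    "vimeo": re.compile(r"vimeo\.com/(?:video/)?(\d+)"),
    "dailymotion": re.compile(r"dailymotion\.com/(?:video|swf)/([A-Za-z0-9]+)"),
    "megavideo": re.compile(r"megavideo\.com/v/([A-Za-z0-9]+)"),
}

# Fallback for URLs that point straight at a video file.
FILE_PATTERN = re.compile(r"\.(mp4|flv)(?:\?|$)", re.IGNORECASE)

def classify(url):
    """Return (source, video_id), ("file", extension), or None if nothing matches."""
    for source, pattern in SOURCE_PATTERNS.items():
        match = pattern.search(url)
        if match:
            return source, match.group(1)
    match = FILE_PATTERN.search(url)
    if match:
        return "file", match.group(1).lower()
    return None
```

Keeping the rules in a single table means supporting a new source is a one-line change, and URLs that return `None` can be logged for review rather than silently dropped.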

The script will be used on an unlimited number of domains. Some examples include [url removed, login to view], [url removed, login to view], [url removed, login to view], [url removed, login to view] and [url removed, login to view]

Skills: Perl, Script Install, Shell Script


About the Employer:
(0 reviews) United States

Project ID: #3452410

Awarded to:

efadrian

See private message.

$9.1 USD / hour
(0 Reviews)
0.0