In Progress

Multi-threaded page fetcher

The module creates N threads that fetch web pages in parallel from a constantly updated list of URLs.


add_url: adds a URL to the URL list.

thread_pool_count: sets/gets the number of threads.

thread_sleep_interval: sets/gets the sleep interval M, as described below.

run: starts N threads that fetch the pages, and returns immediately after the threads have started. Each thread takes a URL from the URL list (removing it from the list), fetches the page, and adds the URL and fetched page to a second list that contains (URL, PAGE CONTENT) pairs. If the URL list is empty, the thread sleeps for M seconds and then tries again.

get_next_url: returns a reference to one (URL, PAGE CONTENT) pair from the list that the threads fill, and removes that pair from the list.

new: creates an instance with the given parameters.

stop: stops all the threads; the two lists remain as they were.
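The posting does not prescribe an implementation, but the API above can be sketched with the standard open-source threads and Thread::Queue modules that ship with perl and work on both Linux and MS Windows. This is a rough sketch, not a definitive implementation: the fetch_cb parameter and the internal stop sentinel are illustration-only assumptions (not in the spec, added so the sketch can be exercised without network access); a real version would fetch via something like LWP::UserAgent, and note that this stop() drains remaining URLs before the workers exit.

```perl
package Fetcher;
use strict;
use warnings;
use threads;
use Thread::Queue;

my $STOP = "\0STOP";    # internal sentinel telling a worker to exit (sketch-only)

sub new {
    my ($class, %args) = @_;
    my $self = {
        thread_pool_count     => $args{thread_pool_count}     || 4,
        thread_sleep_interval => $args{thread_sleep_interval} || 2,
        fetch_cb => $args{fetch_cb},        # hypothetical hook for testing, not in the spec
        urls     => Thread::Queue->new,     # the URL list
        results  => Thread::Queue->new,     # the (URL, PAGE CONTENT) pair list
        workers  => [],
    };
    return bless $self, $class;
}

sub add_url {
    my ($self, $url) = @_;
    $self->{urls}->enqueue($url);
}

sub thread_pool_count {
    my ($self, $n) = @_;
    $self->{thread_pool_count} = $n if defined $n;
    return $self->{thread_pool_count};
}

sub thread_sleep_interval {
    my ($self, $m) = @_;
    $self->{thread_sleep_interval} = $m if defined $m;
    return $self->{thread_sleep_interval};
}

sub _fetch {
    my ($self, $url) = @_;
    return $self->{fetch_cb}->($url) if $self->{fetch_cb};
    require LWP::UserAgent;                 # open-source, portable HTTP client
    my $res = LWP::UserAgent->new->get($url);
    return $res->is_success ? $res->decoded_content : undef;
}

sub run {
    my ($self) = @_;
    for (1 .. $self->{thread_pool_count}) {
        push @{ $self->{workers} }, threads->create(sub {
            while (1) {
                my $url = $self->{urls}->dequeue_nb;   # non-blocking take
                if (!defined $url) {
                    sleep $self->{thread_sleep_interval};  # empty list: sleep M seconds
                    next;
                }
                last if $url eq $STOP;                 # stop() requested
                my $content = $self->_fetch($url);
                $self->{results}->enqueue([ $url, $content ]);
            }
        });
    }
}

sub get_next_url {
    my ($self) = @_;
    return $self->{results}->dequeue_nb;   # undef when no pair is ready
}

sub stop {
    my ($self) = @_;
    $self->{urls}->enqueue($STOP) for @{ $self->{workers} };
    $_->join for @{ $self->{workers} };
    $self->{workers} = [];
}

1;
```

Thread::Queue is used for both lists because its queues are safely shared between threads, so the workers and the caller never need explicit locking.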

Example of how it will be used:

$fetch = Fetcher->new(thread_pool_count => 4, thread_sleep_interval => 2);
$fetch->run();

while (some_condition) {
    $fetch->add_url($URL) if ($URL);
    $pair = $fetch->get_next_url();
    if ($pair) {
        # process the (URL, PAGE CONTENT) pair here
    }
}

# this call returns a pair only if the results list is not empty
$pair = $fetch->get_next_url();

$fetch->stop();

We should get all the source code documented, and we get all copyrights: we can do whatever we want with the code, including changing it, reselling it, eating it.. :-)

The module should be compatible with MS Windows and Linux, and so should all of its dependencies. It must be based only on open-source code; no special modules that cost money or limit our ability to distribute the code are allowed!

We want simple code that is easy to maintain.

If you need to use binary modules, their source code should be included, and they should be compatible with Linux and Windows.

Skills: Perl

About the Client:
(3 reviews) Modiin, Israel

Project ID: #2619