In Progress

C# crawler (NCrawler) - Need a specific configuration and extension

Hi,

We need to crawl our company's intranet website and extract all links with their link names and URLs. We want to use <[url removed, login to view]> (LGPL license). We will process only HTML, and all content is UTF-8 encoded.

Most of the features requested below are already available in the libraries included in NCrawler's latest source files.

**The scope of the current project**:

1. a. Write a Windows dialog on top of NCrawler's console application, from which indexing the links of a given URL can be started, stopped, and resumed.

b. Stopping should also occur if, for example, the internet connection breaks down or the program is closed.

c. The dialog should allow the retry count for failed URLs and the link depth to be specified.

2. The URLs found and their link names are saved to a SQL Express database, and the URL currently being processed is logged to either the console or a text box (the programmer's choice).
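To make the stop/resume, retry count, and link depth requirements above more concrete, here is a minimal sketch of a crawl queue that can be saved to disk and reloaded. It is only an illustration under assumptions: `CrawlFrontier`, `PendingUrl`, and the tab-separated state file are hypothetical names and formats, not part of NCrawler, and NCrawler's own pause/resume mechanism may work differently. The start URL is treated as depth 0.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper types (not part of NCrawler).
public sealed class PendingUrl
{
    public string Url;
    public int Depth;     // 0 = start URL
    public int Failures;  // failed fetch attempts so far
}

public sealed class CrawlFrontier
{
    private readonly Queue<PendingUrl> pending = new Queue<PendingUrl>();
    private readonly HashSet<string> seen =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private readonly int maxDepth;
    private readonly int maxRetries;

    public CrawlFrontier(int maxDepth, int maxRetries)
    {
        this.maxDepth = maxDepth;
        this.maxRetries = maxRetries;
    }

    public int Count { get { return pending.Count; } }

    // Queue a URL unless it is deeper than maxDepth or was already seen.
    public bool Add(string url, int depth)
    {
        if (depth > maxDepth || !seen.Add(url)) return false;
        pending.Enqueue(new PendingUrl { Url = url, Depth = depth });
        return true;
    }

    // Take the next URL to fetch; returns false when nothing is pending.
    public bool TryTake(out PendingUrl next)
    {
        next = pending.Count > 0 ? pending.Dequeue() : null;
        return next != null;
    }

    // Re-queue a failed URL until it has failed more than maxRetries times.
    public bool Retry(PendingUrl failed)
    {
        failed.Failures++;
        if (failed.Failures > maxRetries) return false;
        pending.Enqueue(failed);
        return true;
    }

    // Persist seen and pending URLs as UTF-8 text so a later run can resume.
    public void Save(string path)
    {
        var lines = new List<string>();
        foreach (string url in seen) lines.Add("S\t" + url);
        foreach (PendingUrl p in pending)
            lines.Add("P\t" + p.Depth + "\t" + p.Failures + "\t" + p.Url);
        File.WriteAllLines(path, lines, Encoding.UTF8);
    }

    // Rebuild a frontier from a file written by Save.
    public static CrawlFrontier Load(string path, int maxDepth, int maxRetries)
    {
        var frontier = new CrawlFrontier(maxDepth, maxRetries);
        foreach (string line in File.ReadAllLines(path, Encoding.UTF8))
        {
            string[] parts = line.Split(new[] { '\t' }, 4);
            if (parts[0] == "S" && parts.Length >= 2)
                frontier.seen.Add(parts[1]);
            else if (parts[0] == "P" && parts.Length == 4)
                frontier.pending.Enqueue(new PendingUrl
                {
                    Url = parts[3],
                    Depth = int.Parse(parts[1]),
                    Failures = int.Parse(parts[2])
                });
        }
        return frontier;
    }
}
```

In this sketch the dialog's Stop button, the form-closing handler, and a detected connection loss would all call `Save`, and Resume would call `Load`. A URL that was taken but not finished would have to be put back in the queue before saving, otherwise it is lost.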

**Target system**:

Our system has .NET 4 and Microsoft SQL Express.

**Deliverables**: We need a working sample with clean code, including all C# source files, that can index <[url removed, login to view]> with a link depth of 3 and can resume when we disconnect the internet connection and then reconnect. All data should be stored in MS SQL Express. (Watch out for UTF-8.)

----------------------------

**Information for the programmer to make your work easier:**

For stopping / resuming: Have a look at [url removed, login to view](false or true);

Regarding link-name extraction: [url removed, login to view] doc = new [url removed, login to view]();

[url removed, login to view]([url removed, login to view]);

You can use RegEx.
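As a rough illustration of the RegEx route, the sketch below pulls the `href` and the visible link name out of anchor tags using only the .NET base class library. `LinkExtractor` and `Extract` are hypothetical names chosen for this example. The pattern is an approximation that will miss or mis-read some markup (for example, attributes such as `data-href`), so an HTML parser is likely the more robust choice for real pages.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of NCrawler): regex-based link extraction.
public static class LinkExtractor
{
    // Matches <a ... href="..."> ... </a> with double-quoted, single-quoted,
    // or unquoted href values; the link name is everything up to </a>.
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))[^>]*>(?<name>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    private static readonly Regex Tag = new Regex(@"<[^>]+>");
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Returns (url, link name) pairs in document order.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in Anchor.Matches(html))
        {
            string url = WebUtility.HtmlDecode(m.Groups["url"].Value);

            // Drop nested tags, decode entities, and collapse whitespace.
            string name = Tag.Replace(m.Groups["name"].Value, "");
            name = WebUtility.HtmlDecode(name);
            name = Whitespace.Replace(name, " ").Trim();

            links.Add(new KeyValuePair<string, string>(url, name));
        }
        return links;
    }
}
```

Each returned pair holds the URL in `Key` and the link name in `Value`. Relative URLs are returned as found and would still need to be resolved against the page address, for example with `new Uri(baseUri, url)`.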

Have a good day and all the best,

Sina

Skills: Amazon Web Services, C# Programming, Engineering, Microsoft, Project Management, Software Architecture, Software Testing, Windows Desktop

About the Employer:
( 1 review ) Austria

Project ID: #3049033