Write a Visual Web Scraper in MVC .NET, based on Portia (Python, GitHub)

Write a .NET port of the Portia project in C# and MVC. Portia can be found at [url removed, login to view].

Video demo: [url removed, login to view]

WARNING: This project has already been cancelled once! Only apply if you have professional .NET and Python skills. I am a professional coder and I know how to identify bad code. I need a good coder, not a beginner. The project is only finished when all the features and expectations described in this posting are completed.


- I have a web crawler that scrapes information from several websites, and I compile that information into a site of my own. I work on this project by myself on weekends, and I have a big problem: the websites I scrape very often change their layout, or have variable layouts (what Portia calls templates). I need an easy way to update these templates for a website, save them as a project, and call that project from my crawler to extract the data from given URLs or HTML strings. Given this, I think a Portia-like implementation would be great. Since I already have a crawler, I do not need the advanced Scrapy-based crawler. I just need a way to create the scraper project for a site while navigating the site inside the web application, and a way to extract the data later from my crawler by calling a class library, passing the project name and a URL (or an HTML string), and getting back the extracted properties.

Key features:

- Fully written in C#;

- No database: projects must be saved as files.

- Two projects:

- One web application, for visually building scraping templates for a given site project. Generally, each site project will have one template for listing pages and one for detail pages, but both can have variations depending on the product type, the category, or cases where the DOM/XPaths change because a product has more or less information.

- One class library that, given a URL or an HTML string and the name of a project (created in the web application), extracts all the data according to the extraction rules defined in that project. I want to call this class library from my existing crawler project.
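A minimal sketch of the file-based project storage required above (no database, one file per project, in a configurable directory). It assumes System.Text.Json (.NET Core 3.0 or later) and assumes each template stores its rules as a field-name-to-XPath map. `ScraperProject`, `ScrapeTemplate`, and `ProjectStore` are hypothetical names, not part of Portia or of any existing library.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical project model: one project per site, several templates per project
// (e.g. "listing", "detail", plus variants for pages whose DOM differs).
public sealed class ScraperProject
{
    public string Name { get; set; } = "";
    public List<string> StartUrls { get; set; } = new();
    public List<ScrapeTemplate> Templates { get; set; } = new();
}

public sealed class ScrapeTemplate
{
    public string Name { get; set; } = "";
    // Field name -> extraction rule. Storing rules as XPath strings is an assumption.
    public Dictionary<string, string> Fields { get; set; } = new();
}

// File-based persistence: one "<project>.json" file per project inside a configurable directory.
public static class ProjectStore
{
    private static readonly JsonSerializerOptions Options = new() { WriteIndented = true };

    public static void Save(string directory, ScraperProject project)
    {
        Directory.CreateDirectory(directory); // no-op if the directory already exists
        string path = Path.Combine(directory, project.Name + ".json");
        File.WriteAllText(path, JsonSerializer.Serialize(project, Options));
    }

    public static ScraperProject Load(string directory, string name)
    {
        string path = Path.Combine(directory, name + ".json");
        return JsonSerializer.Deserialize<ScraperProject>(File.ReadAllText(path))
            ?? throw new InvalidDataException($"Project file '{path}' does not contain a project.");
    }

    // Could back a "List Projects" screen: project names are the file names without extension.
    public static List<string> List(string directory)
    {
        var names = new List<string>();
        if (!Directory.Exists(directory)) return names;
        foreach (string file in Directory.EnumerateFiles(directory, "*.json"))
            names.Add(Path.GetFileNameWithoutExtension(file) ?? file);
        return names;
    }
}
```

Keeping each project in its own JSON file makes "Load", "Save" and "List Projects" plain file operations, and the same file could be read by both the web application and the class library.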


- Web Application Expectations:

-> Same features as the Portia web application, including:

-> Easy configuration of the projects directory path;

-> List Projects feature;

-> 'Create New Project', 'Load Project', 'Save Project', and 'Close Project' (to select another project, returning to the initial app state);

-> Visually navigate the site and select the information to be scraped, with the possibility to customize rules, as in Portia;

-> A list of start URLs. In this list I will keep the most common result pages that list products, so I can navigate directly to a listing page (to build that template) and then to a detail page (to build another template).

-> Template creation that works exactly as in Portia.

- Class Library Expectations:

-> Public and easy to use from other assemblies;

-> Easy to use and fast;

-> Can use HtmlAgilityPack and CsQuery;

-> Ability to pass/set an HttpWebRequest (in case I'm using a proxy or different User-Agent headers);

-> Supported input parameters for the extraction method:

-> string 'project name' and Uri 'url to extract';

-> string 'project name' and string 'html';

-> string 'project name', string 'template name', and Uri 'url to extract';

-> string 'project name', string 'template name', and string 'html';

-> Output:

-> A list of properties, ready to be indexed by my crawler as soon as it calls/executes the extraction. I'm thinking of some kind of dictionary in which I can check whether a key exists, so I can pick the value and store it in my data storage.

-> It would be great if the result were a context object, with a field holding the extraction dictionary and other fields listing the errors that occurred during parsing, the template used, and other relevant information.
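One possible shape for the class library's public surface, covering the four input combinations and the result context requested above. This is a hypothetical sketch: `Extractor`, `ExtractionResult`, and the `TryApplyTemplate` hook are invented names. The actual template matching (for example with HtmlAgilityPack or CsQuery) is left to the injected delegate. The sketch also swaps in a caller-configured `HttpClient` for the requested `HttpWebRequest`, because `HttpWebRequest` is marked obsolete in .NET 6 and later. Proxy and User-Agent settings would be configured on the `HttpClient` instead.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Result "context": extracted fields plus diagnostics.
public sealed class ExtractionResult
{
    // Case-insensitive keys so the crawler can probe with TryGetValue.
    public Dictionary<string, string> Fields { get; } = new(StringComparer.OrdinalIgnoreCase);
    public List<string> Errors { get; } = new();
    public string? TemplateUsed { get; set; }
    public Uri? SourceUrl { get; set; }
    public bool Success => Errors.Count == 0 && Fields.Count > 0;
}

// Hypothetical hook for the matching step; returns true and fills `fields` when the template fits.
public delegate bool TryApplyTemplate(string html, string templateName, IDictionary<string, string> fields);

public sealed class Extractor
{
    private readonly HttpClient _http;                                 // caller configures proxy / headers
    private readonly Func<string, IReadOnlyList<string>> _templatesOf; // project name -> template names
    private readonly TryApplyTemplate _apply;

    public Extractor(HttpClient http, Func<string, IReadOnlyList<string>> templatesOf, TryApplyTemplate apply)
    {
        _http = http;
        _templatesOf = templatesOf;
        _apply = apply;
    }

    // The four input combinations listed above.
    public Task<ExtractionResult> ExtractAsync(string project, Uri url) => ExtractAsync(project, null, url);

    public ExtractionResult Extract(string project, string html) => Extract(project, null, html);

    public async Task<ExtractionResult> ExtractAsync(string project, string? template, Uri url)
    {
        string html = await _http.GetStringAsync(url);
        ExtractionResult result = Extract(project, template, html);
        result.SourceUrl = url;
        return result;
    }

    public ExtractionResult Extract(string project, string? template, string html)
    {
        var result = new ExtractionResult();
        // With no template name, try every template of the project until one matches.
        IReadOnlyList<string> candidates = template is null ? _templatesOf(project) : new[] { template };
        foreach (string name in candidates)
        {
            var fields = new Dictionary<string, string>();
            if (!_apply(html, name, fields)) continue;
            foreach (var pair in fields) result.Fields[pair.Key] = pair.Value;
            result.TemplateUsed = name;
            return result;
        }
        result.Errors.Add($"No template of project '{project}' matched the page.");
        return result;
    }
}
```

From the crawler, the result of `extractor.Extract("myshop", html)` could be checked with `result.Fields.TryGetValue("price", out var price)` ("myshop" and "price" are illustrative names). `TemplateUsed` and `Errors` would then carry the diagnostics the posting asks for.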

Skills: .NET, MVC, Python, Web Scraping


About the Employer:
(1 review) Buenos Aires, Brazil

Project ID: #6809505

7 freelancers are bidding on average $304 for this job


Hello Sir, We've done a number of web scraping projects for our clients. We have scraped many directory websites including yellowpages, yelp and e-commerce websites including amazon, walmart etc and many more. We can d...

$210 USD in 5 days
(72 reviews)

Hello Sir, YSoft Solution, a leading software development company based in India. We have expertise in .NET, MVC, Python, Web Scraping and we have developed many web applications using Our team of expert can...

$277 USD in 7 days
(21 reviews)

I'm specialized in webscraping in C# and python. So this project is a perfect fit for me. But your budget is too low. If you are willing to go a little higher like I propose say something and we'll discuss this better...

$400 USD in 14 days
(5 reviews)

Hi, Please feel free to discuss the project with me. Thanks, Murtaza

$350 USD in 5 days
(7 reviews)

I am Susan Zachariah having 5 years experience in the area of design and development me and my sister started this company 2 years back i wish to associate with you hope i can deliver your products without any error w...

$333 USD in 6 days
(1 review)

Hello, We have Experienced Professionals with 12 years of Web Development & Designing, Application Development, Desktop Application, ERP Software, Android App. Development. I-phone Development expertise. We have worked...

$222 USD in 12 days
(0 reviews)

experience in both c# mvc & python

$333 USD in 20 days
(0 reviews)