Closed

Improve Java web scraping

Request details

I developed a Java program to scrape information from a website. The architecture of the solution involves: 1) using Java Selenium to send requests to the webpage via Chrome WebDriver to trigger authentication and authenticated requests; 2) routing the requests from headless Chrome through Java BrowserMobProxy to capture three HTTP headers (Authorization, X-CSRF-TOKEN, and Cookie) and one query string (without these, the server starts responding with 512 after a few requests); and 3) using these four elements in HTTPS requests sent directly from Java to the website (i.e. without Selenium, Chrome, or BrowserMobProxy involved) to retrieve the desired information.
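Step 3 above could be sketched with the JDK's built-in `java.net.http` API (Java 11+). This is only an illustration under stated assumptions: the endpoint URL, the query parameter name, and the `ReplayRequestSketch` and `SessionTokens` names are hypothetical, since the posting names neither the site nor the parameter.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

final class ReplayRequestSketch {

    /** Holds the four values captured from the authenticated browser session. */
    static final class SessionTokens {
        final String authorization;
        final String csrfToken;
        final String cookie;
        final String queryValue;

        SessionTokens(String authorization, String csrfToken, String cookie, String queryValue) {
            this.authorization = authorization;
            this.csrfToken = csrfToken;
            this.cookie = cookie;
            this.queryValue = queryValue;
        }
    }

    /**
     * Builds a GET request that carries the three headers and the query string.
     * baseUrl and queryParamName are placeholders chosen for this sketch.
     */
    static HttpRequest build(String baseUrl, String queryParamName, SessionTokens tokens) {
        // URL-encode the captured value so it is safe inside a query string.
        String query = queryParamName + "="
                + URLEncoder.encode(tokens.queryValue, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "?" + query))
                .timeout(Duration.ofSeconds(30))
                .header("Authorization", tokens.authorization)
                .header("X-CSRF-TOKEN", tokens.csrfToken)
                .header("Cookie", tokens.cookie)
                .GET()
                .build();
    }
}
```

The resulting request would then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, with no Selenium, Chrome, or proxy in the path.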

This program performs the basic extraction of the information but has a few problems:

It depends on an external non-Java component: Chrome WebDriver

It depends on Java Selenium and Java BrowserMobProxy, two dependencies that I would like to remove

It is not optimized (too many refreshes and overly long sleep periods) relative to the limit at which the website (behind Cloudflare) starts responding with 429 errors. As a result, retrieving the information takes much longer than necessary.

Deliverables

You will receive the program's current Java code and will need to solve the problems above. To do so, you will need to:

A. Find out how to authenticate and refresh the three headers and the query string without depending on Selenium, Chrome WebDriver, or BrowserMobProxy. As most of this data is likely generated in JavaScript, you will need knowledge of JavaScript and of how to execute JavaScript from within Java or convert the JavaScript code to Java (the preferred solution).
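For the "execute JavaScript from within Java" option, one possible route is the standard `javax.script` API. The sketch below is an assumption about how that could look, not the method used by the existing program: the `JsEvalSketch` class is hypothetical, and because the bundled Nashorn engine was removed in JDK 15, a JavaScript engine (for example GraalJS or standalone Nashorn) would have to be on the classpath, which is itself a new Java dependency. When no engine is registered, the method returns an empty `Optional` instead of failing.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

final class JsEvalSketch {

    /**
     * Evaluates a JavaScript snippet and returns its result.
     * Returns Optional.empty() when no JavaScript engine is available
     * on the classpath or when the script evaluates to null.
     */
    static Optional<Object> eval(String script) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) {
            // No engine registered (the default on JDK 15 and later).
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(engine.eval(script));
        } catch (ScriptException e) {
            // Surface script errors as unchecked so callers can decide how to react.
            throw new IllegalArgumentException("JavaScript evaluation failed", e);
        }
    }
}
```

Converting the relevant JavaScript to plain Java, the option the posting prefers, avoids the engine dependency altogether.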

B. Identify the limit at which the website (behind Cloudflare) starts responding with 429 errors, and tune the refresh frequency of the headers and the sleep periods to that limit. You will need to demonstrate the benefits of your changes by extracting the same information the program currently extracts and measuring how long it takes.
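One way to approach the tuning in item B is to let the delay between requests adapt to the responses: shrink it slowly while requests succeed and back off sharply on a 429. The sketch below is a minimal illustration of that idea; the `AdaptiveThrottle` class, its parameters, and the doubling factor are assumptions, not values measured against the actual site.

```java
import java.time.Duration;

final class AdaptiveThrottle {
    private final long minDelayMs;
    private final long maxDelayMs;
    private final long stepMs;
    private long delayMs;

    AdaptiveThrottle(long initialDelayMs, long minDelayMs, long maxDelayMs, long stepMs) {
        if (minDelayMs < 0 || maxDelayMs < minDelayMs || stepMs <= 0) {
            throw new IllegalArgumentException("invalid throttle bounds");
        }
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.stepMs = stepMs;
        // Clamp the starting delay into the allowed range.
        this.delayMs = Math.max(minDelayMs, Math.min(maxDelayMs, initialDelayMs));
    }

    /**
     * Records the status of the last response and returns how long to wait
     * before the next request. retryAfterSeconds is the value of a Retry-After
     * header if the server sent one, or 0 otherwise.
     */
    Duration onResponse(int statusCode, long retryAfterSeconds) {
        if (statusCode == 429) {
            // Rate limited: double the steady delay, capped at the maximum.
            delayMs = Math.min(maxDelayMs, delayMs * 2);
            // Wait at least as long as the server asked for, if it said so.
            long hintedMs = Math.max(0, retryAfterSeconds) * 1000;
            return Duration.ofMillis(Math.max(delayMs, hintedMs));
        }
        // Success: probe a slightly faster rate, never below the floor.
        delayMs = Math.max(minDelayMs, delayMs - stepMs);
        return Duration.ofMillis(delayMs);
    }
}
```

Calling `onResponse` after every request and sleeping for the returned duration keeps the scraper near the limit; wrapping a full extraction run in `System.nanoTime()` readings before and after gives the elapsed-time comparison the posting asks for.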

Note: you will need to create your own login and password on the website. There are no additional requirements to register.

Skills: Java, Web Scraping, JavaScript, Software Architecture, Python


About the Employer:
(1 review) Băilești, Romania

Project ID: #26818026

9 freelancers are bidding an average of $492 for this job

schoudhary1553

Hello, I can improve your Java web scraping. I have gone through your job posting and am very interested in working with you. I am an expert in this field. I have already completed several projects like ...

$750 USD in 7 days
(128 reviews)
7.1
(21 reviews)
6.1
sodiqa32

Hello, I am pleased with your job as detailed. Thank you for the job posting. It’s a pleasure to meet you. I’d really like to work with you on this one if possible! I do have a couple of questions, but first I’d like ...

$250 USD in 2 days
(45 reviews)
6.0
Demenntor

Dear Employer, I have read the project details and am confident I can work on improving your Java web scraping. I have extensive knowledge of Java, JavaScript, Python, software, etc. Kindly message me so that we can discuss ...

$667 USD in 3 days
(30 reviews)
5.2
serzhkavalchuk96

Hi, I have over 5 years of experience in Python. I’ve gone through your complete project description. I am interested in this project as it is exactly within the scope of my skills. My main skills are as follows: Python, ...

$500 USD in 7 days
(2 reviews)
2.4
asmamessaadi

I can rewrite clean Python Selenium automated driver code, optimized and organized without bugs, with a pipeline to output JSON or CSV. Have a look at the Selenium bots in my portfolio ...

$300 USD in 2 days
(2 reviews)
2.2
serhiilyskin

Hi, sir. I have carefully checked your requirements and I am glad to say that I've already done this kind of project before. I'd love to share more details with you over chat, and I'm sure that you'll be interested in them. I ...

$555 USD in 6 days
(1 review)
2.0
ProflexDesign1

Hi, how are you doing? I hope you're doing well! I have been a professional web scraper for the last 7 years. I am confident I can complete your project. Regards! Sergei.

$450 USD in 4 days
(0 reviews)
0.0
rj116

Hi, please hire me. Relevant Skills and Experience: did automation testing in Selenium using Java.

$556 USD in 10 days
(0 reviews)
0.0