Convert PDF to Sequential page JPGs, remove white margins, upload to AWS S3

Closed. Posted 6 years ago. Paid on delivery.

The ultimate goal of the project is to write a Python script that:

a) converts PDFs to sequential JPG pages

b) trims the white margins around the JPGs and saves them to disk.
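
The margin trimming in (b) comes down to finding the bounding box of the non-white pixels on each page and cropping to it. A minimal sketch, assuming Python 3 and grayscale values from 0 to 255; the function names and the `threshold` default are illustrative and not part of the posting, and the second function assumes it is handed a Pillow image object:

```python
def content_bbox(pixels, threshold=245):
    """Bounding box (left, top, right, bottom) of the non-white pixels.

    `pixels` is a list of rows of grayscale values (0 = black, 255 = white).
    Values below `threshold` count as content. Right and bottom are
    exclusive. Returns None when the page is entirely white.
    """
    rows = [y for y, row in enumerate(pixels) if any(v < threshold for v in row)]
    if not rows:
        return None
    width = len(pixels[0])
    cols = [x for x in range(width) if any(row[x] < threshold for row in pixels)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def trim_white_margins(image, threshold=245):
    """Same idea on a Pillow image (assumed to be a PIL.Image.Image)."""
    # Build a mask where content is 255 and near-white is 0, then let
    # Pillow compute the bounding box of the non-zero region.
    mask = image.convert("L").point(lambda v: 255 if v < threshold else 0)
    bbox = mask.getbbox()
    return image.crop(bbox) if bbox else image
```

A threshold slightly below 255 is used because JPG compression and scanning noise rarely leave margins perfectly white; the right value would need tuning against the real PDFs.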

The second part of the project is to upload the JPGs to AWS S3 with public access permissions, using Python and an AWS package.

The third part is garbage collection on AWS S3.

Details:

Given a list of PDF URLs + PDF_IDs, and an image quality/size (pixel length + width):

1) Download the PDF from the provided link, e.g. https://s3.us-east-2.amazonaws.com/pdfs12z49a/sample+pdf/[login to view URL]

2) Convert each PDF into a series of JPGs (one per page) at the specified image quality/size

3) Trim the white margins from each JPG (margin spacing will vary, so you will need to calculate it for each page)

4) Create a folder on disk called PDF_ID, and save each image in a subfolder named after the image quality/size input (e.g. C:\PDF2JPG\PDF_ID\300dpi\[login to view URL], [login to view URL], etc.)

5) Output a list of lists for each PDF, containing PDF_ID, quality, page_number, location_on_disk

ex:

[
['PDFID1', 300, 1, 'C:\Windows\ID\300dpi\[login to view URL]'],
['PDFID1', 300, 2, 'C:\Windows\ID\300dpi\[login to view URL]']
]
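
Steps 1 to 5 could be sketched roughly as follows. This assumes Python 3, the third-party `pdf2image` package (which needs poppler installed), and that the quality input is a DPI value, as the `300dpi` folder in the example suggests. The function names, the `<page_number>.jpg` file naming, and the optional `trim` callback are placeholders, since the posting does not show the actual file names:

```python
import os
import urllib.request


def output_dir(root, pdf_id, dpi):
    """Folder layout from step 4: <root>/<PDF_ID>/<dpi>dpi."""
    return os.path.join(root, pdf_id, "%ddpi" % dpi)


def page_record(root, pdf_id, dpi, page_number):
    """One row of the step 5 output: [PDF_ID, quality, page_number, path]."""
    # The "<page_number>.jpg" file name is a placeholder convention.
    path = os.path.join(output_dir(root, pdf_id, dpi), "%d.jpg" % page_number)
    return [pdf_id, dpi, page_number, path]


def convert_pdf(url, pdf_id, dpi, root, trim=None):
    """Download one PDF, render each page to a JPG, return the rows."""
    from pdf2image import convert_from_path  # third-party; requires poppler

    os.makedirs(output_dir(root, pdf_id, dpi), exist_ok=True)
    pdf_path = os.path.join(root, pdf_id, pdf_id + ".pdf")
    urllib.request.urlretrieve(url, pdf_path)

    records = []
    for number, page in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        if trim is not None:
            page = trim(page)  # optional margin-trimming step (step 3)
        record = page_record(root, pdf_id, dpi, number)
        page.convert("RGB").save(record[3], "JPEG")
        records.append(record)
    return records
```

Calling `convert_pdf` once per (URL, PDF_ID) pair and concatenating the results would give the list of lists described in step 5.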

Part 2 - Upload to AWS using Python 2.7 and an AWS package, taking the list of lists from above as input:

1) Create a new "sub-bucket" named PDF_ID within the existing bucket (S3 buckets cannot be nested, so in practice this is a folder, i.e. a key prefix)

2) Upload all images to AWS S3 bucket for PDF_ID with public read permissions

3) Output a list of lists for each PDF, containing PDF_ID, quality, page_number, AWS URL

ex:

[
['PDFID1', 300, 1, '[login to view URL]'],
['PDFID1', 300, 2, '[login to view URL]']
]

Also output a list containing PDF_ID, page_number, and URL.
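
A hedged sketch of Part 2, assuming Python 3 and a boto3 S3 client passed in as `s3` (for example `boto3.client("s3")`). Because S3 buckets cannot be nested, the per-PDF "bucket" is modeled here as a key prefix inside the existing bucket. The key layout, function names, and the virtual-hosted URL form are assumptions, and the `public-read` ACL only works if the bucket is configured to allow ACLs:

```python
from urllib.parse import quote


def s3_key(pdf_id, dpi, page_number):
    """Object key under the PDF_ID prefix (hypothetical layout)."""
    return "%s/%ddpi/%d.jpg" % (pdf_id, dpi, page_number)


def public_url(bucket, region, key):
    """Virtual-hosted-style URL for a publicly readable object."""
    return "https://%s.s3.%s.amazonaws.com/%s" % (bucket, region, quote(key))


def upload_records(s3, bucket, region, records):
    """Upload each local JPG and return rows with the AWS URL.

    `records` holds rows of [PDF_ID, quality, page_number, local_path].
    """
    uploaded = []
    for pdf_id, dpi, page_number, local_path in records:
        key = s3_key(pdf_id, dpi, page_number)
        s3.upload_file(
            local_path,
            bucket,
            key,
            ExtraArgs={"ACL": "public-read", "ContentType": "image/jpeg"},
        )
        uploaded.append([pdf_id, dpi, page_number, public_url(bucket, region, key)])
    return uploaded
```

The shorter list of PDF_ID, page_number, and URL can then be derived with `[[r[0], r[2], r[3]] for r in uploaded]`.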

Part 3 - AWS garbage collector - Python + AWS package

Given a list of PDF_IDs, delete the sub-buckets (folders) with those IDs.
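
The cleanup could look roughly like this, again assuming a boto3 S3 client passed in as `s3` and that each "sub bucket" is really a `PDF_ID/` key prefix in one bucket. The function name is hypothetical:

```python
def delete_pdf_ids(s3, bucket, pdf_ids):
    """Delete every object stored under each PDF_ID prefix.

    Returns the number of objects for which a delete was requested.
    """
    deleted = 0
    paginator = s3.get_paginator("list_objects_v2")
    for pdf_id in pdf_ids:
        # The trailing slash keeps "PDFID1" from also matching "PDFID10/...".
        prefix = pdf_id + "/"
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
            if keys:
                # Each listing page holds at most 1000 keys, which is also
                # the per-request limit of delete_objects.
                s3.delete_objects(Bucket=bucket, Delete={"Objects": keys})
                deleted += len(keys)
    return deleted
```

Passing the client in as an argument keeps the function easy to exercise against a stub before pointing it at real buckets, which seems prudent for a script that deletes data.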

Ideally, I'm looking for somebody who has done this type of project in the past and has a script lying around. Once the bid is accepted, I will provide:

1) PDF IDs + links

2) User ID + password for AWS, with write permissions to the test buckets

Thank you.

Amazon Web Services, Linux, Python, Software Architecture, Web Scraping

Project ID: #16058400

About the project

4 proposals. Remote project. Active 6 years ago.

4 freelancers are bidding an average of $219 for this job

MediatreeTn

A proposal has not yet been provided

$250 USD in 7 days
(111 reviews)
7.4
schoudhary1553

Greetings, I have understood your "Convert PDF to Sequential page JPGs, remove white margins, upload to AWS S3" task and can complete it to your full satisfaction. Please ping me to discuss further. I have more than 5 ...

$200 USD in 3 days
(44 reviews)
6.2
joystick220

Hey there, I think I've understood almost every aspect of the project. To be honest, I don't have any scripts lying around, but I'm more than capable of delivering this project within the same day. ...

$250 USD in 3 days
(72 reviews)
6.4
mmadi

Hey paraplan321, I have gone through your project "Convert PDF to Sequential page JPGs, remove white margins, upload to AWS S3". Thanks for posting this job, which falls within our area of expertise. We are happy to offer ...

$175 USD in 4 days
(12 reviews)
6.0