
Closed
Posted
Paid on delivery
I am seeking a freelancer to build a fully automated data extraction and enrichment pipeline for Spanish procurement PDF documents.

Scope of Work:

Phase 1 – PDF Extraction
- Extract product names and key features from Spanish PDF files.
- Clean and structure the data.
- Output results in Excel (.xlsx) format.

Phase 2 – Web Search & Price Extraction
- Use each product name as a Google search query, restricted to a specific domain.
- Analyze the top 5 search results per product.
- From each URL, extract with high precision: price (5 accurate prices per product are required), brand, and product features and description.
- Identify similar products, not only exact matches.
- Measure similarity using methods such as Levenshtein distance or cosine similarity.

Deliverables
- Final datasets in Excel (.xlsx) and JSON (.json) formats. (Do not forget that price extraction must be very precise and sufficient. We need 5 successfully scraped URLs.)
- A detailed [login to view URL] explaining the full workflow, tools, and how to reproduce the process.

Technical Notes
- Use of LLMs for extraction, similarity analysis, and enrichment is highly recommended.
- The solution must be accurate, efficient, and fully reproducible.
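The two similarity measures named in the posting can be illustrated with a minimal, standard-library-only sketch. It is not the client's or any bidder's implementation: the function names are hypothetical, and the cosine similarity here is computed over simple lowercase word counts, which is only one of several possible vectorizations (embeddings or TF-IDF would be alternatives).

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Normalize the distance to a 0..1 similarity (1.0 means identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over lowercase word-count vectors (an assumed tokenization)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```

Levenshtein distance tends to catch spelling variants of the same product name, while word-level cosine similarity is less sensitive to word order, so a pipeline of this kind might reasonably combine both scores.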
Project ID: 40177294
42 bids
Remote project
Active 5 days ago
42 freelancers are bidding an average of $54 USD for this job

Hi client, I’ve carefully reviewed your job description and have strong experience in Data Analysis, API Integration, JSON, Web Scraping, Data Processing, Google Search, Python, Data Extraction, Natural Language Processing and Excel. I can build a reliable web scraping solution tailored specifically to your needs. Whether using Node.js with Puppeteer/Cheerio or Python with Selenium/BeautifulSoup, I will extract, clean, and organize your data efficiently. I also handle anti-bot protections, pagination, and full automation as required. As you can see from my profile, my web scraping reviews are excellent, reflecting my commitment to quality work. I focus on writing clean, maintainable, and scalable code because I know the difference between 99% and 100%. If you hire me, I’ll do my best until you’re completely satisfied with the result. Let’s discuss your target website and preferred data format. Thanks, Denis
$65 USD in 1 day
5.5

Hello, I can build a fully automated and reproducible pipeline to extract and clean product data from Spanish procurement PDFs, enrich it with precise price and product information via domain-restricted Google searches, and deliver accurate results in Excel and JSON formats. I have experience with PDF extraction, data enrichment, similarity analysis, web scraping, and using LLMs to ensure high precision, including successfully extracting five accurate prices per product. Feel free to message me to discuss the workflow and technical details. Kind Regards, Ahsan Habib
$60 USD in 2 days
4.9

Hi there, I can build a fully automated data extraction and enrichment pipeline for your Spanish procurement PDFs. In Phase 1, I’ll extract product names and key features from PDFs, clean and structure the data, and output reliable Excel files. In Phase 2, I’ll perform domain-specific web searches for each product, analyzing the top 5 results per query. From each URL, I’ll extract 5 accurate prices, brand, product descriptions, and key features. Similar products will be identified using string similarity measures like Levenshtein distance or cosine similarity, ensuring comprehensive enrichment. LLMs can be leveraged for extraction, semantic matching, and feature enrichment to maximize accuracy and efficiency. You’ll receive final datasets in Excel (.xlsx) and JSON (.json) formats, along with a detailed README documenting the workflow, tools, and reproduction steps. I’ll ensure the solution is precise, fully reproducible, and optimized for scale. Regards, Ahmad If you want, I can
$50 USD in 2 days
4.5
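The domain-restricted search step described in the posting and in the bid above could start from a query builder like the sketch below. It relies on Google's `site:` search operator; the function names and the `example.com` domain are hypothetical, and how the query is actually executed (an official search API, a third-party service, or something else) is deliberately left open.

```python
from urllib.parse import urlencode


def build_site_query(product_name: str, domain: str) -> str:
    """Combine a product name with a site: restriction (domain is an assumed input)."""
    cleaned = " ".join(product_name.split())  # collapse stray whitespace from PDF text
    return f"{cleaned} site:{domain}"


def build_search_url(product_name: str, domain: str) -> str:
    """URL-encode the query for a browser-style search URL."""
    query = build_site_query(product_name, domain)
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Hypothetical product name and domain, for illustration only.
    print(build_site_query("  Silla   de oficina ergonómica ", "example.com"))
    print(build_search_url("Silla de oficina ergonómica", "example.com"))
```

The product name is left unquoted on purpose: an exact-phrase query would work against the stated goal of finding similar products and not only exact matches.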

Hello! I will build a fully automated pipeline to extract data from Spanish procurement PDFs, clean and structure it, and enrich each product via web search. The system will extract 5 precise prices, brand, features, and descriptions per product using similarity analysis (Levenshtein/cosine). Deliverables include Excel and JSON datasets plus a detailed README explaining workflow and reproducibility. Best Regards!
$80 USD in 5 days
4.0

Hey, I just went through your job description and noticed you need someone skilled in JSON, Google Search, Data Analysis, Data Extraction, Data Processing, API Integration, Excel, Web Scraping, Python and Natural Language Processing. That’s right up my alley. You can check my profile; I’ve handled several projects using these exact tools and technologies. Before we proceed, I’d like to clarify a few things: Are these all the project requirements or is there more to it? Do you already have any work done, or will this start from scratch? What’s your preferred deadline for completion? Why Work With Me? Over 180 successful projects completed. Long-term track record of happy clients and repeat work. I prioritize quality, deadlines, and clear communication. Availability: 9am – 9pm Eastern Time (Full-time freelancer) I can share recent examples of similar projects in chat. Let’s connect and discuss your vision in detail. Kind Regards, Zain Arshad
$10 USD in 4 days
3.8

As a seasoned developer with over 8 years of experience, I'm confident that I possess the skillset and technical know-how necessary to handle your Spanish Procurement PDF Data Extraction project impeccably well. Familiarity with Python and other languages allows me to engage in complex web scraping tasks efficiently, enabling me to accurately extract product names, features, brands, prices, and more from any given source. My extensive comprehension of API integrations further strengthens my capacity to perform a thorough analysis of the top 5 search results per product and use their content for precise data extraction. Additionally, my proficiency in Excel guarantees that I not only tidy up and structure all extracted data from the PDFs but also compile it into xlsx and JSON formats just as you require. Throughout this two-phased task, I'll ensure that your final datasets are of the highest quality by implementing methods such as Levenshtein distance or cosine similarity to identify similar products with remarkable precision.
$55 USD in 7 days
3.8

Hello mojganmadah, I would love to help. I have 8 years of experience in Python, Data Processing, Excel, Web Scraping, Data Analysis. I have completed similar projects. Visit my profile to check samples of my latest work and client reviews. If you like my approach, please connect in chat. Thank you, Bwalya
$55 USD in 7 days
3.4

Hello there, I have advanced experience in Data Mining, Statistics, Statistical Analysis and Data Science. With my vast background in data analysis and management, I am confident in my ability to handle your categorical data project effectively and efficiently. I have extensive experience in collecting, cleaning, analyzing, and visualizing data using Python programming, an invaluable asset for a project of this nature. Additionally, I am well-versed in the CRISP-DM framework and adept at identifying patterns within datasets. Choosing me means benefitting from not only my expertise but also my personal approach to projects. I understand that each task is unique, requiring tailored skills, and so I'm willing to go the extra mile to provide you with results that meet and exceed your expectations. Let's join forces in this project as our combined strengths will surely produce a result that's efficient, elegant and insightful! Let's not waste any more time! Together, we can mine this data efficiently and answer the questions to achieve your goals. Best Regards, Thanks
$10 USD in 1 day
3.4

Hello there, As an experienced researcher, data scientist, and data analyst, my qualitative analysis skills perfectly align with your job requirements. My profound knowledge of Python and R Studio guarantees fast learning and adaptation to new tools. Moreover, my advanced skills in Excel make me highly competent in handling large datasets efficiently, making me proficient in extracting the best insights from your transcripts. I fully comprehend the importance of working papers and meticulously preparing financial statements, especially within strict timelines. My sharp analytical skills and extensive knowledge of Excel ensure that I leave no stone unturned in making sure every detail is covered under evaluation. My passion for quality, originality and meeting deadlines makes me an excellent choice for this project. I cannot wait to prove my extensive skills to you through providing actionable insights that will help guide your decision making regarding domestic charter flights. Best Regards
$10 USD in 1 day
3.5

Hello, I have over 4 years of experience in reporting, analytics, and data processing, with strong expertise in building automated data extraction and enrichment pipelines. I specialize in extracting structured data from complex PDF documents (including Spanish-language procurement files), cleaning and standardizing outputs, and enriching them via web search and domain-restricted scraping. I can develop a fully reproducible solution to extract product names and features from PDFs, enrich them with brand, descriptions, and high-precision price data (5 validated URLs per product), and identify similar products using similarity techniques such as Levenshtein distance and cosine similarity, supported by LLMs where appropriate.

Certified in:
- Excel Basics for Data Analysis – IBM
- Data Visualization and Dashboards with Excel and Cognos – IBM
- Data Visualization and Dashboards with Power BI – IBM
- Business Intelligence Essentials – Coursera
- Python for Data Science, AI & Development – IBM
- Python Project for Data Science – IBM
- Databases and SQL for Data Science with Python – IBM
- Introduction to Data Analytics – IBM
- Python for Everybody Specialization – University of Michigan
- Build a Full Website Using WordPress – Coursera Project Network
- Introduction to HTML, CSS, & JavaScript – IBM

Best regards,
Talha Almas
Reporting & Data Analytics
EXCEL | VBA | POWER BI | SQL | OCR | Google Sheets | Python
$100 USD in 4 days
3.6

I’m a Python data engineer and automation developer with experience building end‑to‑end extraction and enrichment pipelines involving PDFs, web data, and NLP/LLM‑assisted analysis. I’ve delivered projects that combine document parsing, structured data cleaning, similarity matching, and price intelligence, producing outputs ready for direct business use in Excel and JSON. For this project, I will build a fully automated, reproducible pipeline that extracts product names and features from Spanish procurement PDFs, structures the data, and enriches it through domain‑restricted Google searches. Each product will be matched to similar items using string and semantic similarity methods (Levenshtein distance, cosine similarity), with five accurate prices per product extracted along with brand and descriptions. Final delivery will include clean .xlsx and .json datasets and a detailed README explaining the workflow and how to rerun it.
$80 USD in 6 days
2.0

Hi there, Ready right now. I’m ready to build an automated pipeline to extract Spanish PDFs, enrich products with precise prices and details, and deliver clean Excel and JSON files exactly as requested.
$100 USD in 3 days
1.9

I appreciate the opportunity to work on your automated data extraction and enrichment pipeline for Spanish procurement PDFs. Your need for precise extraction of prices, product details, and similarity analysis using methods like Levenshtein distance shows a clear emphasis on accuracy and an integrated, well-structured solution. Delivering clean, user-friendly datasets in both Excel and JSON formats with a comprehensive README aligns perfectly with my commitment to seamless project execution. I may be new to Freelancer, but I bring solid experience to the table. I’d be happy to offer a free call to go over the project if you would like. Regards, Blaze Nicholas
$45 USD in 14 days
1.2

Hi Invatu Technologies FZLLC, I have carefully reviewed your project description and am excited about the opportunity to work on building a full automated data extraction and enrichment pipeline for Spanish procurement PDF documents. My expertise in data extraction, web scraping, and machine learning aligns well with your project scope. Before we proceed, I would appreciate the opportunity to ask a couple of questions: 1) Are there any specific libraries or tools you prefer for the PDF extraction and web scraping processes? 2) Do you have a specific domain in mind for the web searches, or shall I select a commonly used one? 3) What is your expected timeline for Phase 1 and Phase 2 deliverables? Why Choose Me? - Over 250 large projects completed successfully. - No negative feedback in the past 5+ years. - 5-star ratings on the latest 100+ projects. Availability: 9 AM - 9 PM Eastern Time (Full-time freelancer) I'm eager to discuss further how I can bring your vision to life and will gladly provide examples of my previous work related to data extraction and processing upon request. Best regards, Syeda Yusra Zubair
$38.50 USD in 7 days
0.0

I specialize in building automated data extraction and enrichment pipelines, and I’m excited to help you streamline procurement data from Spanish PDFs. I’ll provide precise extraction of product details, enrich them with accurate price data, and ensure high-quality matching using advanced methods. Deliverables will include detailed datasets in Excel and JSON formats, with full documentation for reproducibility. Let’s work together to make this process seamless and efficient.
$50 USD in 7 days
0.0

Hello! I am very interested in this Entry-Level Virtual Assistant position. I am detail-oriented, reliable, and able to follow instructions carefully. I have experience with data entry, Excel, and basic administrative tasks. I am ready to learn, manage my time well, and deliver accurate work on schedule. I am available to start immediately and look forward to working with you.
$55 USD in 7 days
0.0

Hello, I can build a fully automated, reproducible pipeline to extract, enrich, and structure data from Spanish procurement PDFs. This includes precise PDF extraction, data cleaning, domain-restricted web search, high-accuracy price scraping (5 URLs per product), similarity matching, and enrichment using LLM-assisted methods. I’ll deliver clean Excel and JSON outputs along with a clear README documenting the full workflow and reproducibility. Ready to start immediately.
$55 USD in 2 days
0.0

I can build a fully automated, reproducible pipeline for Spanish procurement PDFs. Phase 1: LLM-assisted PDF parsing, Spanish NLP cleaning, structured Excel output. Phase 2: domain-restricted Google search, top-5 URL scraping, precise price extraction, brand/features capture, and similarity scoring via cosine/Levenshtein. Deliverables include XLSX, JSON, and a clear README with setup, reproducibility, and accuracy safeguards for auditing and future scaling. Best regards! Malaika Asad
$10 USD in 1 day
0.0

Hello, This project is a great fit for my experience building fully automated, LLM-powered data extraction and enrichment pipelines.

Understanding & Approach
You need an end-to-end solution that:
- Extracts product names and features from Spanish procurement PDFs
- Cleans and structures the data into Excel
- Enriches each product via domain-restricted Google searches
- Scrapes 5 accurate prices per product from the top search results
- Extracts brand, features, and descriptions
- Identifies similar products, not just exact matches, using embeddings and string-distance methods
- Delivers results in Excel and JSON, with a clear README for reproducibility

I would implement:
- LLM-assisted PDF extraction optimized for Spanish documents
- High-precision web scraping with validation to guarantee reliable pricing
- Semantic similarity matching (cosine similarity + Levenshtein)
- A modular, reproducible Python pipeline designed for accuracy and scalability

Tech Stack
Python, Pandas, NLP & LLMs, web scraping tools, similarity models, Excel/JSON outputs.

I focus on clean, reliable automation and can deliver a production-ready pipeline that meets your accuracy and reproducibility requirements.
$100 USD in 7 days
0.0
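The "scraping with validation" idea in the bid above depends on parsing prices reliably. The sketch below shows one way to do that for euro amounts written in the Spanish convention (period as thousands separator, comma as decimal separator). The target site's actual price format and currency are not stated in the posting, so both are assumptions, and the names `parse_spanish_price` and `_PRICE_RE` are hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

# Amount in Spanish notation: "1.234,56", "1234,5" or "99" (assumed formats).
_NUM = r"(?:\d{1,3}(?:\.\d{3})+|\d+)(?:,\d{1,2})?"

_PRICE_RE = re.compile(
    rf"(?:€|EUR)\s*({_NUM})(?![.,]?\d)"   # currency before the amount
    rf"|(?<![\d.,])({_NUM})\s*(?:€|EUR)"  # currency after the amount
)


def parse_spanish_price(text: str) -> Optional[Decimal]:
    """Return the first euro amount found in text, or None if nothing matches.

    Amounts that do not follow the assumed notation (for example "12.50 €",
    with a period as decimal separator) are rejected instead of being guessed.
    """
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    raw = match.group(1) or match.group(2)
    # Drop thousands separators, then switch the decimal comma to a period.
    return Decimal(raw.replace(".", "").replace(",", "."))
```

Returning `None` for ambiguous input, and using `Decimal` instead of `float`, keeps questionable values out of the dataset so they can be flagged for review. That seems consistent with the posting's emphasis on precise prices, though the validation rules a real pipeline needs would depend on the target domain.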

As a detail-oriented professional with strong skills in data analysis and processing, I am poised to provide the accurate, efficient, and fully reproducible solution you need for your Spanish procurement PDF data extraction project. My name is Sahar, and I have built a reputation for myself in the industry through my exceptional performance in manual data entry, a task that demands precision and unwavering attention to detail, both of which are imperative for the successful completion of your project. Moreover, being bilingual in Arabic and English has given me a knack for language analysis which would be invaluable in understanding and extracting key features from your Spanish PDF files. Leaning on my previous experience in data processing, I assure you that I can expertly handle the task of cleaning and structuring the extracted data as well as converting it into Excel and JSON formats as per your requirement. Lastly, I am also well-versed in utilizing LLMs for tasks like similarity analysis and enrichment, which are key components of phase 2 of your project. By meticulously employing methods such as Levenshtein distance or cosine similarity, I will ensure we maximize successful URL scraping. Allow me the opportunity to bring my skills to bear on this project and rest assured of top-notch service that will meet all your objectives. Together, we can exceed your expectations!
$55 USD in 7 days
0.0

Lausanne, Switzerland
Payment method verified
Member since Dec. 9, 2025