
Closed
Posted
Paid on delivery
I am seeking a freelancer to build a fully automatic data extraction and enrichment pipeline for Spanish procurement PDF documents.

Scope of Work:

Phase 1 – PDF Extraction
• Extract product names and key features from Spanish PDF files.
• Clean and structure the data.
• Output results in Excel (.xlsx) format.

Phase 2 – Web Search & Price Extraction
• Use each product name as a Google search query, restricted to a specific domain.
• Analyze the top 5 search results per product.
• From each URL, extract with high precision: price (5 accurate prices per product are required), brand, and product features and description.
• Identify similar products, not only exact matches. Measure similarity using methods such as Levenshtein distance or cosine similarity.

Deliverables:
• Final datasets in Excel (.xlsx) and JSON (.json) formats. (Do not forget: price extraction must be very precise and sufficient. We need 5 successfully scraped URLs per product.)
• A detailed README explaining the full workflow, tools, and how to reproduce the process.

Technical Notes:
• Use of LLMs for extraction, similarity analysis, and enrichment is highly recommended.
• The solution must be accurate, efficient, and fully reproducible.
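The brief names Levenshtein distance as one way to compare a product name taken from a PDF with product titles found on the web. A minimal pure-Python sketch, assuming names are compared after simple lowercasing and that a 0 to 1 score (1 minus the distance divided by the longer length) is useful; the function names `levenshtein` and `name_similarity` are hypothetical, and any match threshold would have to be tuned on real data.

```python
# Sketch: edit-distance similarity for product names (helper names are hypothetical).


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Scale the distance to [0, 1]; 1.0 means identical after lowercasing."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

For example, `name_similarity("Guantes de nitrilo talla M", "Guantes nitrilo talla M")` is 23/26 (about 0.88), since deleting "de " is the only edit needed. Character-level distance is sensitive to word order, so it tends to work best alongside a token- or n-gram-based measure.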
Project ID: 40178387
43 bids
Remote project
Active 3 days ago
43 freelancers are bidding an average of $64 USD for this job

⭐⭐⭐⭐⭐ Hello, I have carefully analyzed your project requirements and understand the need for a fully automated, reproducible pipeline to extract, enrich, and validate product data from Spanish procurement PDFs. I recently delivered a similar solution that parsed Spanish technical PDFs, structured product attributes, enriched them via domain-restricted search, and produced validated Excel and JSON outputs using LLM-assisted extraction and similarity scoring. Key aspects of your project include high-precision PDF extraction, structured data cleaning, domain-restricted Google search, accurate price scraping from five valid URLs per product, brand and feature extraction, and similarity detection using methods such as cosine similarity and Levenshtein distance. I will implement this in Python with robust scraping safeguards, LLM-assisted enrichment, reproducible workflows, and clear validation to ensure price accuracy and consistency. I am available to begin work immediately and am committed to delivering a reliable, efficient pipeline with clean Excel and JSON outputs and a detailed README within the shortest possible timeframe. Best regards, Mauricio
$55 USD in 7 days
4.1

Hello! I can build a fully automated pipeline to extract and enrich data from Spanish procurement PDFs. Phase 1: extract product names/features into clean Excel. Phase 2: perform domain-restricted Google searches, extract 5 precise prices per product, brand, features, and similar items using similarity metrics. Deliverables include Excel/JSON datasets and a detailed reproducible README. Best Regards!
$80 USD in 3 days
4.0

Do you want the extraction and enrichment pipeline to prioritize speed or maximum accuracy, or a balance of both? I can build a fully automated system to extract product names and features from Spanish PDF procurement documents, enrich them with price and brand data from specific domains, and output structured Excel and JSON datasets. I'm a data engineer experienced in PDF extraction, web scraping, and data enrichment pipelines. I live here in Munich and can be easily reached at any time. Before you pay, you can discuss each phase of your project with us step by step, whether late in the evening or early in the morning. The quality of your task, reliability and, above all, your satisfaction are important to me. You can even decide for yourself how much you would like to pay. If this goes well, I’m open to long-term collaboration for additional datasets or pipeline enhancements. Let’s get started. Best regards, Shabeer Khan
$100 USD in 2 days
3.9

Hello, I've just reviewed your project description regarding the Spanish Procurement PDF and Data Extraction -- 2 and I'm confident in my ability to meet your expectations. With over 7 years of experience as a Senior Graphic Designer, I possess a strong skill set in Web Scraping, JSON, Data Extraction, Excel, Data Processing, Python, API Integration, Google Search, Natural Language Processing and Data Analysis. I kindly request you to take a moment from your busy schedule to explore our portfolio, where you can see the quality of my work and read feedback from previous clients: [Portfolio Links] https://www.freelancer.com/u/afshan2176 Could you please specify the final file formats you'll require? Feel free to award me the project so that we can discuss it further. Looking forward to connecting with you. Best regards, Afshan Z.
$10 USD in 1 day
3.5

Hello there, I hope you’re doing great! I’m a professional Python Developer with experience in developing efficient, clean, and reliable Python scripts and applications. Whether it’s data analysis, automation, web scraping, API integration, or backend development — I can handle it with precision and quality. I always focus on writing well-structured, optimized, and bug-free code. My goal is to deliver work that meets your requirements perfectly and adds value to your project. ✅ Clean and optimized Python code ✅ Fast delivery and regular updates ✅ Unlimited revisions until satisfaction ✅ Excellent communication I would love to discuss your project in detail and start right away. Let’s turn your ideas into powerful Python solutions! Best regards,
$10 USD in 1 day
3.5

Hello, I can build a fully automated, accurate, and reproducible pipeline for Spanish procurement PDFs. I will extract product names and features, clean and structure the data into Excel, then perform domain-restricted Google searches to scrape 5 precise prices per product along with brand, features, and descriptions. Similar products will be identified using cosine similarity and Levenshtein distance. Final outputs will be delivered in Excel and JSON, with a clear README for replication. Regards, Bakhtawar
$30 USD in 1 day
3.2
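Several bids, including the one above, propose cosine similarity for finding similar products. One way to apply it without external libraries is to turn each product name into a vector of character trigram counts and compare the vectors; this is a sketch under that assumption (TF-IDF weighting or LLM/sentence embeddings are common alternatives), and `char_ngrams` and `cosine_similarity` are hypothetical names.

```python
import math
from collections import Counter

# Sketch: cosine similarity over character trigram counts (helper names are hypothetical).


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams; spaces pad the ends so word edges count too."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 3) -> float:
    """Cosine of the angle between the two n-gram count vectors, in [0, 1]."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty name shares nothing with anything
    return dot / (norm_a * norm_b)
```

Trigram vectors tolerate reordered words better than edit distance: "silla oficina negra" and "silla negra oficina" have identical trigram counts, so their score is 1.0 up to rounding, whereas a character-by-character edit distance would penalize the swap.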

Hello there, I have reviewed your requirements and I'm confident that my experience and skills align perfectly with what you're looking for. I'm confident my skills are perfect for the job! As an accomplished AI expert with strong proficiency in Python programming and data processing, I feel confident that I can complete your project brilliantly within the given timeline. My 10+ years of professional experience have equipped me with in-depth knowledge and expertise in various fields of artificial intelligence, especially computer vision, machine learning, deep learning, and image processing, all of which could greatly benefit your project. Client interaction is another aspect I strongly focus on; together we can optimize not just the Python implementation but also the project's overall progression. Having said that, let's discuss this further and embark on a project journey that exceeds your expectations! Best regards.
$10 USD in 1 day
3.4

Hello there, I have carefully read your project description and I’m confident that I can complete your Python project efficiently and professionally. I have strong experience in developing Python scripts for automation, data processing, and problem-solving. Here’s what you can expect from my work: ✅ Clean, optimized, and well-documented Python code ✅ Fast and reliable performance ✅ Regular updates during the project ✅ On-time delivery and unlimited revisions until you are satisfied I’m ready to start right away and can complete the task within your deadline. Please feel free to share more details about your project so I can tailor the solution perfectly for your needs. Looking forward to working with you! Best regards,
$10 USD in 1 day
3.5

Hello, I am immediately available to start and can dedicate focused hours to this project. I have built end-to-end PDF extraction and web-scraping pipelines in Python for Spanish documents, delivering Excel and JSON outputs. I will start with Phase 1 to extract product names and features from PDFs, clean and structure the data, then Phase 2 to search domain-restricted results, pull five prices per product, compute similarity, and deliver ready-to-use datasets and a README. If you’re happy to proceed, I can propose a kickoff and align on domain constraints. Best regards, Mojjammil
$100 USD in 1 day
2.4

Hello, I’m Oleksandr, a Python and data automation specialist experienced in PDF extraction, web scraping, and structured dataset creation. I can build a fully automatic pipeline that extracts product names and key features from Spanish procurement PDFs, enriches the data via domain-specific web searches, and captures five precise prices per product along with brand, description, and similar products using NLP and similarity measures. The final deliverables will include clean Excel (.xlsx) and JSON (.json) datasets, plus a detailed README explaining the workflow, tools, and reproduction steps for full transparency and reproducibility. Best regards, Oleksandr
$80 USD in 7 days
2.1

I appreciate the opportunity to work on your automatic data extraction and enrichment pipeline for Spanish procurement PDFs. Your focus on precise price extraction from the top 5 Google search results, along with similarity analysis using methods like Levenshtein distance, highlights the need for a well-structured and seamless solution that delivers professional, user-friendly datasets in both Excel and JSON formats. I bring expertise in PDF data extraction, web scraping, and leveraging LLMs for enhanced accuracy and efficiency. I may be new to Freelancer, but I bring solid experience to the table. I’m happy to offer a free call to discuss your project further. Regards, Blaze Nicholas
$45 USD in 14 days
1.2

Hello, From what you described, the core goal here is to turn unstructured Spanish procurement PDFs into a **reliable, repeatable pricing intelligence dataset**. This isn’t just about extracting text—it’s about producing **accurate, comparable product and price data** that can be trusted for analysis and decision-making. One thing that stands out is the requirement for **five precise prices per product from different URLs**, including similar (not identical) products. That’s where many pipelines fail—PDF noise, inconsistent product naming, and weak similarity logic can quickly degrade accuracy if not handled carefully. The main risk to watch for is **price precision and false matches**, especially when scraping semi-structured e-commerce pages. Without a strong normalization + similarity layer, it’s easy to pull irrelevant prices or miss valid alternatives—particularly in Spanish procurement contexts. Before moving forward, I’d like to confirm one thing: should the domain-restricted Google search be **configurable per run**, or is it a fixed domain set for all products? That affects how the search and enrichment layer should be designed. Happy to discuss and move quickly once that’s clear. Thanks for your time.
$100 USD in 7 days
0.0
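The bid above asks whether the domain restriction should be configurable per run. A sketch of keeping it configurable, assuming the domain list is passed in at run time and that searches go through Google's Custom Search JSON API (endpoint `https://www.googleapis.com/customsearch/v1`, parameters `key`, `cx`, `q`, `num`) rather than scraping the results page; `build_query` and `build_search_url` are hypothetical helpers. The restriction itself uses Google's `site:` operator inside the query; OR-ing several `site:` terms generally works, but should be checked against real results.

```python
from urllib.parse import urlencode

# Sketch: a per-run, configurable domain restriction (helper names are hypothetical).
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_query(product_name: str, domains: list[str]) -> str:
    """Append Google site: filters for the domains chosen for this run."""
    name = " ".join(product_name.split())  # collapse PDF line breaks and double spaces
    if not domains:
        return name
    if len(domains) == 1:
        return f"{name} site:{domains[0]}"
    sites = " OR ".join(f"site:{d}" for d in domains)
    return f"{name} ({sites})"


def build_search_url(product_name: str, domains: list[str], api_key: str, cx: str, num: int = 5) -> str:
    """Custom Search JSON API request URL asking for the top `num` results."""
    params = {"key": api_key, "cx": cx, "q": build_query(product_name, domains), "num": num}
    return f"{CSE_ENDPOINT}?{urlencode(params)}"
```

Because the domains are an argument rather than a constant, the same code serves both a fixed domain set and a per-run configuration (for example, read from a config file or command-line flag).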

Hello, I use tools like Python (BeautifulSoup, Scrapy, Selenium, Playwright) and Node.js (Puppeteer, Cheerio) to extract data accurately and efficiently. I also handle challenges like rate limiting, CAPTCHAs, user-agent rotation, and proxy integration to ensure the scraper works smoothly and avoids blocks. Whether you need to scrape product listings, pricing data, directories, job postings, social media content, or any other type of information, I can build a tailored solution and deliver the data in formats like JSON, CSV, Excel, or directly into your database or API. I look forward to discussing the project further and providing you with the best possible service. Best regards
$55 USD in 7 days
0.0
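The bid above mentions handling rate limiting. A common pattern is to retry failed requests with exponential backoff plus random jitter; this sketch assumes the actual HTTP call is supplied as a function (`fetch`), and `fetch_with_retry` is a hypothetical name. Catching every `Exception` is a simplification: a real scraper would retry only transient failures such as timeouts or HTTP 429/5xx responses.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Sketch: retry with exponential backoff and jitter (helper name is hypothetical).


def fetch_with_retry(
    fetch: Callable[[str], T],
    url: str,
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url); after a failure wait base_delay * 2**attempt plus jitter, then retry."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:  # simplification: real code should retry only transient errors
            last_error = exc
            if attempt < attempts - 1:  # no pointless wait after the final attempt
                sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the backoff schedule testable without real waiting; in production the default `time.sleep` is used.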

Hello, I am an economist with experience in data analysis and data cleaning at the Banco de la República de Colombia. I can take care of Phase 1 (extraction), organizing product names and key features from the Spanish PDFs and delivering them in Excel in a clear, structured form. I guarantee precise transcription and cleaning of the information, with partial deliveries for your review. I am available immediately and can complete this phase within the agreed deadline.
$35 USD in 4 days
0.0
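Phase 1 calls for cleaning the text extracted from the PDFs. Whatever extractor is used, a light normalization pass helps before product names are parsed; this sketch assumes the raw page text is already available as a string, and `clean_pdf_text` and `match_key` are hypothetical helpers. Re-joining words hyphenated across line breaks is a heuristic: it will also merge a genuinely hyphenated term that happens to wrap at its hyphen.

```python
import re
import unicodedata

# Sketch: normalizing text extracted from Spanish PDFs (helper names are hypothetical).


def clean_pdf_text(raw: str) -> str:
    """Tidy text pulled from a PDF page before product names are parsed out of it."""
    text = unicodedata.normalize("NFKC", raw)        # unify ligatures such as "ﬁ" -> "fi"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)       # re-join words hyphenated across lines
    text = re.sub(r"[ \t]*\n[ \t]*", " ", text)        # treat remaining line breaks as spaces
    return re.sub(r"\s{2,}", " ", text).strip()        # collapse runs of whitespace


def match_key(name: str) -> str:
    """Lowercase, accent-free key so 'Sillón' and 'sillon' compare equal."""
    decomposed = unicodedata.normalize("NFD", name.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

The cleaned text keeps its accents for the Excel output, while `match_key` is only meant for comparisons, where "Sillón" and "sillon" should count as the same product word.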

Hello, I can build a fully automated Python pipeline for extracting and enriching data from Spanish procurement PDFs. • Extract product names and key features from Spanish PDFs • Clean and structure data into Excel (.xlsx) • Search products on restricted domains and analyze top 5 URLs • Accurately extract prices (5 successful prices per product), brand, and descriptions • Identify similar products using text similarity methods (cosine / Levenshtein) Deliverables will include Excel, JSON outputs, and a clear README for reproducibility. I focus on accuracy, clean code, and reliable scraping. Ready to start immediately. Regards, Harsh
$20 USD in 3 days
0.0

I have 8+ years of experience building automated data extraction and enrichment pipelines, including PDF parsing, web scraping, and data normalization. I can extract structured product data from Spanish PDFs, perform domain-restricted Google searches, scrape 5 highly accurate prices per product, and apply similarity matching (Levenshtein / cosine) to identify comparable items. The complete solution will be fully automated, reproducible, and delivered in Excel and JSON formats with a clear README.md. You’re welcome to visit my profile, and we can discuss the project details further via messages. Best regards, Umm E Kalsoom
$45 USD in 1 day
0.0

Got it: this reads like an RFP / freelancer brief, so I’ll respond as if I’m pitching a concrete, production-ready solution, not just theory. Below is how I’d design and deliver a fully automatic, reproducible pipeline that hits your accuracy and precision requirements.

✅ Proposed Solution Overview
A two-phase automated pipeline that:
• Extracts structured product data from Spanish procurement PDFs
• Enriches each product via domain-restricted web search
• Scrapes 5 high-precision prices per product
• Identifies similar (not only exact) products using semantic similarity
• Outputs Excel + JSON, fully reproducible with a clear README
$55 USD in 4 days
0.0
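The deliverables include a JSON dataset of Spanish product data. A small sketch of the serialization step, assuming prices are held as `Decimal` values (to avoid binary floating-point rounding) and records are plain dicts; `to_json` is a hypothetical helper. `ensure_ascii=False` keeps accented characters such as "ó" readable instead of escaping them as `\u00f3`. The Excel side could use pandas' `DataFrame.to_excel`, which needs an engine such as openpyxl installed.

```python
import json
from decimal import Decimal

# Sketch: JSON export that keeps Spanish text readable (helper name is hypothetical).


def to_json(records: list[dict]) -> str:
    """Serialize enriched product records; Decimal prices become exact strings."""

    def encode_extra(value):
        if isinstance(value, Decimal):
            return str(value)  # exact price digits as a string, no float rounding
        raise TypeError(f"{type(value).__name__} is not JSON serializable")

    return json.dumps(records, ensure_ascii=False, indent=2, default=encode_extra)
```

Storing prices as strings is a design choice: it preserves values like "149.90" exactly, at the cost of consumers having to convert them back to numbers.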

Hello, I am a full-stack AI developer with rich experience in software development. I am familiar with Python, PDF text extraction, data cleaning, web scraping, Google search automation, price extraction, NLP, LLM integration, similarity analysis using cosine similarity and Levenshtein distance, and Excel and JSON data pipelines. I can build a fully automated pipeline to extract structured product data from Spanish PDFs, enrich it via domain-restricted Google searches, scrape and validate five precise prices per product, compute similarity for related items, and deliver reproducible outputs with clear documentation. Thanks
$50 USD in 1 day
0.0

Most procurement PDF pipelines fail because they extract noise, not structured data, and then the enrichment step produces wrong prices.

I recently built a production-grade AI storybook platform that turns user photos into personalized books, including AI workflows, real-time previews, privacy-first processing, and scalable pipelines. That experience translates directly to your project because it requires the same core strengths: accurate extraction, robust automation, and reliable output.

The Three Things That Matter Most

1. Extraction That Works Even When PDFs Are Messy
Procurement PDFs are inconsistent and hard to parse. The goal is structured product data, not raw text. The pipeline will extract product names and features, clean and normalize data, output Excel and JSON, use AI-based extraction when needed, and include validation rules.

2. Price Enrichment With 5 High-Quality URLs Per Product
The real value is enrichment. The pipeline will use each product name as a domain-restricted Google query, analyze the top 5 results, and extract 5 accurate prices per product, brand, and product description. Similar products will be identified using similarity scoring and validated to avoid false matches.

3. Reproducible and Scalable Pipeline
Automated workflow, logs, error handling, retry logic, Excel/JSON output, and a detailed reproduction guide.

Quick questions:
• Do you want results per PDF file or merged into one dataset?
• How many PDFs per batch?
$65 USD in 7 days
0.0
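The bid above, like the brief, stresses precise price extraction with 5 valid prices per product. Spanish sites usually format euro amounts with a period for thousands and a comma for decimals (for example "1.234,56 €"), so a naive `float()` on scraped text fails or misreads them. A sketch assuming euro prices with at most two decimals, found in text that has already been isolated from the page; `parse_eur_price` and `collect_valid_prices` are hypothetical helpers.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Sketch: Spanish-format euro price parsing (helper names are hypothetical).
# An amount is either "1.234,56"-style (period thousands, optional comma decimals)
# or a plain run of digits with optional comma decimals, e.g. "99,90" or "15".
AMOUNT = r"\d{1,3}(?:\.\d{3})+(?:,\d{1,2})?|\d+(?:,\d{1,2})?"
# The look-arounds stop a match from starting or ending inside a longer number,
# so English-style "12.50 €" is rejected instead of being misread as 12 or 50.
PRICE_RE = re.compile(
    rf"(?:€|EUR)\s*({AMOUNT})(?!\d|[.,]\d)"
    rf"|(?<![\d.,])({AMOUNT})\s*(?:€|EUR)"
)


def parse_eur_price(text: str) -> Optional[Decimal]:
    """Return the first euro amount found in text, or None if there is none."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    raw = match.group(1) or match.group(2)
    try:
        return Decimal(raw.replace(".", "").replace(",", "."))
    except InvalidOperation:  # defensive; the regex should only yield valid numbers
        return None


def collect_valid_prices(snippets: list[str], required: int = 5) -> list[Decimal]:
    """Keep positive parsed prices and fail loudly if fewer than `required` were found."""
    prices = [p for p in map(parse_eur_price, snippets) if p is not None and p > 0]
    if len(prices) < required:
        raise ValueError(f"only {len(prices)} valid prices found, {required} required")
    return prices[:required]
```

Raising an error when fewer than 5 prices survive makes a shortfall visible in the logs instead of silently producing an incomplete row.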

NO SATISFACTION, NO PAYMENT. Incorrect extraction and enrichment can lead to flawed data insights and lost procurement opportunities, which directly impact operational efficiency and decision-making quality. Our experience in building precise, language-specific data pipelines ensures we deliver reliable, high-quality datasets tailored to your needs, enabling confident procurement decisions. Having successfully completed similar projects outside this platform, we offer a carefully considered discounted rate to establish a trusted reputation here, a strategic incentive reflecting our commitment, not our capability. If you’d like, I’m happy to clarify any part of this scope or discuss next steps briefly. Warm regards, Liam Jasson
$40 USD in 14 days
0.0

Lausanne, Switzerland
Payment method verified
Member since Dec 9, 2025