
Ends in 4 hours
Paid on delivery
I have a set of blood-test reports that arrive as PDFs, and I need an accurate, repeatable way to extract only the test result section from each file. The patient demographics and doctor’s notes can be ignored; my focus is strictly on the numerical results, reference ranges, and units.
Here’s what I’m looking for:
• A lightweight script or small desktop tool (.NET Core + Tesseract, AWS Textract, or any engine you prefer) that ingests multi-page PDF blood panels and returns structured data (CSV or JSON is fine).
• Clear mapping of the extracted fields to their respective test names as they appear in the PDF.
• Reliability across differing lab layouts; most follow similar tables, but spacing and fonts vary.
Acceptance criteria
1. Feed a sample batch of 10 PDFs and receive one consolidated CSV with no missing result values.
2. Numeric accuracy within ±1 of the values shown on the page when spot-checked.
3. Delivery of source code and a brief read-me so I can run it locally on Windows.
If you’ve tackled medical OCR before, even better; please mention the toolkit you plan to use and any comparable projects. I’m ready to start as soon as I find the right approach.
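Acceptance criteria 1 and 2 can be checked mechanically. The sketch below is only an illustration of such a check: the column names (`file`, `test`, `value`, `unit`, `reference_range`) and both helper functions are assumptions, not part of the posting.

```python
import csv
import io


def find_missing_results(csv_text):
    """Return (file, test) pairs whose result value is empty.

    Assumes hypothetical columns: file, test, value, unit, reference_range.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["file"], row["test"])
        for row in reader
        if not (row.get("value") or "").strip()
    ]


def within_tolerance(extracted, expected, tol=1.0):
    """True when the extracted number is within +/- tol of the page value."""
    return abs(float(extracted) - float(expected)) <= tol


# Tiny made-up consolidated CSV with one missing result value.
sample_csv = (
    "file,test,value,unit,reference_range\n"
    "report_01.pdf,Hemoglobin,13.5,g/dL,12.0-16.0\n"
    "report_01.pdf,Glucose,,mg/dL,70-100\n"
)

missing = find_missing_results(sample_csv)
print(missing)
print(within_tolerance("13.5", 14.0))
```

An absolute tolerance of 1 is loose for analytes with small values, so a stricter `tol` may be worth passing for those during spot checks.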
Project ID: 40373335
28 proposals
Open for bidding
Remote project
Active 22 hours ago
28 freelancers are bidding on average ₹7,377 INR for this job

Got you, this is less about OCR and more about pulling clean, reliable data from messy PDFs. I’ve worked on a similar kind of problem where unstructured documents had to be converted into structured, validated datasets, and the tricky part was always handling inconsistent formats without losing accuracy. For your case, I’d approach it simply: extract the text → identify the test sections → map values to the right test names → validate so nothing is missed or mismatched. To handle different lab formats, I’d combine direct PDF parsing with OCR fallback (only where needed), and add a small validation layer so the final CSV is complete and accurate. The end result will be a simple tool you can run locally on Windows, with clean CSV/JSON output and no dependency headaches. Let’s open a chat; I can take a look at a couple of your sample reports and show you exactly how I’d handle them. Best, Jenifer
₹7,000 INR in 7 days
9.5
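The "map values to the right test names, then validate" steps in the proposal above could look roughly like the sketch below. It assumes each result row has already been extracted as one text line of the form "name value unit low-high"; the regular expression, the field names, and the sample report text are illustrative assumptions, and real lab layouts would need more patterns.

```python
import re

# Assumed row shape: <test name> <number> <unit> <low> - <high>
ROW_RE = re.compile(
    r"^(?P<test>[A-Za-z][A-Za-z0-9 ()/%.-]*?)\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s+"
    r"(?P<unit>[^\s\d][^\s]*)\s+"
    r"(?P<low>\d+(?:\.\d+)?)\s*-\s*(?P<high>\d+(?:\.\d+)?)\s*$"
)


def parse_row(line):
    """Map one text line to a result record, or None if it does not fit."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    return {
        "test": m.group("test").strip(),
        "value": float(m.group("value")),
        "unit": m.group("unit"),
        "ref_low": float(m.group("low")),
        "ref_high": float(m.group("high")),
    }


def parse_report(text):
    """Return (parsed rows, lines with digits that failed to parse)."""
    parsed, review = [], []
    for line in text.splitlines():
        row = parse_row(line)
        if row is not None:
            parsed.append(row)
        elif re.search(r"\d", line):
            # Validation step: numeric-looking lines that did not match
            # are kept for manual review instead of being dropped silently.
            review.append(line.strip())
    return parsed, review


sample = """Patient Name: John Doe
Hemoglobin   13.5  g/dL   12.0 - 16.0
Total Cholesterol 180 mg/dL 125-200
Glucose 9O mg/dL 70-100"""

rows, needs_review = parse_report(sample)
```

The last sample line contains a letter "O" where a zero belongs, a typical OCR slip, so it lands in `needs_review` rather than producing a wrong value.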

As a seasoned Full-Stack Web Developer with over 7 years of experience, I have successfully handled numerous complex projects that entail data processing, which would come in particularly handy for your OCR Blood Test Extractor project. My proficiency spans across PHP, Python and JSON, key in creating stable, dependable and high-performance solutions. I am well-accustomed to dealing with PDFs using various tools including Tesseract and AWS Textract, allowing me the flexibility to identify the best engine that would work most effectively for your project. Over the years, my forte has evolved into bug fixing, optimization and backend development, ensuring enhanced efficiency and accuracy for processing data from any source. My experience also encompasses web scraping and data mining, further attesting to my aptitude in dealing with varying layouts (fonts) as is expected in your use case. Finally, my dedication to understanding the objectives of every project combined with my commitment to delivering timely solutions will be highly advantageous to you. As a freelancer who values clear communication and maintains an excellent rapport evidenced by consistent 5-star reviews from satisfied clients like yourself, I believe I am the perfect fit for this task! Let's have a chat about your project - I’m eager to help you achieve precise extractions of the blood test results you require!
₹7,000 INR in 7 days
8.8

Hello, I am a software developer with over 25 years of overall experience, including work with C#, .NET and Tesseract OCR. First, I would like to check a sample of the PDFs to be scanned; could you share one? It would help me evaluate the job more accurately. Thanks.
₹6,750 INR in 4 days
8.2

Hi there, I will build a lightweight OCR pipeline to extract the numeric test-result tables from multi-page PDF blood panels and map each value to its test name and unit. I’ve delivered similar data-extraction tools and will use a robust engine (Tesseract or AWS Textract) plus layout heuristics to handle varying fonts/spacing.
- Script/tool (.NET Core or Python) that ingests multi-page PDFs and outputs consolidated CSV/JSON with test, value, unit, reference range
- Field-mapping module that links extracted cells to their visible test names (configurable rules + fallback pattern-matching)
- Batch validation: run 10 sample PDFs and produce one consolidated CSV; include unit tests and ±1 numeric spot-check routine
- Quality control: staged processing, text-confidence thresholds, fallback OCR engine, and rollback-able config so no data loss
Skills:
✅ OCR (Tesseract / AWS Textract)
✅ .NET Core / Python data processing
✅ PDF table parsing / layout heuristics
✅ JSON/CSV export + field mapping
✅ Dockerized local run / Windows-ready delivery
Certificates:
✅ Microsoft® Certified: MCSA | MCSE | MCT
✅ cPanel® & WHM Certified CWSA-2
I’m available to start immediately. Do you have representative sample PDFs (different labs) I can test against to tune layout rules and confidence thresholds before final delivery?
Best regards,
₹11,000 INR in 1 day
5.9

Hello Sir, I can build a lightweight, repeatable extractor for blood-test PDFs that focuses only on the results table: test name, value, unit, and reference range.
Recommended approach
• Python + OCR/table extraction pipeline (best balance of accuracy and portability on Windows)
• Use native PDF text extraction first when possible, with OCR fallback for scanned pages
• Normalize rows into CSV/JSON
• Add layout-tolerant parsing so similar lab formats with spacing/font differences still map correctly
What I’ll deliver
• Script or small local tool runnable on Windows
• Consolidated CSV output for batch processing
• Source code + short README
• Clear field mapping: test name → result → unit → reference range
How I’d handle reliability
• Multi-page PDF support
• Rule-based parsing for lab tables
• Validation layer to catch missing/misaligned values
• Easy tuning if a lab format varies
Acceptance fit
• Batch of 10 PDFs → one consolidated CSV
• Target numeric accuracy within your tolerance
• Local Windows execution + source included
I’ve worked on structured extraction workflows where OCR alone is not enough; the key is combining text extraction, table parsing, and post-validation so the output is dependable. If you share 2–3 sample PDFs, I can confirm the best extraction strategy quickly.
₹7,000 INR in 4 days
5.8
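The "native PDF text first, OCR fallback for scanned pages" idea from the proposal above might be sketched like this. The function name, the `min_chars` threshold, and the zero-argument `ocr_page` callable are all hypothetical; in a real tool the native text could come from something like pdfplumber's `page.extract_text()` and the callable could wrap `pytesseract.image_to_string` on a rendered page image.

```python
def page_text(native_text, ocr_page, min_chars=20):
    """Prefer the PDF's embedded text; run OCR only when it looks empty.

    native_text: text pulled straight from the PDF page (may be None).
    ocr_page:    zero-argument callable that performs OCR on the same page
                 (hypothetical hook; only called when needed, since OCR
                 is slow and less accurate than embedded text).
    min_chars:   assumed threshold below which a page is treated as scanned.
    """
    cleaned = (native_text or "").strip()
    if len(cleaned) >= min_chars:
        return cleaned, "native"
    return ocr_page().strip(), "ocr"


# Stand-in OCR function for demonstration; it records each call.
ocr_calls = []


def fake_ocr():
    ocr_calls.append(1)
    return " Glucose 90 mg/dL 70-100 "


text_a, source_a = page_text("Hemoglobin 13.5 g/dL 12.0-16.0", fake_ocr)
text_b, source_b = page_text("  \n", fake_ocr)
```

Returning the source label alongside the text makes it easy to apply stricter validation to OCR-derived rows later.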

Hi, I can build a robust, lightweight Python script using AWS Textract or Azure Form Recognizer to accurately extract numerical results, reference ranges, and units from your blood test PDFs. These cloud-based OCR engines are far superior to open-source tools like Tesseract for handling varied lab layouts, complex tables, and mixed fonts, ensuring high accuracy without extensive manual template tuning. The script will process multi-page PDFs, ignore demographics and notes, and output a clean, consolidated CSV or JSON file with structured data mapped to specific test names. I will implement logic to handle spacing variations and ensure numeric accuracy within your specified tolerance. You will receive the source code, a simple README for local execution on Windows, and a validation report demonstrating successful extraction from your sample batch of 10 PDFs. I have experience with medical document processing and can ensure the solution is repeatable and reliable across different lab formats. I also offer FREE post-delivery support to adjust parsing logic for any new lab templates you encounter, troubleshoot local environment setup, and refine the output structure based on your specific analysis needs. Let's discuss the project in more detail.
₹10,000 INR in 3 days
5.8

Hi, this is a great fit for my experience: I’ve built OCR pipelines for structured document extraction, including cases where layouts vary but the target data (tables, values, units) must be captured accurately.
I would build a lightweight Python-based tool using:
• Tesseract OCR (via pytesseract) for flexibility and local execution
• Optional fallback: AWS Textract if higher accuracy is needed on complex layouts
• PDF parsing (pdfplumber / PyMuPDF) to extract text where possible before OCR (improves accuracy)
How it will work:
• Detect and isolate the test results section (table-like regions)
• Extract: test name, numeric value, unit, reference range
• Normalize and map fields consistently across different lab formats
• Output a clean CSV (and optional JSON) with one row per test
Handling layout variations:
• Use pattern-based parsing (regex + structure detection)
• Build a flexible mapping layer so slightly different formats still align correctly
• Add validation checks to avoid missing or misaligned values
Deliverables:
• Working script/tool (Windows-ready)
• Consolidated CSV output from batch PDFs
• Source code + clear README
• Demo run on your sample files
If needed, I can also make the tool easily extendable for new lab formats. I can start quickly; if you share 2–3 sample PDFs, I’ll confirm the best extraction approach before full implementation.
Best regards, Doan
₹7,000 INR in 2 days
5.8
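The "detect and isolate the test results section" step in the proposal above could be approached with a simple header/end-marker heuristic. This is only a sketch: the keyword sets and the function name are assumptions chosen for illustration, and real reports would likely need per-lab tuning.

```python
import re

# Assumed vocabulary: a table header is any line containing at least two
# of these words; a results block ends at a line starting with an end marker.
HEADER_WORDS = {"test", "result", "results", "value",
                "unit", "units", "reference", "range"}
END_MARKERS = ("note", "comment", "remark", "interpretation")


def isolate_results(lines):
    """Keep only the lines that sit between a table header and an end marker."""
    section, inside = [], False
    for line in lines:
        stripped = line.strip()
        words = set(re.findall(r"[a-z]+", stripped.lower()))
        if not inside:
            # Look for the table header; the header line itself is skipped.
            inside = len(words & HEADER_WORDS) >= 2
        elif stripped.lower().startswith(END_MARKERS):
            # Leave the table, but keep scanning in case a later page
            # starts another results table.
            inside = False
        elif stripped:
            section.append(stripped)
    return section


pages = [
    "Patient: Jane Doe",
    "Test Result Unit Reference Range",
    "Hemoglobin 13.5 g/dL 12.0-16.0",
    "",
    "Notes: follow up in 3 months",
    "Test Result Unit Reference Range",
    "Glucose 90 mg/dL 70-100",
]

result_lines = isolate_results(pages)
```

Demographics and notes fall outside the header/end-marker window, so they are dropped before any value parsing happens.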

Hi, this is a strong fit for my workflow. I’d build a lightweight Windows-ready extractor that pulls only the blood-test result section from multi-page PDFs and outputs structured CSV/JSON. My preferred approach is hybrid: PDF-native parsing first for accuracy, with OCR fallback (Tesseract) only on scanned pages. This is usually more reliable than OCR alone across varying lab layouts. I’d map each test name to its result, unit, and reference range, then consolidate a batch of 10 PDFs into one CSV. Delivery includes full source code, README, and local Windows run instructions. I focus on repeatable extraction pipelines with validation rules to reduce missing fields and keep numeric accuracy within your spot-check tolerance.
₹5,000 INR in 1 day
5.7

Hi, thanks for the detailed description — this is a solid use case. Before I propose an approach, I want to align expectations because medical PDF extraction can vary quite a bit depending on layout consistency. A couple of quick questions: 1. For your end goal, would you prefer a simple script/CLI tool (Python/.NET) that outputs CSV/JSON, or are you expecting a full desktop application with UI (Electron/Tauri-style)? 2. Are the PDFs all from a single lab format, or do you expect multiple labs with different layouts and table structures? 3. Is accuracy based on best-effort extraction acceptable, or do you need strict validation rules for every field (e.g., no missing values allowed)? Reason I ask is that the complexity mainly comes from OCR + table structure normalization rather than the interface itself, so the scope can vary significantly depending on these answers. Once I have this, I can suggest the most reliable and cost-effective approach for your budget.
₹11,000 INR in 7 days
5.7

Hi there, I have already checked the project details and the job is clear, so please contact me and I will show you a sample. Thank you.
₹2,000 INR in 1 day
5.3

Hello, I can build a reliable OCR-based solution to extract structured blood test results from PDFs with high accuracy and consistency.
Approach:
* Use Python + AWS Textract / Tesseract (.NET Core optional) for robust OCR
* Intelligent parsing to isolate only test tables (ignore demographics/notes)
* Map test names, values, units & reference ranges into clean CSV/JSON
* Handle multi-layout PDFs using pattern recognition + fallback rules
* Batch processing for multiple files → single consolidated CSV
Expertise: Python, .NET, .NET Core, Data Processing, Software Development, Web Scraping, AWS, PHP
You will get source code, setup guide, and a ready-to-run Windows solution.
Regards, Nitin Johnson
₹15,000 INR in 7 days
4.7

I read your project requirements and would be thrilled to collaborate with you. With expertise in Data Extraction using Python, I specialize in navigating complex data structures and deliver efficient results and scalable solutions. Let’s connect to discuss further
₹5,000 INR in 2 days
4.2

As a seasoned full-stack developer, my experience extends beyond mobile development and includes the database, automation and data processing skills you require for this project. With strong technical skills in Python and PHP, rest assured that I can engineer an OCR tool employing cutting-edge techniques tailored to your specific needs. I have successfully developed many data-driven applications with meticulous attention to accuracy and reliability. For example, I engineered a proprietary payment system for a leading pharmaceutical company that processed vast data sets while ensuring precision and security. I see similarities between this project's need for reliable extraction of numerical test results from variously formatted lab layouts and the intricate parsing my system needed to handle, and I believe my problem-solving capacity will be invaluable here. To give you even more confidence in my ability to deliver, let's run a parallel effort on sample documents as a benchmark for accuracy before formally committing. Remember, not only will I provide you with the final product and its source code, but I will also train you on how to run it locally on Windows. Let's create an innovative solution together!
₹12,500 INR in 2 days
4.0

Hi, I can develop an OCR-based blood test extractor that accurately reads and digitizes medical reports, structures the extracted data, and ensures reliable and efficient results processing. Let’s build a smart and accurate solution for your requirements! Best regards, Waleed Saleem
₹3,000 INR in 2 days
2.5

I understand you need a reliable way to extract numerical blood test results from PDFs while ignoring demographics and notes, with structured CSV/JSON output that maps fields accurately across varying lab layouts. I'll build a Python solution using Tesseract OCR with PDF2image preprocessing to handle multi-page PDFs reliably. The pipeline converts each page to images, extracts table data using custom layout analysis, and outputs consolidated CSV with proper field mapping while handling varying lab formats. With a decade in digital automation and full-stack development, I've delivered similar data extraction scripts for structured medical reports. Ready to start at ₹6,250 INR — let's discuss your requirements.
₹6,250 INR in 7 days
2.1

Hi there With over 5 years of experience in Python, OCR, and data extraction, I can absolutely build a reliable tool to extract blood-test results from PDFs. I will deliver a lightweight script that ingests multi-page PDF blood panels and outputs structured data in CSV or JSON format, focusing strictly on numerical results, reference ranges, and units. I will map extracted fields to their respective test names as they appear in the PDFs, ensuring accuracy within ±1 of the values. The solution will handle varying lab layouts with differing spacing and fonts, providing a consistent, repeatable extraction process. I will deliver fully commented source code and a brief read-me for running the tool locally on Windows. I plan to use Python with Tesseract OCR for efficient and precise extraction based on previous experience with medical reports. Looking forward to working with you. Ihor Zhuravel
₹5,500 INR in 2 days
1.7

You’ve scoped this well—the real challenge is not plain OCR, but reliably isolating the results table across slightly different lab layouts and mapping each value to the correct test name, unit, and reference range. I can help build a lightweight extraction workflow that processes multi-page PDF reports, ignores demographics/notes, and outputs structured CSV or JSON ready for review. For this type of job, I’d typically use an OCR + table-parsing approach with layout rules and validation checks so numeric fields, units, and ranges stay aligned even when spacing shifts between labs. I can also provide source code and a short Windows-friendly readme so you can run the process locally on future batches. Starting with a 10-PDF sample is the right way to validate accuracy before scaling.
₹12,000 INR in 4 days
1.1

Hi. This sits right in my lane: practical automation, clean data handling, and delivery that works without a lot of babysitting. I can start immediately, keep the first version lean, and share progress early so you can steer before too much time is spent in the wrong direction. What does success look like for the first usable version?
₹7,400 INR in 3 days
0.0

‼️ONLY PAY WHEN YOU’RE 100% HAPPY‼️ I see you need precise extraction of numerical test results from varied blood-test PDFs, focusing on accuracy and structured output. To tackle this, I’ll build a lightweight .NET Core tool using Tesseract OCR paired with custom parsing logic tailored to consistently identify tables regardless of layout differences. While I’m new to Freelancer, I’ve done similar medical OCR projects off-platform involving lab reports and ensured numeric accuracy and field mapping as tight as ±1. I deliver clean, reliable CSV exports plus full source code with clear instructions. Let’s chat! Worst case, you get a free consultation and real insight. Regards Pietie Lubbe
₹9,400 INR in 14 days
0.0

Hello, Extracting structured test results from blood panel PDFs (test names, numerical values, reference ranges, and units) into a clean CSV or JSON is exactly the kind of OCR pipeline I build with Python.
My approach:
• Use pdfplumber for text-based PDFs (fast and accurate) with pytesseract as fallback for scanned pages
• Map extracted fields to test names as they appear in each PDF, with consistent column headers regardless of lab layout
• Handle varying table structures and fonts across different lab formats
• Deliver one consolidated CSV from a batch of PDFs, with source code and a clear README so you can run it locally on Windows
To show you the quality upfront, send me 2–3 sample PDFs and I'll return the extracted CSV for free; you verify accuracy before committing to anything. Delivery for the full batch tool: 3 days after receiving sample files.
Best regards, Marcos
₹4,000 INR in 1 day
0.0
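The "consistent column headers regardless of lab layout" and "one consolidated CSV from a batch of PDFs" points in the proposal above could be combined along these lines. The column names, the alias table, and the shape of the `batch` input (file name mapped to a list of row dicts) are assumptions made for this sketch.

```python
import csv
import io

# Assumed output schema and header aliases; labs name columns differently.
COLUMNS = ["file", "test", "value", "unit", "reference_range"]
ALIASES = {
    "test name": "test",
    "result": "value",
    "units": "unit",
    "ref range": "reference_range",
    "reference range": "reference_range",
}


def normalize(row):
    """Rename lab-specific header names to the shared column names."""
    out = {}
    for key, val in row.items():
        name = key.strip().lower()
        out[ALIASES.get(name, name)] = val
    return out


def consolidate(batch):
    """Merge per-file rows into one CSV string with a fixed header."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for fname in sorted(batch):
        for row in batch[fname]:
            writer.writerow({"file": fname, **normalize(row)})
    return buf.getvalue()


# Two made-up reports whose source tables used different header names.
batch = {
    "b.pdf": [{"Test Name": "Glucose", "Result": 90,
               "Units": "mg/dL", "Ref Range": "70-100"}],
    "a.pdf": [{"test": "Hemoglobin", "value": 13.5,
               "unit": "g/dL", "reference_range": "12.0-16.0"}],
}

csv_text = consolidate(batch)
print(csv_text)
```

Keeping the source file name as the first column lets a reviewer trace any row in the consolidated CSV back to its PDF during spot checks.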

Ahmedabad, India
Payment method verified
Member since Dec 7, 2014