
Closed
Posted
Paid on delivery
I’m building a fully automated publishing pipeline that turns raw manuscript files into polished, publication-ready books. The system must ingest HTML, Markdown, and plain TXT, detect any structural metadata already present, then export perfectly styled EPUB by default, with optional DOCX and press-quality PDF versions generated in the same run.

Advanced styling is essential: the converter should apply layout templates that control typography, front-matter placement, page geometry, and embedded media rules. I need the flexibility to swap or extend these templates later without rewriting the core pipeline, so clean separation between content conversion and styling logic is critical.

I’m open to the tooling you prefer (Pandoc, Calibre, PrinceXML, custom Python or Node transformers, containerised micro-services, or a blend), as long as the finished workflow scales easily on a build server and can be triggered via CLI or REST.

Deliverables
• Source code and build scripts for the complete conversion pipeline
• At least two example templates demonstrating advanced styling features
• Documentation covering installation, configuration, and how to add new formats or templates
• A short test suite proving that all three input types successfully produce valid EPUB, DOCX, and PDF outputs

If this sounds like your kind of challenge, let’s talk timelines and the best technological path forward.

PROJECT TITLE
AI-Based eBook Creation & Conversion System (OCR + EPUB + AI Processing)

1. PROJECT OVERVIEW
We are developing a scalable automated publishing system that converts multiple input formats into publication-ready EPUB books and optionally print-ready formats (DOCX/PDF). The system will:
• Process files in batch
• Maintain formatting (including tables, figures, equations)
• Use AI for content generation and rewriting
• Automatically generate book structure (Title Page, Preface, etc.)

2. PROJECT OBJECTIVE
To build a modular, scalable, and configurable system that:
1. Converts scanned files (OCR), PDF, HTML, and Word (DOCX) → into EPUB
2. Converts EPUB → Word/PDF (print-ready)
3. Automatically generates: Title Page, Copyright Page, Preface, Acknowledgement, Table of Contents (for print output)

3. INPUT TYPES
A. Scanned Files
• OCR required
• Output must be editable and structured
• Formatting must be preserved as much as possible
B. PDF Files
• Detect: scanned vs digital
• Maintain: headings, tables, layout
C. HTML Files
• Direct conversion to EPUB
• Preserve formatting
D. Word Files (DOCX)
• Convert to EPUB with formatting intact
E. EPUB Files
• Convert to Word (print-ready)
• Generate TOC and optional Index

4. CORE FEATURES (MVP SCOPE)
4.1 Batch Processing
• Upload multiple files
• Process via queue system
4.2 Excel-Based Metadata Input
• System must read Excel file
• Must support dynamic column mapping (NO hardcoding) and missing field handling
4.3 AI-Generated Content
System must generate:
• Book Title (based on article titles)
• Preface
• Acknowledgement
4.4 AI Rewriting Feature
• Expand or reduce content: ±10%, 25%, 40%, 60%, 80%, 100%
• Must preserve structure, avoid plagiarism, and not modify equations/tables layout
4.5 Table Formatting (MANDATORY)
• All tables must have grid borders and use hairline thickness (~0.25 pt)
• Must work in EPUB, Word, and PDF
4.6 Book Structure Generation
Final EPUB must include:
1. Title Page (AI-generated title)
2. Copyright Page (template-based)
3. Preface (AI-generated)
4. Acknowledgement (AI-generated)
5. Table of Contents
6. Chapters (articles)

5. IMPORTANT CONTENT RULES
Author Names
• Only names allowed
• NO designations, institutions, or affiliations
Copyright Page
• Template will be provided
• System must replace variables: ISBN, eISBN, Year, Publisher Name, Address, Email
Title Page
• Title → AI-generated
• Editor/Author Name → provided via Excel

6. TECHNICAL REQUIREMENTS
Preferred Stack
• Backend: Python (FastAPI preferred)
• OCR: Tesseract
• Conversion: Pandoc, Calibre
Architecture (MANDATORY)
The system MUST be:
1. Modular. Separate components: OCR, Conversion, AI processing, Output generation
2. Config-Driven. No hardcoding of Excel columns, templates, or prompts
3. Scalable. Must support batch processing, future API integration, and multi-user expansion
4. Built from Replaceable Components. OCR engine should be replaceable; AI provider should be replaceable

7. UI REQUIREMENTS (BASIC)
Simple interface:
• Upload files
• Upload Excel
• Select options: Rewrite %, Generate content (yes/no)
• Download output

8. OUTPUT REQUIREMENTS
EPUB
• Clean structure
• Compatible with major readers
Word (DOCX)
• Print-ready
• Includes TOC and proper formatting

9. ERROR HANDLING
System must:
• Skip problematic files (log errors)
• Continue batch processing
• Provide error report

10. PERFORMANCE REQUIREMENT
• Must handle a minimum of 50–100 files per batch
• Should not crash on large files

11. DELIVERABLES
Developer must provide:
1. Working application
2. Source code (fully commented)
3. Documentation: setup instructions, config guide
4. Sample outputs

12. MANDATORY DEVELOPMENT CONDITIONS (VERY IMPORTANT)
The developer MUST:
• NOT hardcode Excel structure, templates, or prompts
• Build the system so that fields can be changed without code edits, prompts can be modified easily, and templates can be replaced
Code Requirements:
• Clean and readable
• Modular
• Future scalable

13. PROJECT PHASES
Phase 1 (MVP)
• OCR + Conversion + Basic AI
• EPUB output
Phase 2 (Later)
• Advanced indexing
• UI improvements
• Multi-language support

14. TIMELINE
MVP: 4–6 weeks

15. BUDGET
• Open to proposals (cost-effective preferred)
• Milestone-based payment

16. APPLICATION REQUIREMENTS
Please include:
1. Relevant experience (OCR / EPUB / document processing)
2. Tools you will use
3. Timeline
4. Cost breakdown
5. Sample work (MANDATORY)

17. SELECTION PROCESS
• Shortlisting
• Paid test task
• Final selection

18. IMPORTANT NOTE
We are looking for a long-term developer. This project will expand significantly.
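The copyright-page rule in section 5 (a provided template whose variables are replaced) could be sketched roughly as follows. This is only an illustration using Python's standard `string.Template`; the template text, the field names, and the `render_copyright_page` helper are all hypothetical, since the real template is to be supplied by the client.

```python
from string import Template

# Hypothetical template text; the brief says the real one will be provided.
COPYRIGHT_TEMPLATE = Template(
    "Copyright © $year $publisher_name\n"
    "ISBN: $isbn\n"
    "eISBN: $eisbn\n"
    "$address\n"
    "$email\n"
)

# The six variables listed in the brief, as assumed placeholder names.
REQUIRED_FIELDS = ("isbn", "eisbn", "year", "publisher_name", "address", "email")


def render_copyright_page(values: dict[str, str]) -> str:
    """Fill the template, refusing to emit a page with blank variables."""
    missing = [name for name in REQUIRED_FIELDS if not values.get(name)]
    if missing:
        raise ValueError(f"missing copyright fields: {', '.join(missing)}")
    return COPYRIGHT_TEMPLATE.substitute(values)
```

Keeping the template as data (for example in a file loaded at start-up) rather than in code is what would let it be replaced without code edits, as section 12 asks.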
Project ID: 40349120
36 proposals
Remote project
Active 11 days ago
36 freelancers are bidding an average of ₹111,414 INR for this job

Hi, your vision for a scalable, AI-powered publishing pipeline is clear and well-structured, and I’d love to help build this end-to-end system. I’ve reviewed your requirements in detail and can deliver a modular, config-driven architecture that supports OCR, multi-format ingestion, AI processing, and high-quality EPUB/DOCX/PDF outputs. I propose a Python-based backend using FastAPI, with Pandoc + Calibre for conversion and Tesseract (replaceable) for OCR. The system will be designed as loosely coupled modules (OCR, parsing, AI processing, templating, and export), ensuring flexibility and future scalability.
Key highlights of my approach:
• Template-driven styling (CSS + layout configs) fully decoupled from conversion logic
• Dynamic Excel metadata mapping (no hardcoding)
• AI integration for title, preface, acknowledgements, and controlled rewriting
• Robust table rendering (hairline borders, EPUB/PDF/Word compatible)
• Batch queue processing (Celery/Redis or equivalent) with fault tolerance
• CLI + REST API support for automation and scaling
Deliverables will include clean, documented source code, reusable templates, test coverage for all input/output paths, and deployment-ready scripts (Docker supported). Estimated timeline for MVP: 1 week, aligned with your plan. I’m also open to a paid test task and long-term collaboration as the system evolves. Looking forward to discussing the architecture and next steps. Best regards,
₹100,000 INR in 7 days
6.1
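The hairline table borders mentioned in this proposal (and required by section 4.5 of the brief) could be produced from a template setting, for the CSS-based outputs, along these lines. This is a sketch only: the `table_css` helper and its defaults are hypothetical, how thin the border actually renders depends on the reading app, and DOCX output would need its borders defined by other means than CSS.

```python
def table_css(border_pt: float = 0.25, color: str = "#000") -> str:
    """Return CSS giving every table a full grid of thin borders."""
    rule = f"{border_pt}pt solid {color}"
    return (
        # Collapse adjacent cell borders so the grid is a single line thick.
        "table { border-collapse: collapse; }\n"
        f"table, th, td {{ border: {rule}; }}\n"
    )
```

Because the thickness is a parameter, a template could override it without touching the conversion code.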

Your project calls for a modular, scalable pipeline that handles OCR, multi-format conversion, AI content generation, and robust batch processing without rigid dependencies on input structures. I've built similar systems that take scanned and digital files through OCR and Pandoc-driven transformations, producing clean EPUB and print-ready PDF/Word with accurate tables and structured output. To meet your config-driven needs, I suggest a plugin-style architecture where Excel mapping, AI prompts, and templates live in separate JSON/YAML files. This way, you can swap or extend components without touching code. For AI rewriting and generation, integrating a service via API calls lets you control prompt updates dynamically. Logging and error handling for batch jobs ensured smooth, resilient runs handling over 100 files in past projects. A quick question: Should OCR text accuracy go through automatic correction or manual review? And for AI rewriting, do you prefer open-source models or cloud-based APIs for better reliability and scale? I can deliver the MVP with all core features within 5 weeks, providing fully documented source code, sample templates, and tests. Ready to begin and discuss the best tech stack choice based on your infrastructure.
₹112,500 INR in 7 days
5.8
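The idea in this proposal of keeping the Excel mapping in a separate JSON/YAML file could look roughly like the sketch below. It is an assumption-laden illustration: the mapping schema, the field names, and the `map_rows` helper are hypothetical, and the header and rows are passed in as plain lists (as a spreadsheet library would typically return them) so that the example does not depend on any particular Excel reader.

```python
import json

# Hypothetical mapping config: canonical field -> accepted column headers.
MAPPING_JSON = """
{
  "title":  {"columns": ["Article Title", "Title"], "required": true},
  "author": {"columns": ["Author", "Author Name"], "required": true},
  "isbn":   {"columns": ["ISBN"], "required": false, "default": ""}
}
"""


def map_rows(header, rows, mapping):
    """Map spreadsheet rows to records; collect problems instead of raising."""
    # Case-insensitive lookup from header text to column position.
    index = {str(h).strip().lower(): i for i, h in enumerate(header) if h is not None}
    records, problems = [], []
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        record, missing = {}, []
        for field, rule in mapping.items():
            position = next(
                (index[name.lower()] for name in rule["columns"] if name.lower() in index),
                None,
            )
            raw = row[position] if position is not None and position < len(row) else None
            text = "" if raw is None else str(raw).strip()
            if text:
                record[field] = text
            elif rule.get("required", False):
                missing.append(field)
            else:
                record[field] = rule.get("default", "")
        if missing:
            problems.append(f"row {row_number}: missing {', '.join(missing)}")
        else:
            records.append(record)
    return records, problems
```

Renaming or reordering spreadsheet columns would then only require editing the mapping file, and rows with missing required fields end up in a report rather than stopping the run.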

Hi, I am an IIT grad, a PMP-certified professional, and ex-BFSI, and I have worked at Fortune 500 companies. I will make it a reality for you. For this Automated Multi-Format Book Converter, I will use Python as the primary programming language to develop a modular, template-based architecture that leverages libraries like PyPDF2, docxtemplater, and ebooklib to automate book formatting across various formats. Kindly click on the chat button so we can discuss and get started. I will share my prior projects and my resume too. I have been freelancing since 2019 and have worked at top MNCs in both the USA and India. Let's connect.
₹75,000 INR in 7 days
5.3

You’re building a complex, scalable publishing pipeline with OCR, AI processing, and multi-format conversion, and I understand the priority is a modular, config-driven system that avoids hardcoding and can evolve long-term. I would build this using FastAPI (Python) with a microservice-style architecture separating OCR, conversion, AI processing, and output generation. For conversion, I’ll rely on Pandoc as the core engine, extended with custom filters for structure detection and styling, while Tesseract handles OCR and feeds clean, structured text into the pipeline. The system will support batch processing via a queue (Celery/Redis), dynamic Excel metadata mapping, and template-driven styling (EPUB, DOCX, PDF) with strict separation between content and layout. AI modules will handle title generation, preface, acknowledgements, and controlled rewriting while preserving tables, equations, and structure. Templates, prompts, and mappings will be fully configurable (YAML/JSON), ensuring zero hardcoding and easy future updates. Outputs will include clean EPUB (default), plus print-ready DOCX/PDF with consistent formatting, TOC, and book structure. Timeline for MVP is 4–6 weeks with milestone-based delivery: core pipeline, AI integration, batch system, and testing suite. I can also support long-term scaling and future feature expansion. Syed
₹109,850 INR in 6 days
4.7
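The separation between content and layout that this proposal describes, with Pandoc as the core engine, might be sketched as a function that turns a template definition into a Pandoc command line. The template names and file paths below are hypothetical, and PrinceXML as the PDF engine is just one of the tools the brief mentions; `--css`, `--reference-doc`, `--pdf-engine`, `--toc`, `--standalone`, and `-o` are standard Pandoc options.

```python
from pathlib import Path

# Hypothetical template registry: styling lives here, not in the pipeline code.
TEMPLATES = {
    "classic": {
        "epub": {"css": "templates/classic/epub.css"},
        "docx": {"reference_doc": "templates/classic/reference.docx"},
        "pdf": {"pdf_engine": "prince", "css": "templates/classic/print.css"},
    }
}


def pandoc_command(source, output_format, template, out_dir="build"):
    """Build (but do not run) the Pandoc argument list for one conversion."""
    style = TEMPLATES[template][output_format]
    target = Path(out_dir) / f"{Path(source).stem}.{output_format}"
    cmd = ["pandoc", str(source), "--standalone", "--toc", "-o", str(target)]
    if "css" in style:
        cmd.append(f"--css={style['css']}")
    if "reference_doc" in style:
        cmd.append(f"--reference-doc={style['reference_doc']}")
    if "pdf_engine" in style:
        cmd.append(f"--pdf-engine={style['pdf_engine']}")
    return cmd
```

The returned list could be handed to `subprocess.run(cmd, check=True)` by a worker; adding a template would then mean adding an entry to the registry and its style files, with no change to the function.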

Hello, I’m Karthik, a Full Stack Architect with 15+ yrs experience in document processing, OCR pipelines, and automated publishing systems.
Understanding: You need a modular, scalable multi-format book conversion system with OCR, AI processing, and template-driven EPUB/DOCX/PDF output.
Approach:
• Build pipeline (Python + FastAPI + Celery for batch processing)
• OCR module (Tesseract, replaceable engine design)
• Conversion layer (Pandoc + Calibre) for EPUB/DOCX/PDF
• AI module (LLM-based content generation + rewriting with structure preservation)
• Template engine (separate styling layer for typography, layouts)
• Metadata ingestion (Excel with dynamic mapping, no hardcoding)
• Queue-based processing for 50–100+ files per batch
• CLI + REST API trigger support
Architecture:
• Fully modular (OCR / AI / Conversion / Output layers)
• Config-driven (templates, prompts, mappings editable)
• Dockerized for scalable deployment
Deliverables:
• Complete automation pipeline
• 2+ advanced styling templates
• Test suite + documentation
• Sample outputs (EPUB/DOCX/PDF)
Why Me:
• Strong in OCR, document parsing & automation pipelines
• Experience with Pandoc/AI workflows
• Clean, scalable, production-ready architecture
Timeline: 4–6 weeks (MVP)
Happy to share similar automation systems & approach.
Warm Regards, Karthik B, Resonite Technologies
₹152,500 INR in 7 days
5.1

With my extensive experience in web and mobile development, I’m confident that I can develop the versatile and scalable automated publishing system that you need for your project. My proficiency in HTML, JavaScript, and Node.js aligns seamlessly with your preferred stack. I have a solid understanding of containerized microservices and REST API architecture, enabling me to design a solution that will scale effortlessly on a build server while also being easily triggered via CLI or REST. Moreover, my previous experience with OCR technologies perfectly complements the content conversion aspect of this project. I understand the complexities of handling scanned files and the intricate balance required to retain formatting while still allowing for AI-assisted content generation and rewriting. As a meticulous developer, I can guarantee clean separation between content conversion and styling logic, an essential requirement for adaptable template swapping as your project continues to evolve.
₹112,500 INR in 7 days
5.0

Dear Hiring Manager, I have carefully reviewed your requirements for the automated publishing pipeline and I am confident in delivering a scalable, modular, and production-ready solution. My approach will focus on building a clean separation between ingestion, processing, AI services, and output generation layers, ensuring that each component (OCR, conversion, templating, and AI processing) remains replaceable and easily extendable. I will design a config-driven architecture where templates, prompts, and mappings are externalized, allowing you to modify behavior without changing core code. The pipeline will leverage tools such as Python (FastAPI), Pandoc, Calibre, and Tesseract OCR, orchestrated through a CLI/REST-based workflow with support for batch processing, queue handling, and structured logging. Special attention will be given to preserving formatting (tables, layouts, metadata) and generating consistent outputs across EPUB, DOCX, and PDF formats.
Before starting, I have a few quick questions:
• Are there any preferred tools or should I proceed with the proposed stack?
• Do you already have templates for EPUB/DOCX/PDF or should I design them?
• Which AI provider should be integrated for content generation?
• What is your expected timeline for the MVP and full system?
I am ready to start your project immediately and look forward to working with you. Best Regards, JP
₹75,000 INR in 7 days
3.6

I understand that you are looking to build a sophisticated automated publishing pipeline capable of converting various manuscript formats into polished, publication-ready eBooks. Your requirements for maintaining formatting across multiple input types like HTML, Markdown, and TXT while generating outputs in EPUB, DOCX, and PDF demonstrate a clear need for modularity and scalability. With over 12 years of experience in developing robust applications using technologies such as Python for OCR integration with Tesseract, and Pandoc for seamless document conversion, I can design an efficient workflow tailored to your specifications. I also have expertise in creating config-driven systems which will allow future enhancements without code changes. Could you clarify if there are specific metadata formats or templates that need to be integrated from the start? This will help ensure that our approach aligns perfectly with your vision for the project.
₹150,000 INR in 7 days
2.4

We can build your modular OCR-to-EPUB system with advanced templating, batch processing, and AI content generation.
₹112,500 INR in 45 days
2.6

Hi, this aligns well with my experience in building modular document-processing and AI pipelines, and I can deliver a scalable, config-driven system as required. I’ll build the backend using Python (FastAPI) with clearly separated modules: OCR (Tesseract), conversion (Pandoc/Calibre), AI processing, and output generation. The system will be fully config-driven (no hardcoding): Excel mappings, templates, and prompts will be editable without code changes. For performance, I’ll implement batch processing (50–100 files) using a queue system with proper error handling and logging. The pipeline will support PDF, DOCX, HTML, and OCR inputs → EPUB, with optional DOCX/PDF output, ensuring formatting preservation (tables, headings, equations). AI features will include auto-generation (title, preface, acknowledgment) and controlled rewriting with % options while preserving structure. You’ll receive clean code, templates, CLI/API support, documentation, and test cases.
₹112,500 INR in 7 days
2.3
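The batch behaviour this proposal refers to (and that section 9 of the brief requires: skip problematic files, keep going, report errors) could be sketched as below. The `BatchReport` class and `run_batch` function are hypothetical names, and the loop runs files one after another; a real deployment would more likely hand each file to a queue worker.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable

logger = logging.getLogger("batch")


@dataclass
class BatchReport:
    """What happened to each file in one batch run."""
    succeeded: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)


def run_batch(paths: Iterable[str], convert: Callable[[str], None]) -> BatchReport:
    """Convert every file, recording failures instead of aborting the batch."""
    report = BatchReport()
    for path in paths:
        try:
            convert(path)
        except Exception as exc:  # broad on purpose: one bad file must not stop the rest
            logger.error("skipping %s: %s", path, exc)
            report.failed[path] = f"{type(exc).__name__}: {exc}"
        else:
            report.succeeded.append(path)
    return report
```

The `failed` mapping doubles as the error report: it can be written out as CSV or JSON at the end of the run.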

Hello, I have carefully reviewed your job description and am happy to say that I have all the skills you are looking for. You are building a scalable, AI-powered publishing pipeline that handles OCR, multi-format conversion (HTML, DOCX, PDF, EPUB), batch processing, and advanced template-based styling while keeping everything modular, configurable, and production-ready. I clearly understand your expectation for clean architecture (separate OCR, conversion, AI modules), no hardcoding, flexible templates, and high-performance batch processing. With 12 years of experience in Python (FastAPI), document processing (Pandoc, Tesseract, Calibre), and scalable systems, I can deliver a robust MVP aligned with your requirements. I will personally oversee your project along with my team. Can we connect regarding this? I can complete this MVP within 4–6 weeks. Regards, Murtuza
₹112,500 INR in 36 days
1.5

With my extensive experience in web platform development and particular expertise in leveraging AI, I am the ideal choice for your Automated Multi-Format Book Converter project. Over time, I have honed my skills in AI Development, HTML, JavaScript, Node.js and Python – all tools that are incredibly aligned to this project's needs for dealing with input files of various formats and efficiently converting them into publication-ready EPUB books. In addition to delivering robust solutions optimized for scalability, my unique value proposition lies in my ability to understand the need for clean separation between content conversion and styling logic. This is critical for potential future changes or extensions to your conversion templates, as I ensure minimal disruption to your core pipeline while keeping it readily adaptable. Choosing me would mean not only entrusting your project with someone technically adept, but also with a freelancer intensely focused on long-term collaboration over one-off tasks. Being committed to delivering reliable execution, clear communication at every step, and a flexible yet strategically-rooted roadmap for the growth of your product would ensure an excellent professional relationship. Let's get started and create a highly efficient system that enhances your publishing pipeline!
₹112,500 INR in 7 days
0.3

I saw your project and am confident I can deliver on this. I'm currently working on a similar project and understand the importance of maintaining formatting while converting multiple input formats into publication-ready books. With a focus on advanced styling and layout templates, I assure you that the system I develop will seamlessly handle the conversion process, ensuring the delivery of perfectly styled EPUB, DOCX, and PDF versions. This project aligns perfectly with my expertise, and I am excited to tackle the challenge head-on, providing you with a top-notch automated publishing pipeline. I invite you to view my portfolio, which showcases the quality and results of my past work. My experience in developing scalable systems and expertise in handling various input formats make me the ideal candidate for this project. I am confident that my skills will meet and exceed your expectations, delivering a solution that aligns perfectly with your project requirements. I look forward to hearing from you. Regards, Sadiya
₹75,000 INR in 7 days
0.0

I propose to build a fully modular, scalable AI-powered publishing pipeline using FastAPI, Pandoc, and Tesseract OCR. The system will support multiple input formats (PDF, DOCX, HTML, scanned files) and convert them into structured, publication-ready EPUB, with optional DOCX and print-quality PDF outputs. The architecture will be completely config-driven — ensuring no hardcoding of Excel mappings, templates, or AI prompts. I will implement batch processing with a queue system, robust error handling, and dynamic template-based styling for typography, layouts, and front matter. AI features will include automated title generation, preface, acknowledgements, and controlled rewriting while preserving tables, equations, and document structure. The system will be designed for scalability, with replaceable components (OCR, AI providers) and support for CLI + REST API integration. Deliverables will include clean, well-documented source code, sample templates, test cases, and complete setup/configuration documentation.
₹112,500 INR in 7 days
0.0
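The replaceable components this proposal mentions (OCR engine, AI provider) are commonly handled with a small interface plus a registry keyed by a config value. The sketch below shows that pattern for OCR only; `OcrEngine`, `register_engine`, `get_engine`, and `DummyEngine` are hypothetical names, and the Tesseract adapter assumes the `pytesseract` and Pillow packages are installed.

```python
from typing import Callable, Protocol


class OcrEngine(Protocol):
    """Anything that can turn a page image into text."""
    def extract_text(self, image_path: str) -> str: ...


_ENGINES: dict[str, Callable[[], OcrEngine]] = {}


def register_engine(name: str):
    """Class decorator that makes an engine selectable by name."""
    def decorator(factory):
        _ENGINES[name] = factory
        return factory
    return decorator


@register_engine("tesseract")
class TesseractEngine:
    def extract_text(self, image_path: str) -> str:
        # Imported lazily so other engines work without these packages.
        import pytesseract
        from PIL import Image
        return pytesseract.image_to_string(Image.open(image_path))


@register_engine("dummy")
class DummyEngine:
    """Stand-in engine for tests; performs no OCR."""
    def extract_text(self, image_path: str) -> str:
        return f"[no OCR performed for {image_path}]"


def get_engine(name: str) -> OcrEngine:
    """Look up the engine named in configuration."""
    try:
        return _ENGINES[name]()
    except KeyError:
        raise ValueError(f"unknown OCR engine {name!r}; known: {sorted(_ENGINES)}") from None
```

With this shape, swapping the OCR engine would be a one-line config change, for example `get_engine(settings["ocr_engine"])` where `settings` is whatever config object the application loads.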

I Will Develop Professional Website or Custom Software Solutions
Gig Description: Are you looking for a reliable software developer to bring your idea to life? You’re in the right place! I am a professional software developer with experience in building high-quality, scalable, and user-friendly applications. Whether you need a website, web app, or custom software, I will deliver clean and efficient solutions tailored to your needs.
What I Offer:
• Website Development (Business, Portfolio, E-commerce)
• Custom Software Development
• Web Applications (React, Node.js, etc.)
• Bug Fixing & Code Optimization
₹112,500 INR in 15 days
0.0

Building an AI-based eBook creation and conversion system is right up my alley. With extensive skills in HTML, JavaScript, Node.js, and Python, I'm equipped to tackle the challenge of processing different input file formats like Scanned Files, PDFs, HTML, Word Files (DOCX), and EPUBs. Also, being entrusted with various projects over the years, scalability and modularity have become second nature to me. One area where I believe I can really make a difference to this project is in content generation and rewriting. As we'll be using AI for generating the book's title, preface, and acknowledgement sections, I can bring my experience in AI-powered content development to the table. Not only have I used AI for content creation in the past, but I've also implemented rigorous checks to ensure that generated content is plagiarism-free. Moreover, as a student of Computer Science with a constant desire to learn and innovate, I'm open-minded towards adopting the best tools for this project. Whether using Pandoc, Calibre, PrinceXML or any other technology that suits your unique workflow requirements – my aim is always to deliver exceptional results that cater perfectly to each client's needs. Let's join forces and create a transformative eBook conversion system together!
₹120,000 INR in 3 days
0.0

I have a keen eye for clean separation between content conversion and styling logic, which is critical for swapping or extending templates without rewriting the core pipeline. My previous work creating publishing templates, which included advanced features such as embedded media rules, will be immensely beneficial here. I'm excited about this project, Titus Wirch! Let me use my skills and experience to build an ebook creation and conversion system that'll exceed your expectations!
₹112,500 INR in 15 days
0.0

Dear Client, Greetings from Resonite Technologies! We have proven experience in OCR, document processing, AI workflows, EPUB/PDF/DOCX generation, and scalable Python systems. Your publishing pipeline requirement matches our strengths well.
Our proposed stack: FastAPI + Python, Tesseract OCR, Pandoc/Calibre, queue-based batch processing, config-driven templates/prompts, and modular services for OCR, conversion, AI processing, metadata mapping, and output generation.
What we will deliver:
• Automated pipeline for HTML/Markdown/TXT/PDF/DOCX/Scanned files → EPUB
• Optional EPUB → DOCX/PDF generation
• Excel metadata import with dynamic column mapping
• AI-generated Title Page, Preface, Acknowledgement, TOC-ready structure
• AI rewrite controls with configurable expansion/reduction
• Strong table preservation with required borders/styling
• Error logging, skip-and-continue batch handling, and sample templates
• Basic UI for upload, options, processing, and download
Why us:
✔ No hardcoding of Excel structure, prompts, or templates
✔ Clean, modular, future-scalable architecture
✔ Ready for batch processing, REST API expansion, and multi-user growth
✔ Fully documented code with test coverage and sample outputs
Timeline: MVP in 4–6 weeks
Milestones: Architecture → Core pipeline → AI/output modules → QA & handover
We can share relevant document automation experience and are open to a paid test task. Warm Regards, Karthik B, Resonite Technologies
₹149,500 INR in 7 days
0.0

I build automated publishing pipelines daily: PDF, EPUB, MOBI, DOCX conversion using Pandoc, Calibre, and custom Python orchestration with AI for content processing. For your multi-format book converter I will deliver:
1. Input parser handling DOCX, Markdown, PDF source formats
2. Conversion engine outputting clean EPUB, MOBI, PDF, and print-ready files
3. AI layer using GPT-4o for metadata extraction, chapter detection, and TOC generation
4. Cover image and layout templates per format
5. Web interface or CLI to process new books in batch
6. Quality validation step before final output
Ready to start with a working prototype this week. What are the source formats you receive and how many books per month?
₹75,000 INR in 10 days
0.0

Hello Client, I’ll deliver your AI-Based eBook Creation & Conversion System with precision and efficiency, ensuring smooth, error-free batch processing of OCR, HTML, Markdown, TXT, PDF, and DOCX files into publication-ready EPUB, DOCX, and PDF outputs. You’ll receive modular source code, configurable templates, and comprehensive documentation alongside a reliable test suite validating all input types. My approach emphasizes clean separation of content and styling logic to enable easy template swaps and scalability on build servers via CLI or REST. With proven experience in OCR, Pandoc, and FastAPI, I’m ready to start immediately and communicate professionally throughout. Let’s connect today and advance your project confidently. Regards, Anton Prinsloo
₹100,000 INR in 30 days
0.0

New Delhi, India
Payment method verified
Member since Mar 21, 2026