
Paid on delivery
I am building an AI-driven thesis formatting and citation compliance engine for university-level academic manuscripts. The goal is to process a raw thesis manuscript (Microsoft Word or LaTeX) and generate:

- A fully structured and style-compliant Word version
- A synchronized LaTeX version
- A validated and consistent BibTeX database
- A structured compliance report

The system must support dynamic style switching (APA, MLA, Chicago, or department-specific templates) using a rule-based core with optional NLP/LLM enhancement. This is not a simple document converter. It must combine deterministic structural validation with citation integrity checks.

Core Functional Requirements

1. Structural Analysis & Normalization

- Detect and validate:
  - Chapter and section hierarchy
  - Figure and table numbering
  - Appendix placement
  - Caption formatting
- Restructure document hierarchy when possible
- Flag structural inconsistencies in a compliance report

2. Citation & BibTeX Validation

- Validate every in-text citation against the .bib database
- Detect:
  - Missing entries
  - Unused references
  - Mismatched keys
- Regenerate reference lists according to selected style
- Ensure ordering and formatting compliance

3. Style Switching Engine

- Apply different citation styles (APA, MLA, Chicago, etc.)
- Support department-specific template configuration
- Allow rules to be defined via JSON or config file

4. Compliance Report Generation

Each run must produce a clean, structured report summarizing:

- Section hierarchy validation
- Citation–BibTeX consistency results
- Reference list completeness
- Formatting compliance issues

Report format: JSON + human-readable summary (PDF or HTML preferred)

Deliverables

- A runnable Python-based script or lightweight desktop application
- Modular source code with clear comments explaining:
  - Parsing logic
  - Rule-based validation approach
  - Any NLP/LLM components used
- Sample processed files:
  - Word (.docx)
  - LaTeX (.tex)
  - BibTeX (.bib)

  that successfully pass:
  - Structural compliance check
  - Citation validation
  - Reference list validation

Technical Preferences

- Python preferred
- Use of:
  - python-docx (or equivalent)
  - Pandoc (if needed)
  - LaTeX toolchain
  - BibTeX parsing libraries
- Clean modular architecture
- Secure file handling (temporary processing only)

Important

Reliability and correctness are more important than UI design.

Please share previous work involving:

- Document parsing
- Citation systems
- Word add-ins
- LaTeX class/style development
- Academic publishing tools

Milestone-based development is preferred.
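To illustrate the kind of check meant under Citation & BibTeX Validation, here is a minimal sketch using only the Python standard library. It is an assumption about one possible approach, not a required design: the function names (`cited_keys`, `bib_keys`, `check_citations`), the report field names, and the regex-based parsing are all hypothetical, and a real implementation would likely use a proper BibTeX parsing library. "Mismatched keys" is interpreted here, as an assumption, as keys that differ only by letter case.

```python
from __future__ import annotations

import json
import re

# \cite, \citep, \citet*, etc., with optional [..] arguments before the key list.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
# "@type{key," at the start of a BibTeX entry.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
NON_ENTRIES = {"comment", "string", "preamble"}


def cited_keys(tex: str) -> set[str]:
    """Collect every citation key used in the LaTeX source."""
    keys: set[str] = set()
    for group in CITE_RE.findall(tex):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def bib_keys(bib: str) -> set[str]:
    """Collect every entry key defined in the .bib text."""
    return {key for kind, key in ENTRY_RE.findall(bib)
            if kind.lower() not in NON_ENTRIES}


def check_citations(tex: str, bib: str) -> dict:
    """Compare cited keys with defined keys and build a small report."""
    cited, defined = cited_keys(tex), bib_keys(bib)
    by_lower = {k.lower(): k for k in defined}
    # Assumption: a "mismatched key" differs from a defined key only by case.
    mismatched = {k: by_lower[k.lower()]
                  for k in cited - defined if k.lower() in by_lower}
    return {
        "missing_entries": sorted(cited - defined - set(mismatched)),
        "unused_references": sorted(defined - cited - set(mismatched.values())),
        "mismatched_keys": mismatched,
    }


if __name__ == "__main__":
    tex = r"See \cite{smith2020, Jones2019} and \citep[p.~4]{lee2021}."
    bib = ("@article{smith2020,\n  title={A}\n}\n"
           "@book{jones2019,\n  title={B}\n}\n"
           "@misc{old1999,\n  title={C}\n}\n")
    print(json.dumps(check_citations(tex, bib), indent=2))
```

The dictionary returned by `check_citations` could feed the JSON part of the compliance report directly; the same comparison would apply to citations extracted from a .docx file once they are normalized to keys.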
Project ID: 40269073
52 proposals
Remote project