🤖 AI Summary
LaTeX sources often compile without errors yet still produce documents with visual defects, such as figure-text misalignment, equation overflow, inconsistent table scaling, widows and orphans, and page imbalance. These defects degrade typographic quality and trap authors in inefficient manual debugging cycles. This work formalizes the Visual Typesetting Optimization (VTO) task and introduces a vision-in-the-loop framework that iteratively renders the PDF, detects defects across five categories, and applies constraint-aware source edits, automatically turning compilable LaTeX sources into publication-ready documents that respect page limits and are visually polished. We construct PaperFit-Bench, a benchmark of 200 papers across 10 conference templates covering 13 defect types, and show that our method substantially outperforms existing approaches, bridging the automation gap between “compilable” and “publication-ready.”
📝 Abstract
A LaTeX manuscript that compiles without error is not necessarily publication-ready. The resulting PDF frequently suffers from misplaced floats, overflowing equations, inconsistent table scaling, widow and orphan lines, and poor page balance, forcing authors into repetitive compile-inspect-edit cycles. Rule-based tools operate only on source code and log files, so they are blind to the rendered output. Text-only LLMs edit in an open loop: they can neither predict nor verify the two-dimensional layout consequences of their changes. Reliable typesetting optimization therefore requires a visual closed loop with verification after every edit. We formalize this problem as Visual Typesetting Optimization (VTO), the task of transforming a compilable LaTeX paper into a visually polished, page-budget-compliant PDF through iterative visual verification and source-level revision, and introduce a five-category taxonomy of typesetting defects to guide diagnosis. We present PaperFit, a vision-in-the-loop agent that iteratively renders pages, diagnoses defects, and applies constrained repairs. To benchmark VTO, we construct PaperFit-Bench, which comprises 200 papers across 10 venue templates and 13 defect types at varying difficulty levels. Extensive experiments show that PaperFit outperforms all baselines by a large margin. These results indicate that bridging the gap from compilable source to publication-ready PDF requires vision-in-the-loop optimization, and that VTO constitutes a critical missing stage in the document automation pipeline.
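The render, diagnose, repair cycle described above can be illustrated with a minimal Python sketch. This is not PaperFit's implementation: the names `Defect`, `Render`, `optimize`, and the three callables are hypothetical, and the acceptance rule (keep an edit only if the re-rendered result has a strictly lower defect-plus-overflow score) is an assumption chosen to show what "verification after every edit" could look like. In practice the renderer would wrap a LaTeX compiler and the diagnoser a vision model, both of which are left abstract here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Defect:
    """One detected layout problem (hypothetical record, not from the paper)."""
    category: str
    page: int


@dataclass
class Render:
    """Result of compiling and rasterizing a source (hypothetical record)."""
    page_count: int
    pages: list  # opaque page images, whatever the renderer produces


RenderFn = Callable[[str], Optional[Render]]   # returns None if compilation fails
DiagnoseFn = Callable[[Render], List[Defect]]  # inspects rendered pages
RepairFn = Callable[[str, Defect], str]        # proposes an edited source


def _score(render: Render, defects: List[Defect], page_budget: int) -> int:
    # Assumed objective: defect count plus number of pages over the budget.
    return len(defects) + max(0, render.page_count - page_budget)


def optimize(source: str, render: RenderFn, diagnose: DiagnoseFn,
             repair: RepairFn, page_budget: int,
             max_iters: int = 10) -> Tuple[str, List[Defect]]:
    """Closed-loop sketch: every candidate edit is re-rendered and re-checked."""
    current = render(source)
    if current is None:
        raise ValueError("input source must compile")
    defects = diagnose(current)
    skip = 0  # rotates the target after a rejected edit
    for _ in range(max_iters):
        if not defects and current.page_count <= page_budget:
            break  # visually clean and within the page budget
        if defects:
            target = defects[skip % len(defects)]
        else:
            target = Defect("page_budget", current.page_count)
        candidate = repair(source, target)
        rendered = render(candidate)
        if rendered is None:
            skip += 1  # reject edits that break compilation
            continue
        new_defects = diagnose(rendered)
        # Accept only edits that are verified to improve the rendered result.
        if _score(rendered, new_defects, page_budget) < _score(current, defects, page_budget):
            source, current, defects, skip = candidate, rendered, new_defects, 0
        else:
            skip += 1
    return source, defects
```

The contrast with open-loop editing is in the acceptance step: a candidate edit never replaces the working source unless it compiles and its rendered pages score better than the current ones.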