FocalOrder: Focal Preference Optimization for Reading Order Detection

📅 2026-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation of existing reading order detection methods: they apply uniform supervision and overlook the inherent “positional disparity” in documents, where middle regions are significantly harder to model than beginning or ending segments. The result is degraded performance on complex layouts. To tackle this, we propose FocalOrder, the first framework to explicitly reveal and model this phenomenon. Our approach introduces Focal Preference Optimization (FPO), which uses an exponential moving average mechanism to dynamically identify challenging sequence transitions and a difficulty-calibrated pairwise ranking loss to enhance global logical consistency. Evaluated on OmniDocBench v1.0 and Comp-HRDoc, FocalOrder achieves new state-of-the-art results; notably, its lightweight variant not only outperforms specialized baselines but also substantially surpasses large-scale general-purpose vision-language models.
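
One way to make the positional disparity described above measurable is to bucket reading-order transitions by their relative position in the document and compare accuracy per bucket. The sketch below is illustrative only: the function name `transition_accuracy_by_position`, the three-bucket split, and the "correct successor" criterion are assumptions made here, not the evaluation protocol used by the paper.

```python
from collections import defaultdict


def transition_accuracy_by_position(pred_orders, gold_orders, n_buckets=3):
    """Successor-transition accuracy per relative-position bucket.

    Each order is a list of element ids in reading order. The gold
    transition at index i is (gold[i], gold[i + 1]); it counts as correct
    when the predicted successor of gold[i] equals gold[i + 1].
    Hypothetical helper for illustration, not the paper's metric.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, gold in zip(pred_orders, gold_orders):
        # Map every element to the element predicted to follow it.
        pred_next = dict(zip(pred, pred[1:]))
        n_trans = len(gold) - 1
        for i in range(n_trans):
            # Relative position of the transition selects the bucket
            # (0 = start, n_buckets - 1 = end).
            bucket = min(i * n_buckets // n_trans, n_buckets - 1)
            total[bucket] += 1
            if pred_next.get(gold[i]) == gold[i + 1]:
                correct[bucket] += 1
    return {b: correct[b] / total[b] for b in sorted(total)}


if __name__ == "__main__":
    gold = [[0, 1, 2, 3, 4, 5, 6]]
    pred = [[0, 1, 2, 4, 3, 5, 6]]  # elements 3 and 4 swapped mid-document
    print(transition_accuracy_by_position(pred, gold))
```

A lower value in the middle bucket than in the first and last buckets, averaged over many documents, would be consistent with the disparity the summary describes.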

📝 Abstract
Reading order detection is the foundation of document understanding. Most existing methods rely on uniform supervision, implicitly assuming a constant difficulty distribution across layout regions. In this work, we challenge this assumption by revealing a critical flaw: Positional Disparity, a phenomenon where models demonstrate mastery over the deterministic start and end regions but suffer a performance collapse in the complex intermediate sections. This degradation arises because standard training allows the massive volume of easy patterns to drown out the learning signals from difficult layouts. To address this, we propose FocalOrder, a framework driven by Focal Preference Optimization (FPO). Specifically, FocalOrder employs adaptive difficulty discovery with an exponential moving average mechanism to dynamically pinpoint hard-to-learn transitions, while introducing a difficulty-calibrated pairwise ranking objective to enforce global logical consistency. Extensive experiments demonstrate that FocalOrder establishes new state-of-the-art results on OmniDocBench v1.0 and Comp-HRDoc. Our compact model not only outperforms competitive specialized baselines but also significantly surpasses large-scale general VLMs. These results demonstrate that aligning the optimization with the intrinsic structural ambiguity of documents is critical for mastering complex document structures.
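
The abstract names two ingredients without giving formulas: an exponential moving average (EMA) that tracks which transitions are hard, and a pairwise ranking objective calibrated by that difficulty. The sketch below shows one plausible shape for each. Everything specific in it is an assumption made for illustration: the class `EmaDifficulty`, the function `weighted_pairwise_loss`, the logistic loss form, and the `(1 + difficulty) ** gamma` weighting are not taken from the paper and may differ from the actual FPO formulation.

```python
import math


class EmaDifficulty:
    """Per-transition difficulty as an EMA of observed losses (hypothetical)."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.scores = {}

    def update(self, key, loss):
        # The first observation initializes the average; later ones blend in.
        prev = self.scores.get(key, loss)
        self.scores[key] = self.decay * prev + (1.0 - self.decay) * loss
        return self.scores[key]


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_pairwise_loss(score_pos, score_neg, difficulty, gamma=1.0):
    """Logistic pairwise ranking loss, up-weighted for hard transitions.

    score_pos: model score for the correct successor.
    score_neg: model score for an incorrect successor.
    The weight (1 + difficulty) ** gamma is an assumed calibration.
    """
    base = softplus(score_neg - score_pos)  # equals -log(sigmoid(pos - neg))
    return (1.0 + difficulty) ** gamma * base


if __name__ == "__main__":
    tracker = EmaDifficulty(decay=0.9)
    for step_loss in (2.0, 1.5, 1.8):
        d = tracker.update(("block_3", "block_7"), step_loss)
    print(weighted_pairwise_loss(score_pos=0.2, score_neg=0.5, difficulty=d))
```

Under these assumptions, transitions whose loss stays high across training steps accumulate a larger EMA score and therefore contribute more to the ranking objective, which is the general effect the abstract attributes to focusing on hard-to-learn transitions.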
Problem

Research questions and friction points this paper is trying to address.

Reading Order Detection
Positional Disparity
Document Understanding
Layout Analysis
Structural Ambiguity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Focal Preference Optimization
Reading Order Detection
Positional Disparity
Adaptive Difficulty Discovery
Pairwise Ranking
Fuyuan Liu
Unisound AI Technology Co.Ltd
Dianyu Yu
Unisound AI Technology Co.Ltd, Beihang University
He Ren
Applied Materials, Inc
Nayu Liu
School of Computer Science and Technology, Tiangong University
Xiaomian Kang
MAIS, Institute of Automation, Chinese Academy of Sciences
Delai Qiu
Unisound AI Technology Co.Ltd
Fa Zhang
Professor, Beijing Institute of Technology
Bioinformatics; Bio-Medical Image Processing; High Performance Computing
Genpeng Zhen
Unisound AI Technology Co.Ltd
Shengping Liu
Unisound AI Technology Co.Ltd
Jiaen Liang
Unisound AI Technology Co.Ltd
Wei Huang
Unisound AI Technology Co.Ltd
Yining Wang
NLP Researcher, Unisound
Natural Language Processing; Machine Translation
Junnan Zhu
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing