LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting

📅 2025-11-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of balancing accuracy and efficiency in end-to-end arbitrary-shape text spotting, this paper proposes an efficient and robust spotting framework. Methodologically: (1) it introduces a data-driven low-rank subspace model for text geometry, coupled with ℓ₁-norm optimization to achieve noise-robust shape recovery; (2) it designs a parameterized text shape representation and a triple-assignment detection head, jointly leveraging deep-sparse, light-sparse, and dense branches to ensure both training stability and fast inference; (3) it incorporates a lightweight recognition branch to reduce overall computational overhead. Evaluated on multiple arbitrary-shape text benchmarks—including CTW1500 and Total-Text—the method achieves state-of-the-art performance, delivering superior accuracy–latency trade-offs, particularly on curved and irregular text instances.

📝 Abstract
End-to-end text spotting aims to jointly optimize text detection and recognition within a unified framework. Despite significant progress, designing an accurate and efficient end-to-end text spotter for arbitrary-shaped text remains largely unsolved. We identify the primary bottleneck as the lack of a reliable and efficient text detection method. To address this, we propose a novel parameterized text shape method based on low-rank approximation for precise detection and a triple assignment detection head to enable fast inference. Specifically, unlike other shape representation methods that employ data-irrelevant parameterization, our data-driven approach derives a low-rank subspace directly from labeled text boundaries. To ensure this process is robust against the inherent annotation noise in this data, we utilize a specialized recovery method based on an $\ell_1$-norm formulation, which accurately reconstructs the text shape with only a few key orthogonal vectors. By exploiting the inherent shape correlation among different text contours, our method achieves consistency and compactness in shape representation. Next, the triple assignment scheme introduces a novel architecture where a deep sparse branch (for stabilized training) is used to guide the learning of an ultra-lightweight sparse branch (for accelerated inference), while a dense branch provides rich parallel supervision. Building upon these advancements, we integrate the enhanced detection module with a lightweight recognition branch to form an end-to-end text spotting framework, termed LRANet++, capable of accurately and efficiently spotting arbitrary-shaped text. Extensive experiments on several challenging benchmarks demonstrate the superiority of LRANet++ compared to state-of-the-art methods. Code will be available at: https://github.com/ychensu/LRANet-PP.git
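The idea of representing a contour with "only a few key orthogonal vectors" can be illustrated with a small sketch. This is not the authors' implementation: it swaps in a plain SVD (an ℓ2 criterion) for the paper's $\ell_1$-norm recovery, and it uses synthetic rotated ellipses as a hypothetical stand-in for labeled text boundaries. All function names, point counts, and ranks below are assumptions chosen for illustration.

```python
import numpy as np


def make_toy_contours(n, k, rng):
    """Hypothetical stand-in for labeled boundaries: n rotated ellipses, k points each."""
    t = np.linspace(0.0, 2.0 * np.pi, k, endpoint=False)
    a = rng.uniform(20, 60, size=(n, 1))         # long semi-axis
    b = rng.uniform(5, 15, size=(n, 1))          # short semi-axis
    theta = rng.uniform(-0.5, 0.5, size=(n, 1))  # rotation in radians
    cx, cy = rng.uniform(0, 100, size=(2, n, 1))  # centers
    x = cx + a * np.cos(t) * np.cos(theta) - b * np.sin(t) * np.sin(theta)
    y = cy + a * np.cos(t) * np.sin(theta) + b * np.sin(t) * np.cos(theta)
    # Each row is one flattened contour: [x_1..x_k, y_1..y_k].
    return np.concatenate([x, y], axis=1)


def fit_basis(contours, rank):
    """Learn an orthonormal basis from data via plain SVD (assumed ℓ2 substitute)."""
    _, _, vt = np.linalg.svd(contours, full_matrices=False)
    return vt[:rank].T  # shape (2k, rank), columns are orthonormal


def encode(contours, basis):
    """Project contours onto the basis: `rank` coefficients per contour."""
    return contours @ basis


def decode(coeffs, basis):
    """Reconstruct flattened contours from their coefficients."""
    return coeffs @ basis.T


rng = np.random.default_rng(0)
train = make_toy_contours(500, 14, rng)   # 14 boundary points -> 28 numbers per contour
basis = fit_basis(train, rank=6)          # noise-free rotated ellipses span 6 dimensions

held_out = make_toy_contours(5, 14, rng)
recon = decode(encode(held_out, basis), basis)
max_err = np.abs(recon - held_out).max()
print(f"28 values per contour compressed to 6; max reconstruction error = {max_err:.2e}")
```

In this toy setting a detection head would only need to regress the six coefficients per instance rather than all 28 coordinates. Real annotations are noisy, which is the motivation the abstract gives for replacing the ℓ2 criterion used here with an $\ell_1$-norm formulation.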
Problem

Research questions and friction points this paper is trying to address.

Develops an efficient detector for arbitrary-shaped text based on low-rank approximation
Improves robustness to annotation noise through a specialized $\ell_1$-norm shape recovery method
Integrates a lightweight recognition branch with the enhanced detector for end-to-end text spotting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-rank approximation for precise text shape detection
Triple assignment head for stable training and fast inference
Lightweight recognition branch integrated for end-to-end spotting
Yuchen Su
College of Computer Science and Artificial Intelligence, Fudan University, Shanghai 200433, China
Zhineng Chen
Institute of Trustworthy Embodied AI, Fudan University
Computer Vision, OCR, Multimedia Analysis
Yongkun Du
Fudan University
Computer Vision, OCR
Zuxuan Wu
Fudan University
Hongtao Xie
School of Information Science and Technology, University of Science and Technology of China, Hefei 230022, China
Yu-Gang Jiang
Professor, Fudan University. IEEE & IAPR Fellow
Video Analysis, Embodied AI, Trustworthy AI