🤖 AI Summary
This work addresses three challenges in multilingual table structure recognition for documents: scarce annotated data, reliance on OCR preprocessing, and language-specific modeling assumptions. We propose the first script-agnostic, end-to-end table structure recognition paradigm, which removes both the OCR dependency and script-specific priors. The method uses a joint geometric-topological modeling framework built on graph neural networks. It integrates edge detection, connected-component analysis, and hierarchical cell clustering to detect table wireframes and resolve cell relationships jointly, regardless of script, font, or layout. On a multilingual table dataset covering 12 scripts, including Latin, Chinese, Arabic, and Devanagari, the approach achieves a mean F1 score of 92.4%, substantially outperforming existing state-of-the-art methods. It also shows strong zero-shot transfer across scripts and generalizes robustly across diverse linguistic and typographic settings.
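To make the geometric side of the pipeline concrete, here is a minimal sketch of how connected-component analysis on a wireframe mask can yield cells without reading any text. It is not the paper's implementation: the graph neural network is omitted, the hierarchical cell clustering is replaced by a simple one-dimensional gap clustering, and the names `find_cells`, `cluster_1d`, `assign_grid_positions`, and the toy `mask` are all hypothetical. The input is assumed to be a binary grid in which 1 marks a detected ruling-line pixel.

```python
from collections import deque


def find_cells(grid):
    """Return bounding boxes (top, left, bottom, right) of enclosed background regions.

    Background pixels (0) are grouped by 4-connectivity. Regions that touch the
    image border are assumed to lie outside the table and are dropped.
    """
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r0 in range(h):
        for c0 in range(w):
            if grid[r0][c0] != 0 or seen[r0][c0]:
                continue
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            top, left, bottom, right = r0, c0, r0, c0
            touches_border = False
            while queue:
                r, c = queue.popleft()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                if r in (0, h - 1) or c in (0, w - 1):
                    touches_border = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w
                            and grid[nr][nc] == 0 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if not touches_border:
                boxes.append((top, left, bottom, right))
    return boxes


def cluster_1d(values, tol=1):
    """Map each coordinate to a group index; a gap larger than tol starts a new group."""
    index = {}
    group = -1
    prev = None
    for v in sorted(set(values)):
        if prev is None or v - prev > tol:
            group += 1
        index[v] = group
        prev = v
    return index


def assign_grid_positions(boxes, tol=1):
    """Attach a (row, column) index to each cell box from its top and left edges."""
    rows = cluster_1d([b[0] for b in boxes], tol)
    cols = cluster_1d([b[1] for b in boxes], tol)
    return sorted((rows[t], cols[l], (t, l, b, r)) for t, l, b, r in boxes)


# Toy wireframe: '#' is a ruling-line pixel, '.' is cell interior.
mask = [
    "#########",
    "#...#...#",
    "#...#...#",
    "#########",
    "#...#...#",
    "#########",
]
grid = [[1 if ch == "#" else 0 for ch in row] for row in mask]

cells = find_cells(grid)
layout = assign_grid_positions(cells)
for row, col, box in layout:
    print(row, col, box)
```

Because the sketch only looks at line geometry, the same code applies whatever script fills the cells, which is the intuition behind the script-agnostic claim. Spanning cells and borderless tables are not handled here.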