🤖 AI Summary
Existing methods for *panoptic symbol spotting* in CAD drawings, the joint detection of countable symbol instances and uncountable semantic regions, suffer from rasterization artifacts, geometric information loss, and poor generalization. To address these issues, this work abandons image- or point-cloud-based representations and instead introduces the first vector-native paradigm, operating directly on raw vector primitives (e.g., line segments) to preserve geometric continuity. We propose VecFormer, a novel transformer architecture with dedicated line-segment sequence encoding, and a Branch Fusion Refinement module that jointly models instance and semantic predictions in a unified framework. Evaluated on panoptic symbol spotting, our method achieves 91.1 PQ, establishing a new state of the art. Notably, Stuff-PQ improves by 9.6 and 21.2 points over the second-best results, with and without prior information respectively, demonstrating substantial gains in both accuracy and robustness.
📝 Abstract
We study the task of panoptic symbol spotting, which involves identifying both individual instances of countable things and the semantic regions of uncountable stuff in computer-aided design (CAD) drawings composed of vector graphical primitives. Existing methods typically rely on image rasterization, graph construction, or point-based representations, but these approaches often suffer from high computational costs, limited generality, and loss of geometric structural information. In this paper, we propose VecFormer, a novel method that addresses these challenges through a line-based representation of primitives. This design preserves the geometric continuity of the original primitives, enabling more accurate shape representation while maintaining a computation-friendly structure, which makes it well suited to vector graphic understanding tasks. To further enhance prediction reliability, we introduce a Branch Fusion Refinement module that integrates instance and semantic predictions and resolves their inconsistencies, yielding more coherent panoptic outputs. Extensive experiments demonstrate that our method establishes a new state of the art, achieving 91.1 PQ, with Stuff-PQ improved by 9.6 and 21.2 points over the second-best results under settings with and without prior information, respectively. These results highlight the strong potential of line-based representation as a foundation for vector graphic understanding.
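For readers unfamiliar with the PQ figures quoted above, the following is a minimal sketch of how Panoptic Quality is commonly computed: a prediction and a ground-truth segment of the same class count as a match when their IoU exceeds 0.5, and PQ is the summed IoU of the matches divided by TP + 0.5·FP + 0.5·FN. The representation of a segment as a set of primitive IDs, the unweighted IoU, and the names `Segment`, `iou`, and `panoptic_quality` are assumptions made for illustration; the benchmark used in the paper may weight primitives differently (for example by length), and this is not the authors' evaluation code.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: a segment is (class_id, set of primitive ids).
Segment = Tuple[int, FrozenSet[int]]


def iou(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Unweighted intersection-over-union of two primitive-id sets (assumption)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(preds: List[Segment], gts: List[Segment]) -> float:
    """PQ = (sum of IoU over matched pairs) / (TP + 0.5*FP + 0.5*FN)."""
    matched_gt = set()
    iou_sum = 0.0
    tp = 0
    for p_cls, p_prims in preds:
        for j, (g_cls, g_prims) in enumerate(gts):
            if j in matched_gt or p_cls != g_cls:
                continue
            score = iou(p_prims, g_prims)
            if score > 0.5:  # standard matching threshold
                matched_gt.add(j)
                iou_sum += score
                tp += 1
                break
    fp = len(preds) - tp  # predictions with no matching ground truth
    fn = len(gts) - tp    # ground-truth segments that were missed
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


if __name__ == "__main__":
    preds = [(0, frozenset({1, 2, 3})), (1, frozenset({7}))]
    gts = [(0, frozenset({1, 2, 3, 4})), (1, frozenset({8, 9}))]
    # One match with IoU 0.75, one false positive, one false negative.
    print(100 * panoptic_quality(preds, gts))  # scaled to 0-100 as in the text
```

In this toy example the single match has IoU 0.75 and there is one unmatched segment on each side, so PQ is 0.75 / 2 = 0.375, or 37.5 on the 0-100 scale used in the abstract.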