Label Anything: An Interpretable, High-Fidelity and Prompt-Free Annotator

📅 2025-02-05
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the high annotation cost and data bottlenecks in semantic understanding of autonomous driving scenes, this paper proposes LAM, a prompt-free, interpretable, and high-fidelity automatic labeling model. Methodologically, LAM is initialized from a single pre-annotated RGB seed image, extracts ViT-based features, and fuses them via a novel Semantic Class Adapter (SCA) to align cross-scene semantic representations with minimal parameters. It further introduces Optimization-Oriented Unrolling (OptOU), a multi-stage cascaded architecture in which each stage is governed by an explicit, differentiable optimization objective, enabling end-to-end zero-shot generalization. Contributions include: (1) SCA for efficient, parameter-light semantic alignment; and (2) OptOU for interpretable, gradient-based label propagation. Evaluated on CamVid, Cityscapes, ApolloScapes, and CARLA, LAM achieves near-perfect mIoU (~100%), substantially improving labeling efficiency, fidelity, and robustness while drastically reducing reliance on manual annotation.
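The summary does not specify SCA's internals beyond "fuses ViT-extracted features with minimal parameters", so the following is a minimal PyTorch sketch of what such a parameter-light adapter could look like. The class name `SemanticClassAdapter`, the dimensions, and the single-linear-layer design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SemanticClassAdapter(nn.Module):
    """Hypothetical sketch of SCA: a parameter-light head that fuses
    frozen ViT patch features into per-class semantic logits.
    The real LAM design may differ; only the adapter is trainable."""

    def __init__(self, embed_dim: int = 768, num_classes: int = 19):
        super().__init__()
        # A single normalized linear projection keeps the trainable
        # parameter count small, consistent with the paper's claim of
        # a "quite small number of trainable parameters".
        self.norm = nn.LayerNorm(embed_dim)
        self.proj = nn.Linear(embed_dim, num_classes)

    def forward(self, vit_tokens: torch.Tensor) -> torch.Tensor:
        # vit_tokens: (B, N_patches, embed_dim) from a frozen ViT backbone
        return self.proj(self.norm(vit_tokens))  # (B, N_patches, num_classes)
```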

📝 Abstract
Learning-based street scene semantic understanding in autonomous driving (AD) has advanced significantly in recent years, but the performance of AD models depends heavily on the quantity and quality of annotated training data. However, traditional manual labeling incurs a high cost to annotate the vast amount of data required to train a robust model. To mitigate this cost, we propose a Label Anything Model (denoted as LAM), serving as an interpretable, high-fidelity, and prompt-free data annotator. Specifically, we first incorporate a pretrained Vision Transformer (ViT) to extract latent features. On top of the ViT, we propose a semantic class adapter (SCA) and an optimization-oriented unrolling algorithm (OptOU), both with a small number of trainable parameters. SCA fuses the ViT-extracted features to consolidate the basis for the subsequent automatic annotation. OptOU consists of multiple cascading layers, each containing an optimization formulation that aligns its output with the ground truth as closely as possible, through which OptOU is interpretable rather than a learning-based black box. In addition, training SCA and OptOU requires only a single pre-annotated RGB seed image, owing to their small number of learnable parameters. Extensive experiments clearly demonstrate that the proposed LAM can generate high-fidelity annotations (almost 100% mIoU) for multiple real-world datasets (i.e., CamVid, Cityscapes, and ApolloScapes) and the CARLA simulation dataset.
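The abstract describes OptOU as cascading layers, each with its own optimization formulation pushed toward the ground truth. A common way to realize this is deep unrolling with per-layer supervision; the sketch below illustrates that pattern under stated assumptions (a learnable step size, a linear refinement term, and summed per-layer cross-entropy), none of which are confirmed details of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OptOULayer(nn.Module):
    """One unrolled stage: a gradient-descent-style update of the class
    logits with a learnable step size. The specific update rule here is
    an assumption, not the paper's exact formulation."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(0.1))      # learnable step size
        self.refine = nn.Linear(num_classes, num_classes)  # residual refinement

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return logits - self.step * self.refine(logits)

class OptOU(nn.Module):
    """Cascade of unrolled layers with per-layer (deep) supervision, so
    every stage has an explicit objective against the seed image's
    ground truth, rather than a single end-of-network loss."""

    def __init__(self, num_classes: int, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(OptOULayer(num_classes) for _ in range(num_layers))

    def forward(self, logits: torch.Tensor, target: torch.Tensor = None):
        loss = 0.0
        for layer in self.layers:
            logits = layer(logits)
            if target is not None:  # training on the pre-annotated seed image
                loss = loss + F.cross_entropy(
                    logits.reshape(-1, logits.size(-1)), target.reshape(-1))
        return logits, loss
```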
Problem

Research questions and friction points this paper is trying to address.

High cost of manual labeling for large-scale AD training data
Limited fidelity of existing automatic annotation methods
Black-box, uninterpretable nature of learning-based annotators
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a pretrained Vision Transformer for feature extraction
Implements a Semantic Class Adapter (SCA)
Applies an Optimization-Oriented Unrolling (OptOU) algorithm (see the training sketch after this list)
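Combining the two hypothetical modules sketched above, a hedged illustration of the one-seed-image training loop the abstract describes (frozen backbone, only SCA and OptOU trained) might look as follows; random tensors stand in for real ViT features and annotations, and all shapes are assumptions.

```python
# Depends on SemanticClassAdapter and OptOU from the sketches above.
import torch

torch.manual_seed(0)
num_classes, embed_dim, n_patches = 19, 768, 1024

# Stand-ins for ONE seed image: frozen ViT features and their
# patch-level ground-truth labels (real LAM would use an actual
# pre-annotated RGB image passed through a pretrained ViT).
seed_feats = torch.randn(1, n_patches, embed_dim)
seed_labels = torch.randint(0, num_classes, (1, n_patches))

sca = SemanticClassAdapter(embed_dim, num_classes)
optou = OptOU(num_classes, num_layers=4)
opt = torch.optim.Adam([*sca.parameters(), *optou.parameters()], lr=1e-3)

for step in range(200):  # small budget: very few trainable parameters
    logits, loss = optou(sca(seed_feats), seed_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inference on a new scene: prompt-free, no ground truth needed.
new_feats = torch.randn(1, n_patches, embed_dim)
pred, _ = optou(sca(new_feats))
labels = pred.argmax(dim=-1)  # automatic patch-level annotation
```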
🔎 Similar Papers
No similar papers found.
Authors

Wei-Bin Kou
Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China; Shenzhen International Center For Industrial And Applied Mathematics, Shenzhen Research Institute of Big Data, Shenzhen, China
Guangxu Zhu
Shenzhen International Center For Industrial And Applied Mathematics, Shenzhen Research Institute of Big Data, Shenzhen, China
Rongguang Ye
Southern University of Science and Technology
Shuai Wang
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Ming Tang
Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China
Yik-Chung Wu
Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China