ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Open-world object detection is constrained by predefined categories, hindering unsupervised discovery and labeling of unknown objects. To address this, we propose a training-free, self-correcting framework comprising three core components: (1) an Embedding-Label Repository (ELR) that enables cross-sample semantic alignment of category representations; (2) a vision-consistency-driven self-refinement loop integrating CLIP-based cross-modal retrieval, LLM-guided semantic generation, frequency-based voting, k-nearest-neighbor relabeling, and visual cohesion analysis; and (3) an end-to-end, zero-fine-tuning mechanism for novel class discovery and labeling. Evaluated on COCO and PASCAL VOC, our method significantly improves localization accuracy and labeling robustness for unseen categories. It establishes a scalable, training-free paradigm for open-world detection—eliminating reliance on labeled data or model adaptation while enabling reliable generalization to novel concepts.
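The inference step described above (retrieval from the Embedding-Label Repository followed by frequency-based voting over the nearest matches) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `vote_label` is a hypothetical name, and small 2-D vectors stand in for real CLIP embeddings.

```python
import numpy as np
from collections import Counter

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def vote_label(query_emb, repo_embs, repo_labels, k=5):
    """Retrieve the k most similar repository entries (cosine similarity)
    and assign the query the most frequent label among them."""
    sims = normalize(repo_embs) @ normalize(query_emb)
    top_k = np.argsort(sims)[::-1][:k]
    votes = Counter(repo_labels[i] for i in top_k)
    return votes.most_common(1)[0][0]

# Toy repository: two tight clusters of 2-D stand-ins for CLIP embeddings.
rng = np.random.default_rng(0)
repo_embs = np.vstack([
    rng.normal([1, 0], 0.1, size=(5, 2)),   # cluster labeled "bicycle"
    rng.normal([0, 1], 0.1, size=(5, 2)),   # cluster labeled "dog"
])
repo_labels = ["bicycle"] * 5 + ["dog"] * 5

print(vote_label(np.array([0.9, 0.1]), repo_embs, repo_labels))  # → "bicycle"
```

Voting over several neighbors rather than taking the single nearest match is what makes the assignment robust to individual mislabeled repository entries.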

📝 Abstract
Object detection models typically rely on predefined categories, limiting their ability to identify novel objects in open-world scenarios. To overcome this constraint, we introduce ADAM: Autonomous Discovery and Annotation Model, a training-free, self-refining framework for open-world object labeling. ADAM leverages large language models (LLMs) to generate candidate labels for unknown objects based on contextual information from known entities within a scene. These labels are paired with visual embeddings from CLIP to construct an Embedding-Label Repository (ELR) that enables inference without category supervision. For a newly encountered unknown object, ADAM retrieves visually similar instances from the ELR and applies frequency-based voting and cross-modal re-ranking to assign a robust label. To further enhance consistency, we introduce a self-refinement loop that re-evaluates repository labels using visual cohesion analysis and k-nearest-neighbor-based majority re-labeling. Experimental results on the COCO and PASCAL datasets demonstrate that ADAM effectively annotates novel categories using only visual and contextual signals, without requiring any fine-tuning or retraining.
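The self-refinement loop from the abstract (k-nearest-neighbor majority re-labeling, repeated until labels stabilize) might look roughly like this minimal sketch. `refine_labels` and the toy data are hypothetical stand-ins under assumed details (cosine similarity, per-round synchronous updates), not the paper's implementation.

```python
import numpy as np
from collections import Counter

def refine_labels(embs, labels, k=3, max_rounds=5):
    """Re-label every repository entry with the majority label of its
    k nearest neighbors; repeat until no label changes (or max_rounds)."""
    e = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = e @ e.T
    np.fill_diagonal(sims, -np.inf)        # exclude self-matches
    labels = list(labels)
    for _ in range(max_rounds):
        new = []
        for i in range(len(labels)):
            nbrs = np.argsort(sims[i])[::-1][:k]
            new.append(Counter(labels[j] for j in nbrs).most_common(1)[0][0])
        if new == labels:
            break
        labels = new
    return labels

# A single mislabeled entry ("truck" inside the "cat" cluster) is corrected.
embs = np.array([[1, 0], [0.99, 0.02], [0.98, -0.02], [0.97, 0.03],
                 [0, 1], [0.02, 0.99], [-0.02, 0.98], [0.03, 0.97]], float)
labels = ["cat", "cat", "cat", "truck", "sofa", "sofa", "sofa", "sofa"]
print(refine_labels(embs, labels))
```

Because every entry is re-evaluated against its visual neighbors, noisy labels introduced by the LLM in early rounds can be overwritten once enough consistent evidence accumulates in the repository.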
Problem

Research questions and friction points this paper is trying to address.

Overcoming predefined category limits in object detection
Generating context-aware labels for unknown objects using LLMs
Enhancing label consistency via self-refinement and visual cohesion
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate labels using contextual information
CLIP embeddings build unsupervised Embedding-Label Repository
Self-refinement loop improves label consistency
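One plausible formulation of the visual cohesion analysis listed above is the mean pairwise cosine similarity among the embeddings sharing a label: groups that score low do not look alike and are candidates for re-labeling. This sketch is an assumption about the metric, not the paper's exact definition.

```python
import numpy as np

def cohesion(embs):
    """Mean pairwise cosine similarity within one label group;
    a low score flags a group whose members are visually inconsistent."""
    e = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = e @ e.T
    off_diag = sims[~np.eye(len(e), dtype=bool)]
    return float(off_diag.mean())

tight = np.array([[1, 0], [0.99, 0.01], [0.98, -0.01]], float)  # coherent group
loose = np.array([[1, 0], [0, 1], [-1, 0.2]], float)            # mixed-up group
print(cohesion(tight) > cohesion(loose))  # True
```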
Amirreza Rouhi
School of Science and Technology, Nottingham Trent University
Solmaz Arezoomandan
Department of Electrical and Computer Engineering, Drexel University
Knut Peterson
Department of Electrical and Computer Engineering, Drexel University
Joseph T. Woods
Department of Electrical and Computer Engineering, Drexel University
David K. Han
Department of Electrical and Computer Engineering, Drexel University