ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Open-world object detection is constrained by predefined categories, hindering unsupervised discovery and labeling of unknown objects. To address this, we propose a training-free, self-correcting framework comprising three core components: (1) an Embedding-Label Repository (ELR) that enables cross-sample semantic alignment of category representations; (2) a vision-consistency-driven self-refinement loop integrating CLIP-based cross-modal retrieval, LLM-guided semantic generation, frequency-based voting, k-nearest-neighbor relabeling, and visual cohesion analysis; and (3) an end-to-end, zero-fine-tuning mechanism for novel class discovery and labeling. Evaluated on COCO and PASCAL VOC, our method significantly improves localization accuracy and labeling robustness for unseen categories. It establishes a scalable, training-free paradigm for open-world detection—eliminating reliance on labeled data or model adaptation while enabling reliable generalization to novel concepts.
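The inference step described above (retrieval from the Embedding-Label Repository followed by frequency-based voting over the nearest matches) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `vote_label` is a hypothetical name, and small 2-D vectors stand in for real CLIP embeddings.

```python
import numpy as np
from collections import Counter

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def vote_label(query_emb, repo_embs, repo_labels, k=5):
    """Retrieve the k most similar repository entries (cosine similarity)
    and assign the query the most frequent label among them."""
    sims = normalize(repo_embs) @ normalize(query_emb)
    top_k = np.argsort(sims)[::-1][:k]
    votes = Counter(repo_labels[i] for i in top_k)
    return votes.most_common(1)[0][0]

# Toy repository: two tight clusters of 2-D stand-ins for CLIP embeddings.
rng = np.random.default_rng(0)
repo_embs = np.vstack([
    rng.normal([1, 0], 0.1, size=(5, 2)),   # cluster labeled "bicycle"
    rng.normal([0, 1], 0.1, size=(5, 2)),   # cluster labeled "dog"
])
repo_labels = ["bicycle"] * 5 + ["dog"] * 5

print(vote_label(np.array([0.9, 0.1]), repo_embs, repo_labels))  # → "bicycle"
```

Voting over several neighbors rather than taking the single nearest match is what makes the assignment robust to individual mislabeled repository entries.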

📝 Abstract
Object detection models typically rely on predefined categories, limiting their ability to identify novel objects in open-world scenarios. To overcome this constraint, we introduce ADAM: Autonomous Discovery and Annotation Model, a training-free, self-refining framework for open-world object labeling. ADAM leverages large language models (LLMs) to generate candidate labels for unknown objects based on contextual information from known entities within a scene. These labels are paired with visual embeddings from CLIP to construct an Embedding-Label Repository (ELR) that enables inference without category supervision. For a newly encountered unknown object, ADAM retrieves visually similar instances from the ELR and applies frequency-based voting and cross-modal re-ranking to assign a robust label. To further enhance consistency, we introduce a self-refinement loop that re-evaluates repository labels using visual cohesion analysis and k-nearest-neighbor-based majority re-labeling. Experimental results on the COCO and PASCAL datasets demonstrate that ADAM effectively annotates novel categories using only visual and contextual signals, without requiring any fine-tuning or retraining.
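The self-refinement loop from the abstract (k-nearest-neighbor majority re-labeling, repeated until labels stabilize) might look roughly like this minimal sketch. `refine_labels` and the toy data are hypothetical stand-ins under assumed details (cosine similarity, per-round synchronous updates), not the paper's implementation.

```python
import numpy as np
from collections import Counter

def refine_labels(embs, labels, k=3, max_rounds=5):
    """Re-label every repository entry with the majority label of its
    k nearest neighbors; repeat until no label changes (or max_rounds)."""
    e = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = e @ e.T
    np.fill_diagonal(sims, -np.inf)        # exclude self-matches
    labels = list(labels)
    for _ in range(max_rounds):
        new = []
        for i in range(len(labels)):
            nbrs = np.argsort(sims[i])[::-1][:k]
            new.append(Counter(labels[j] for j in nbrs).most_common(1)[0][0])
        if new == labels:
            break
        labels = new
    return labels

# A single mislabeled entry ("truck" inside the "cat" cluster) is corrected.
embs = np.array([[1, 0], [0.99, 0.02], [0.98, -0.02], [0.97, 0.03],
                 [0, 1], [0.02, 0.99], [-0.02, 0.98], [0.03, 0.97]], float)
labels = ["cat", "cat", "cat", "truck", "sofa", "sofa", "sofa", "sofa"]
print(refine_labels(embs, labels))
```

Because every entry is re-evaluated against its visual neighbors, noisy labels introduced by the LLM in early rounds can be overwritten once enough consistent evidence accumulates in the repository.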
Problem

Research questions and friction points this paper is trying to address.

Overcoming predefined category limits in object detection
Generating context-aware labels for unknown objects using LLMs
Enhancing label consistency via self-refinement and visual cohesion
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate labels using contextual information
CLIP embeddings build unsupervised Embedding-Label Repository
Self-refinement loop improves label consistency
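One plausible formulation of the visual cohesion analysis listed above is the mean pairwise cosine similarity among the embeddings sharing a label: groups that score low do not look alike and are candidates for re-labeling. This sketch is an assumption about the metric, not the paper's exact definition.

```python
import numpy as np

def cohesion(embs):
    """Mean pairwise cosine similarity within one label group;
    a low score flags a group whose members are visually inconsistent."""
    e = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = e @ e.T
    off_diag = sims[~np.eye(len(e), dtype=bool)]
    return float(off_diag.mean())

tight = np.array([[1, 0], [0.99, 0.01], [0.98, -0.01]], float)  # coherent group
loose = np.array([[1, 0], [0, 1], [-1, 0.2]], float)            # mixed-up group
print(cohesion(tight) > cohesion(loose))  # True
```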
Amirreza Rouhi
School of Science and Technology, Nottingham Trent University
Solmaz Arezoomandan
Department of Electrical and Computer Engineering, Drexel University
Knut Peterson
Department of Electrical and Computer Engineering, Drexel University
Joseph T. Woods
Department of Electrical and Computer Engineering, Drexel University
David K. Han
Department of Electrical and Computer Engineering, Drexel University