🤖 AI Summary
Traditional object detection is constrained by predefined categories, limiting generalization to unknown objects. While open-world object detection (OWOD) and open-vocabulary object detection (OVOD) improve flexibility, OWOD lacks semantic labels for unknown classes, and OVOD relies on handcrafted prompts, compromising autonomy. This paper proposes LAOD, the first framework to decouple object localization from semantic naming: a large language model (LLM) autonomously generates scene-aware zero-shot class names, which are then used by an open-vocabulary detector for localization. To evaluate this paradigm, the authors introduce two novel metrics—Class-Agnostic Average Precision (CAAP) for localization accuracy and Semantic Naming Average Precision (SNAP) for naming fidelity. Experiments on LVIS, COCO, and COCO-OOD demonstrate that LAOD significantly improves end-to-end detection and interpretable naming of unknown objects, enhancing autonomous adaptability in open-world environments.
📝 Abstract
Object detection traditionally relies on fixed category sets, requiring costly re-training to handle novel objects. While Open-World and Open-Vocabulary Object Detection (OWOD and OVOD) improve flexibility, OWOD lacks semantic labels for unknowns, and OVOD depends on user prompts, limiting autonomy. We propose an LLM-guided agentic object detection (LAOD) framework that enables fully label-free, zero-shot detection by prompting a Large Language Model (LLM) to generate scene-specific object names. These are passed to an open-vocabulary detector for localization, allowing the system to adapt its goals dynamically. We introduce two new metrics, Class-Agnostic Average Precision (CAAP) and Semantic Naming Average Precision (SNAP), to separately evaluate localization and naming. Experiments on LVIS, COCO, and COCO-OOD validate our approach, showing strong performance in detecting and naming novel objects. Our method offers enhanced autonomy and adaptability for open-world understanding.
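The two-stage pipeline the abstract describes (an LLM proposes scene-specific class names, then an open-vocabulary detector localizes them) can be sketched as below. This is a minimal illustration, not the paper's implementation: `propose_class_names` and `open_vocab_detect` are hypothetical stand-in stubs for the LLM and OVOD components, and all names and outputs here are assumptions for illustration only.

```python
def propose_class_names(scene_context: str) -> list[str]:
    """Stub for the LLM step: autonomously propose zero-shot class
    names for the scene, with no user-supplied prompt or fixed label set.
    A real system would query an instruction-tuned LLM here."""
    return ["dog", "frisbee", "tree"]  # placeholder output

def open_vocab_detect(image, class_names: list[str]) -> list[dict]:
    """Stub for the open-vocabulary detector: localize boxes for the
    proposed names. A real detector would score region proposals against
    text embeddings of the class names."""
    return [{"label": name, "box": (0, 0, 10, 10), "score": 0.9}
            for name in class_names]

def laod_detect(image, scene_context: str) -> list[dict]:
    """Label-free detection: naming is decoupled from localization,
    so the system adapts its detection goals to the scene on its own."""
    names = propose_class_names(scene_context)
    return open_vocab_detect(image, names)

detections = laod_detect(image=None, scene_context="a dog catching a frisbee")
print([d["label"] for d in detections])
```

Under this decoupling, CAAP would score the quality of the returned boxes irrespective of their labels, while SNAP would additionally score whether each box's label names the object correctly.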