User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments

📅 2026-01-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited robustness of open-set object detection (OSOD) models in extended reality (XR) environments when users issue prompts that are ambiguous, incomplete, or overly detailed. It systematically evaluates how four representative categories of user prompts affect OSOD performance and examines targeted prompt enhancement strategies. The study pairs two OSOD models, GroundingDINO and YOLO-E, with vision-language models that simulate realistic user prompting. Experimental results show that prompt enhancement substantially improves robustness under semantically ambiguous prompts, with gains exceeding 55% in mean Intersection over Union (mIoU) and 41% in average confidence scores.

📝 Abstract

Open-set object detection (OSOD) localizes objects while identifying and rejecting unknown classes at inference. Although recent OSOD models perform well on benchmarks, their behavior under realistic user prompting remains underexplored. In interactive XR settings, user-generated prompts are often ambiguous, underspecified, or overly detailed. To study prompt-conditioned robustness, we evaluate two OSOD models, GroundingDINO and YOLO-E, on real-world XR images and use vision-language models to simulate diverse user prompting behaviors. We consider four prompt types (standard, underdetailed, overdetailed, and pragmatically ambiguous) and examine how two enhancement strategies affect performance on each. Results show that both models remain stable under standard and underdetailed prompts but degrade under ambiguous prompts. Overdetailed prompts primarily affect GroundingDINO. Prompt enhancement substantially improves robustness under ambiguity, yielding gains exceeding 55% in mIoU and 41% in average confidence. Based on these findings, we propose several prompting strategies and prompt enhancement methods for OSOD models in XR environments.
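The two headline metrics, mIoU and average confidence, can be illustrated with a minimal sketch. It assumes one ground-truth box per prompt, that the detector's single top-scoring box is compared against it, and that a missed detection counts as IoU 0 with confidence 0. The abstract does not say whether the reported gains are relative or absolute, so `relative_gain` below is only one possible reading. All names (`Sample`, `box_iou`, `mean_iou`, `average_confidence`, `relative_gain`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this format is an assumption.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Sample:
    gt: Box               # ground-truth box for the prompted object
    pred: Optional[Box]   # top-scoring predicted box, None if nothing detected
    score: float          # confidence of that box, 0.0 if nothing detected


def mean_iou(samples: List[Sample]) -> float:
    """Mean IoU over prompts; a missed detection contributes IoU 0 (assumption)."""
    if not samples:
        raise ValueError("need at least one sample")
    ious = [box_iou(s.gt, s.pred) if s.pred is not None else 0.0 for s in samples]
    return sum(ious) / len(ious)


def average_confidence(samples: List[Sample]) -> float:
    """Mean confidence of the top prediction per prompt."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(s.score for s in samples) / len(samples)


def relative_gain(before: float, after: float) -> float:
    """Relative improvement, e.g. 0.55 means +55% over the baseline value."""
    return (after - before) / before


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only; not results from the paper.
    baseline = [Sample((0, 0, 10, 10), (0, 0, 10, 5), 0.3),
                Sample((0, 0, 10, 10), None, 0.0)]
    enhanced = [Sample((0, 0, 10, 10), (0, 0, 10, 10), 0.6),
                Sample((0, 0, 10, 10), (0, 0, 10, 5), 0.3)]
    print(relative_gain(mean_iou(baseline), mean_iou(enhanced)))
```

Under these assumptions, comparing the same prompt set before and after enhancement isolates the effect of enhancement for each prompt type; the same functions would simply be applied separately to the standard, underdetailed, overdetailed, and ambiguous prompt groups.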

Problem

Research questions and friction points this paper is trying to address.

open-set object detection

user prompting

XR environments

prompt robustness

ambiguous prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

open-set object detection

user prompting strategies

prompt enhancement