User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments

📅 2026-01-30
🤖 AI Summary
This work examines how robust open-set object detection (OSOD) models are in extended reality (XR) environments when user-generated prompts are ambiguous, underspecified, or overly detailed. It evaluates four prompt types (standard, underdetailed, overdetailed, and pragmatically ambiguous) and two prompt enhancement strategies. Two OSOD models, GroundingDINO and YOLO-E, are tested on real-world XR images, with vision-language models used to simulate user prompting behavior. Prompt enhancement substantially improves robustness under ambiguous prompts, with reported gains exceeding 55% in mean Intersection over Union (mIoU) and 41% in average confidence.

📝 Abstract
Open-set object detection (OSOD) localizes objects while identifying and rejecting unknown classes at inference. While recent OSOD models perform well on benchmarks, their behavior under realistic user prompting remains underexplored. In interactive XR settings, user-generated prompts are often ambiguous, underspecified, or overly detailed. To study prompt-conditioned robustness, we evaluate two OSOD models, GroundingDINO and YOLO-E, on real-world XR images and simulate diverse user prompting behaviors using vision-language models. We consider four prompt types: standard, underdetailed, overdetailed, and pragmatically ambiguous, and examine the impact of two enhancement strategies on these prompts. Results show that both models exhibit stable performance under underdetailed and standard prompts, while they suffer degradation under ambiguous prompts. Overdetailed prompts primarily affect GroundingDINO. Prompt enhancement substantially improves robustness under ambiguity, yielding gains exceeding 55% mIoU and 41% average confidence. Based on the findings, we propose several prompting strategies and prompt enhancement methods for OSOD models in XR environments.
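The two metrics named in the abstract, mIoU and average confidence, can be illustrated per prompt type with a small sketch. This is not the authors' evaluation code: the record layout (`prompt_type`, `pred_box`, `gt_box`, `confidence`), the `(x1, y1, x2, y2)` box format, and the choice to score a missed detection as IoU 0 and confidence 0 are all assumptions made for illustration, and the toy records are invented.

```python
from statistics import mean


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def summarize(records):
    """Return {prompt_type: (mIoU, mean confidence)}.

    Assumption: a record whose pred_box is None is a missed detection
    and contributes IoU 0.0 and confidence 0.0.
    """
    ious, confs = {}, {}
    for r in records:
        missed = r["pred_box"] is None
        score = 0.0 if missed else iou(r["pred_box"], r["gt_box"])
        conf = 0.0 if missed else r["confidence"]
        ious.setdefault(r["prompt_type"], []).append(score)
        confs.setdefault(r["prompt_type"], []).append(conf)
    return {t: (mean(ious[t]), mean(confs[t])) for t in ious}


# Hypothetical toy records, not data from the paper.
records = [
    {"prompt_type": "standard", "pred_box": (0, 0, 2, 2),
     "gt_box": (0, 0, 2, 2), "confidence": 0.8},
    {"prompt_type": "ambiguous", "pred_box": None,
     "gt_box": (0, 0, 2, 2), "confidence": 0.0},
    {"prompt_type": "ambiguous", "pred_box": (1, 1, 3, 3),
     "gt_box": (0, 0, 2, 2), "confidence": 0.4},
]

for prompt_type, (miou, avg_conf) in summarize(records).items():
    print(f"{prompt_type}: mIoU={miou:.3f}, avg confidence={avg_conf:.3f}")
```

Comparing these per-type summaries before and after prompt enhancement is one way to read gains such as those the abstract reports; whether the reported figures are absolute or relative differences is not stated here.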
Problem

Research questions and friction points this paper is trying to address.

open-set object detection
user prompting
XR environments
prompt robustness
ambiguous prompts
Innovation

Methods, ideas, or system contributions that make the work stand out.

open-set object detection
user prompting strategies
prompt enhancement
XR environments
vision-language models