RoboView-Bias: Benchmarking Visual Bias in Embodied Agents for Robotic Manipulation

πŸ“… 2025-09-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing benchmarks lack systematic quantification of visual biases, hindering a deeper understanding of decision-making stability in embodied agents. To address this, we propose RoboView-Bias, the first benchmark dedicated to evaluating visual bias in robotic manipulation. Grounded in the principle of factor isolation, it introduces a structured variant-generation framework and a perception-fairness verification protocol, enabling, for the first time, robust measurement of biases induced by individual visual factors (e.g., viewpoint, color) and by their interactions. Because the evaluated agents rely on vision-language models (VLMs) for policy execution, we pair this bias analysis with a correction step based on a semantic grounding layer. Evaluated across 2,127 task instances, three state-of-the-art embodied agents exhibit significant performance degradation due to viewpoint and color preferences. Integrating semantic grounding reduces MOKA's visual bias by 54.5%, demonstrating that the approach enhances perceptual fairness and decision robustness.

πŸ“ Abstract
The safety and reliability of embodied agents rely on accurate and unbiased visual perception. However, existing benchmarks mainly emphasize generalization and robustness under perturbations, while systematic quantification of visual bias remains scarce. This gap limits a deeper understanding of how perception influences decision-making stability. To address this issue, we propose RoboView-Bias, the first benchmark specifically designed to systematically quantify visual bias in robotic manipulation, following a principle of factor isolation. Leveraging a structured variant-generation framework and a perceptual-fairness validation protocol, we create 2,127 task instances that enable robust measurement of biases induced by individual visual factors and their interactions. Using this benchmark, we systematically evaluate three representative embodied agents across two prevailing paradigms and report three key findings: (i) all agents exhibit significant visual biases, with camera viewpoint being the most critical factor; (ii) agents achieve their highest success rates on highly saturated colors, indicating inherited visual preferences from underlying VLMs; and (iii) visual biases show strong, asymmetric coupling, with viewpoint strongly amplifying color-related bias. Finally, we demonstrate that a mitigation strategy based on a semantic grounding layer substantially reduces visual bias by approximately 54.5% on MOKA. Our results highlight that systematic analysis of visual bias is a prerequisite for developing safe and reliable general-purpose embodied agents.
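As a concrete illustration of the factor-isolation principle described above, the sketch below shows one way a per-factor bias score could be computed: sweep a single visual factor across its levels while holding everything else fixed, then take the spread of the agent's success rates. The evaluate() callback, the factor levels, and the max-minus-min spread metric are illustrative assumptions for this sketch, not the benchmark's actual API or scoring rule.

```python
# Minimal sketch of a per-factor bias score under factor isolation.
# Assumptions (not from the paper): a hypothetical evaluate(task, overrides)
# callback that runs one episode of an agent on a task variant and returns
# True/False, and a max-min spread of success rates as the bias metric.
from statistics import mean

FACTORS = {
    "viewpoint": ["front", "top", "side"],
    "color": ["red", "green", "blue", "desaturated"],
}

def success_rate(evaluate, task, factor, level, n_trials=20):
    """Mean success over repeated trials of one isolated variant."""
    return mean(evaluate(task, {factor: level}) for _ in range(n_trials))

def bias_score(evaluate, task, factor):
    """Spread of success rates across the levels of a single factor.

    All other factors stay at their defaults inside evaluate(), so any
    spread is attributable to this factor alone; an unbiased agent
    scores 0 because every level succeeds equally often.
    """
    rates = [success_rate(evaluate, task, factor, lv) for lv in FACTORS[factor]]
    return max(rates) - min(rates)
```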
Problem

Research questions and friction points this paper is trying to address.

Quantifying visual bias in robotic manipulation through systematic benchmarking
Evaluating how visual factors affect decision-making stability in embodied agents
Developing mitigation strategies to reduce visual bias for safer robotic systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically quantifies visual bias via factor isolation
Uses a structured variant-generation framework for bias measurement (see the sketch after this list)
Mitigates bias with a semantic grounding layer
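The variant-generation sketch referenced in the list above could look like the following: starting from a base scene configuration, emit one-factor-at-a-time variants for main effects and a two-factor grid for interaction effects (e.g., the viewpoint and color coupling reported in the abstract). The base scene, factor names, and levels are assumptions made for illustration, not the benchmark's actual configuration schema.

```python
# Minimal sketch of structured variant generation under factor isolation.
from itertools import product

BASE = {"viewpoint": "front", "color": "red", "lighting": "neutral"}
LEVELS = {
    "viewpoint": ["front", "top", "side"],
    "color": ["red", "green", "blue"],
}

def single_factor_variants(base, levels):
    """One-factor-at-a-time variants for measuring each factor's main effect.

    The base level is kept in the sweep as the control variant.
    """
    for factor, values in levels.items():
        for value in values:
            yield {**base, factor: value}

def interaction_variants(base, levels, f1, f2):
    """Full grid over two factors to expose coupling (e.g. viewpoint x color)."""
    for v1, v2 in product(levels[f1], levels[f2]):
        yield {**base, f1: v1, f2: v2}

# Example: 6 single-factor variants plus a 3x3 viewpoint-color grid.
variants = list(single_factor_variants(BASE, LEVELS))
grid = list(interaction_variants(BASE, LEVELS, "viewpoint", "color"))
```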
πŸ‘₯ Authors
Enguang Liu
School of Cyber Science and Engineering, Nanjing University of Science and Technology
Siyuan Liang
College of Computing and Data Science, Nanyang Technological University
Trustworthy Foundation Model
Liming Lu
School of Cyber Science and Engineering, Nanjing University of Science and Technology
Xiyu Zeng
School of Cyber Science and Engineering, Nanjing University of Science and Technology
Xiaochun Cao
Sun Yat-sen University
Computer Vision, Artificial Intelligence, Multimedia, Machine Learning
Aishan Liu
Beihang University
Shuchao Pang
University of New South Wales
Medical image analysis, deep learning