Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations

📅 2025-07-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
In human-robot collaboration, integrating human spatial language with robot sensor data remains challenging due to semantic–perceptual misalignment and high input uncertainty. To address this, the paper proposes the Feature Pyramid Likelihood Grounding Network (FP-LGN), an architecture that jointly models map image features and spatial relation semantics. Trained as a probability estimator via three-stage curriculum learning, FP-LGN captures the *aleatoric uncertainty* inherent in human spatial language, yielding a grounded likelihood that supports Bayesian fusion of human language observations with robot sensor measurements. Experiments show that FP-LGN matches expert-designed rules in mean negative log-likelihood while achieving a lower standard deviation, indicating greater robustness, and that the resulting uncertainty-aware fusion significantly improves human-robot collaborative task performance.

📝 Abstract
Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, achieving significant improvements in human-robot collaborative task performance.
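The abstract's core idea, fusing a language-grounded likelihood with robot sensor measurements, follows the standard Bayesian recipe: multiply the prior belief by each observation likelihood and renormalize. The sketch below is a minimal conceptual illustration of that fusion over a discretized set of candidate target locations; the grid, the toy likelihood values, and the `bayes_fuse` helper are assumptions for exposition, not the paper's implementation (which learns the human-observation likelihood with FP-LGN).

```python
import numpy as np

def bayes_fuse(prior, *likelihoods):
    """Fuse a prior belief grid with any number of observation likelihoods.

    posterior(x) ∝ prior(x) * Π_i likelihood_i(x), renormalized over the grid.
    Hypothetical helper for illustration only.
    """
    posterior = prior.astype(float).copy()
    for lik in likelihoods:
        posterior *= lik
    total = posterior.sum()
    if total == 0:
        raise ValueError("All hypotheses have zero probability; observations conflict.")
    return posterior / total

# Toy 1-D example: five candidate cells along a corridor.
prior = np.full(5, 0.2)                            # uniform prior belief
human_lik = np.array([0.05, 0.1, 0.5, 0.3, 0.05])  # grounded likelihood of "near the door" (made-up values)
sensor_lik = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # robot range-sensor model (made-up values)

posterior = bayes_fuse(prior, human_lik, sensor_lik)
print(posterior.argmax())  # → 2: the fused MAP estimate
```

Because both observations enter as likelihoods, a poorly calibrated human-language term would distort the posterior, which is why the paper emphasizes grounding that likelihood with calibrated uncertainty.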
Problem

Research questions and friction points this paper is trying to address.

Fusing human observations to overcome robot sensing limitations
Grounding spatial language with uncertainty-aware likelihood estimation
Improving human-robot task performance via heterogeneous data fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

FP-LGN grounds spatial language via map features
Three-stage curriculum learning captures language uncertainty
Enables uncertainty-aware fusion for human-robot collaboration
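The three-stage curriculum mentioned above can be pictured as training on cumulative splits of increasing difficulty. The paper's exact stage criteria are not given here, so the difficulty scoring and cumulative staging in this sketch are illustrative assumptions, not the authors' method.

```python
def three_stage_curriculum(examples, difficulty):
    """Order training into three cumulative stages of increasing difficulty.

    Stage 1 sees only easy examples; stage 2 adds medium ones; stage 3
    trains on the full set, so earlier material is never dropped.
    Hypothetical helper for illustration only.
    """
    easy = [x for x in examples if difficulty(x) < 1 / 3]
    medium = [x for x in examples if 1 / 3 <= difficulty(x) < 2 / 3]
    hard = [x for x in examples if difficulty(x) >= 2 / 3]
    return [easy, easy + medium, easy + medium + hard]

# Toy usage: score each training example by, e.g., spatial-relation ambiguity.
data = [0.1, 0.2, 0.4, 0.5, 0.8, 0.9]  # stand-in difficulty scores
stages = three_stage_curriculum(data, lambda d: d)
print([len(s) for s in stages])  # → [2, 4, 6]
```

Cumulative stages are one common curriculum design choice; alternatives include sampling-weight schedules that gradually shift mass toward harder examples.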
Supawich Sitdhipol
Autonomous Systems Lab, Dept. of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Thailand
Waritwong Sukprasongdee
Autonomous Systems Lab, Dept. of Mechanical Engineering, Faculty of Engineering, Chulalongkorn University, Thailand
Ekapol Chuangsuwanich
Chulalongkorn University
Speech Processing · Natural Language Processing · Medical AI
Rina Tse
Dept. of Mechanical Engineering, Faculty of Engineering, Chulalongkorn University, Thailand