🤖 AI Summary
Visual navigation in embodied AI faces a small-sample generalization bottleneck in long-horizon, multi-object settings. Existing neural network approaches tend to overfit and lose robustness under data scarcity because of their architectural complexity. This paper proposes NeuRO, a joint perception-optimization framework that, for the first time, integrates Partially Input Convex Neural Networks (PICNNs) with conformal calibration to construct interpretable convex uncertainty sets; it further formulates planning under partial observability as a robust optimization problem. The resulting uncertainty-aware policy transfers across environments without environment-specific fine-tuning. Evaluated on both unordered and sequential multi-object navigation tasks, the method achieves state-of-the-art performance and improves generalization and robustness in unseen environments, particularly when training data is limited.
📝 Abstract
Visual navigation is a fundamental problem in embodied AI, yet practical deployments demand long-horizon planning capabilities to address multi-objective tasks. A major bottleneck is data scarcity: policies learned from limited data often overfit and fail to generalize out of distribution (OOD). Existing neural network-based agents typically increase architectural complexity, which paradoxically becomes counterproductive in the small-sample regime. This paper introduces NeuRO, an integrated learning-to-optimize framework that tightly couples perception networks with downstream task-level robust optimization. Specifically, NeuRO addresses two core difficulties in this integration: (i) it transforms noisy visual predictions under data scarcity into convex uncertainty sets using Partially Input Convex Neural Networks (PICNNs) with conformal calibration, and these sets directly parameterize the optimization constraints; and (ii) it reformulates planning under partial observability as a robust optimization problem, enabling uncertainty-aware policies that transfer across environments. Extensive experiments on both unordered and sequential multi-object navigation tasks demonstrate that NeuRO establishes state-of-the-art (SoTA) performance, particularly in generalization to unseen environments. Our work thus presents a significant advancement for developing robust, generalizable autonomous agents.
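To make the conformal calibration step in (i) concrete, the following is a minimal sketch of standard split conformal calibration, not NeuRO's actual implementation. It assumes a held-out calibration set of nonconformity scores and uses Euclidean distance as a stand-in convex score; in NeuRO the score would presumably come from a PICNN, which is not modeled here. The function names `conformal_threshold`, `score`, and `in_uncertainty_set` are hypothetical.

```python
import math

import numpy as np


def conformal_threshold(scores, alpha):
    """Split-conformal quantile of calibration scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, so that a fresh
    exchangeable score falls below it with probability at least 1 - alpha.
    """
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the set is unbounded.
        return float("inf")
    return float(scores[k - 1])


def score(y, y_hat):
    """Stand-in nonconformity score (assumption): Euclidean distance.

    It is convex in y, so the sublevel set {y : score(y, y_hat) <= q} is convex.
    """
    return float(np.linalg.norm(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)))


def in_uncertainty_set(y, y_hat, q):
    """Membership test for the calibrated set {y : score(y, y_hat) <= q}."""
    return score(y, y_hat) <= q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic calibration data: predictions are the truth plus Gaussian noise.
    truth = rng.normal(size=(200, 2))
    preds = truth + 0.1 * rng.normal(size=(200, 2))
    cal_scores = [score(t, p) for t, p in zip(truth, preds)]
    q = conformal_threshold(cal_scores, alpha=0.1)
    print("calibrated radius:", q)
```

Because the score is convex in `y`, the calibrated set is a convex region (here a ball of radius `q` around the prediction), which is the property that lets such a set serve as a constraint in a downstream robust optimization problem.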