AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation

📅 2025-07-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Vision-language-action (VLA) models for bimanual robot manipulation suffer from strong dependence on task-specific human demonstrations, poor generalization, and high data-collection costs. Method: This paper proposes a task-agnostic action learning paradigm that decouples action execution from task semantics. The authors introduce ATARA, a scalable self-supervised framework for automated task-agnostic data collection, and AnyPos, an inverse dynamics model equipped with Arm-Decoupled Estimation and a Direction-Aware Decoder; a video-conditioned action verification module further improves safety and reliability and enables cross-task zero-shot transfer. Contribution/Results: Experiments demonstrate a 51% improvement in test accuracy and 30–40% higher success rates on downstream tasks such as lifting, pick-and-place, and clicking, compared with human teleoperation-based baselines. The approach establishes a new paradigm for bimanual robotic learning with low data dependency and strong generalization capability.

📝 Abstract
Vision-language-action (VLA) models have shown promise on task-conditioned control in complex settings such as bimanual manipulation. However, the heavy reliance on task-specific human demonstrations limits their generalization and incurs high data acquisition costs. In this work, we present a new notion of task-agnostic action paradigm that decouples action execution from task-specific conditioning, enhancing scalability, efficiency, and cost-effectiveness. To address the data collection challenges posed by this paradigm -- such as low coverage density, behavioral redundancy, and safety risks -- we introduce ATARA (Automated Task-Agnostic Random Actions), a scalable self-supervised framework that accelerates collection by over $30\times$ compared to human teleoperation. To further enable effective learning from task-agnostic data, which often suffers from distribution mismatch and irrelevant trajectories, we propose AnyPos, an inverse dynamics model equipped with Arm-Decoupled Estimation and a Direction-Aware Decoder (DAD). We additionally integrate a video-conditioned action validation module to verify the feasibility of learned policies across diverse manipulation tasks. Extensive experiments show that the AnyPos-ATARA pipeline yields a 51% improvement in test accuracy and achieves 30-40% higher success rates in downstream tasks such as lifting, pick-and-place, and clicking, using replay-based video validation. Project Page: https://embodiedfoundation.github.io/vidar_anypos
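To make the inverse-dynamics idea concrete: such a model predicts the action that connects consecutive observations $(o_t, o_{t+1})$, and "arm-decoupled estimation" suggests fitting each arm's action separately rather than regressing the full bimanual action jointly. The sketch below illustrates only this generic concept on a linear toy problem; the dimensions, features, and per-arm least-squares heads are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: observation features at t and t+1; the action is a
# 14-DoF bimanual command (7 joints per arm, an illustrative choice).
N, obs_dim, act_per_arm = 512, 16, 7
o_t = rng.normal(size=(N, obs_dim))
o_next = rng.normal(size=(N, obs_dim))
W_true = rng.normal(size=(2 * obs_dim, 2 * act_per_arm))

X = np.hstack([o_t, o_next])   # inverse-dynamics input: (o_t, o_{t+1})
A = X @ W_true                 # ground-truth actions for this toy problem

# Arm-decoupled estimation (illustrative): fit one least-squares head
# per arm instead of a single regressor over the full 14-DoF action.
heads = []
for arm in range(2):
    target = A[:, arm * act_per_arm:(arm + 1) * act_per_arm]
    W_arm, *_ = np.linalg.lstsq(X, target, rcond=None)
    heads.append(W_arm)

pred = np.hstack([X @ W for W in heads])
print(pred.shape)  # (512, 14)
```

On this noiseless linear toy problem the decoupled heads recover the actions exactly; the point is only the data flow (observation pair in, per-arm action out), not a claim about the real model.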
Problem

Research questions and friction points this paper is trying to address.

Reducing reliance on task-specific human demonstrations
Addressing data-collection challenges (coverage, redundancy, safety) in task-agnostic action collection
Learning effectively from task-agnostic data despite distribution mismatch and irrelevant trajectories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-agnostic action paradigm for bimanual manipulation
ATARA framework for scalable self-supervised data collection
AnyPos model with Arm-Decoupled Estimation and DAD
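The ATARA side of the pipeline collects task-agnostic data by executing automated random actions rather than replaying human demonstrations. The sketch below shows one plausible shape of such a collector: sample random joint-space targets within limits and reject unsafe ones before recording. The joint limits, the distance-based safety check, and the function names are all hypothetical; a real system would use forward kinematics and proper collision checking.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative symmetric joint limits (rad) -- not any real robot's spec.
JOINT_LOW = np.full(7, -1.5)
JOINT_HIGH = np.full(7, 1.5)

def sample_arm_target():
    """Uniformly sample a joint-space target inside the limits."""
    return rng.uniform(JOINT_LOW, JOINT_HIGH)

def is_safe(left, right, margin=0.5):
    """Toy safety filter: reject target pairs that are 'too close'.
    Stands in for real self-collision / workspace checks."""
    return np.linalg.norm(left - right) > margin

def collect_random_targets(n):
    """Collect n safe, task-agnostic bimanual joint targets."""
    data = []
    while len(data) < n:
        left, right = sample_arm_target(), sample_arm_target()
        if is_safe(left, right):
            data.append(np.concatenate([left, right]))
    return np.stack(data)

batch = collect_random_targets(64)
print(batch.shape)  # (64, 14): 7 joints per arm
```

Because no human is in the loop, a loop like this runs as fast as the robot can move, which is the intuition behind the paper's reported >30x speedup over teleoperation.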
👥 Authors
Hengkai Tan — Tsinghua University (Reinforcement Learning, Robot Learning, Embodied AI, Deep Generative Models)
Yao Feng — Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University
Xinyi Mao — Undergraduate, Tsinghua University (Robotics, Embodied AI)
Shuhe Huang — Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University
Guodong Liu — Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University
Zhongkai Hao — Tsinghua University (Machine Learning, AI for Science, Physics-Informed Machine Learning)
Hang Su — Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University
Jun Zhu — Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University