AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation

📅 2026-04-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing simulation platforms do not model object affordances, so they cannot generate precise manipulation trajectories that target specific functional regions such as cup handles or rims. This work proposes the first end-to-end simulation framework that integrates open-vocabulary 3D affordance prediction with robotic manipulation data generation. The approach uses the VoxAfford model to predict affordance maps on object point clouds from multi-scale geometric features, and a vision-language model to drive task generation, guiding several robot arms (Franka FR3, Panda, UR5e, Kinova) in NVIDIA Isaac Sim toward task-relevant regions. To improve generalization, it introduces domain randomization based on 3D Gaussian reconstruction from real photographs. On a benchmark of 50 tasks spanning seven categories, current imitation learning methods achieve only 0–47% success on affordance-demanding tasks such as pouring into narrow containers (1–43%) and mug hanging (0–47%), and zero-shot sim-to-real experiments on a Franka FR3 robot validate the transferability of the generated data.

📝 Abstract
Simulation-based data generation has become a dominant paradigm for training robotic manipulation policies, yet existing platforms do not incorporate object affordance information into trajectory generation. As a result, tasks requiring precise interaction with specific functional regions--grasping a mug by its handle, pouring from a cup's rim, or hanging a mug on a hook--cannot be automatically generated with semantically correct trajectories. We introduce AffordSim, the first simulation framework that integrates open-vocabulary 3D affordance prediction into the manipulation data generation pipeline. AffordSim uses our VoxAfford model, an open-vocabulary 3D affordance detector that enhances MLLM output tokens with multi-scale geometric features, to predict affordance maps on object point clouds, guiding grasp pose estimation toward task-relevant functional regions. Built on NVIDIA Isaac Sim with cross-embodiment support (Franka FR3, Panda, UR5e, Kinova), VLM-powered task generation, and novel domain randomization using DA3-based 3D Gaussian reconstruction from real photographs, AffordSim enables automated, scalable generation of affordance-aware manipulation data. We establish a benchmark of 50 tasks across 7 categories (grasping, placing, stacking, pushing/pulling, pouring, mug hanging, long-horizon composite) and evaluate 4 imitation learning baselines (BC, Diffusion Policy, ACT, Pi 0.5). Our results reveal that while grasping is largely solved (53-93% success), affordance-demanding tasks such as pouring into narrow containers (1-43%) and mug hanging (0-47%) remain significantly more challenging for current imitation learning methods, highlighting the need for affordance-aware data generation. Zero-shot sim-to-real experiments on a real Franka FR3 validate the transferability of the generated data.
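
The abstract describes affordance maps on object point clouds "guiding grasp pose estimation toward task-relevant functional regions". One way to picture that step is sketched below, under stated assumptions: per-point affordance scores are given as plain floats, each candidate grasp has a center and a geometric quality score, and the two are blended with a fixed weight. The function names, the `radius` and `weight` parameters, and the toy mug data are all hypothetical; this is not the VoxAfford or AffordSim API, and the paper may combine these signals differently.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def affordance_near(center: Point, points: Sequence[Point],
                    scores: Sequence[float], radius: float) -> float:
    """Mean affordance score of the points within `radius` of `center`.

    Returns 0.0 when no point falls inside the radius.
    """
    nearby = [s for p, s in zip(points, scores) if math.dist(center, p) <= radius]
    return sum(nearby) / len(nearby) if nearby else 0.0


def rank_grasps(grasp_centers: Sequence[Point], grasp_quality: Sequence[float],
                points: Sequence[Point], scores: Sequence[float],
                radius: float = 0.02, weight: float = 0.5) -> List[int]:
    """Return grasp indices sorted best first.

    Each grasp is scored as weight * local affordance + (1 - weight) * quality.
    The linear blend is an illustrative assumption, not the paper's method.
    """
    blended = []
    for i, (center, quality) in enumerate(zip(grasp_centers, grasp_quality)):
        local = affordance_near(center, points, scores, radius)
        blended.append((weight * local + (1.0 - weight) * quality, i))
    # Highest blended score first; ties broken by original index.
    blended.sort(key=lambda item: (-item[0], item[1]))
    return [i for _, i in blended]


# Toy "mug": two body points with low handle affordance, two handle points with high.
points = [(0.00, 0.0, 0.05), (0.01, 0.0, 0.05), (0.06, 0.0, 0.05), (0.07, 0.0, 0.05)]
scores = [0.1, 0.1, 0.9, 0.8]

# Grasp 0 sits on the body (better geometric quality); grasp 1 sits on the handle.
grasp_centers = [(0.005, 0.0, 0.05), (0.065, 0.0, 0.05)]
grasp_quality = [0.7, 0.6]

order = rank_grasps(grasp_centers, grasp_quality, points, scores)
print(order)  # the handle grasp (index 1) is ranked first
```

In this toy case the body grasp has the higher geometric quality, but the handle grasp wins once local affordance is taken into account, which is the behavior a task such as "grasp the mug by its handle" would need.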
Problem

Research questions and friction points this paper is trying to address.

affordance-aware manipulation
simulation-based data generation
robotic manipulation
functional regions
trajectory generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

affordance-aware manipulation
open-vocabulary 3D affordance
simulation-based data generation
VoxAfford
domain randomization
Authors

Mingyang Li
School of Artificial Intelligence, Xi’an Jiaotong University
Haofan Xu
School of Artificial Intelligence, Xi’an Jiaotong University
Haowen Sun
Department of Automation, Tsinghua University
Computer Vision
Xinzhe Chen
School of Artificial Intelligence, Xi’an Jiaotong University
Sihua Ren
School of Artificial Intelligence, Xi’an Jiaotong University
Liqi Huang
School of Artificial Intelligence, Xi’an Jiaotong University
Xinyang Sui
School of Artificial Intelligence, Xi’an Jiaotong University
Chenyang Miao
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Reinforcement Learning, Robot Learning
Qiongjie Cui
School of Artificial Intelligence, Xi’an Jiaotong University
Zeyang Liu
School of Artificial Intelligence, Xi’an Jiaotong University
Xingyu Chen
School of Artificial Intelligence, Xi’an Jiaotong University
Xuguang Lan
School of Artificial Intelligence, Xi’an Jiaotong University