🤖 AI Summary
Predicting generalizable 6-DoF placement poses for diverse geometric objects under complex configurations (e.g., insertion, stacking, hanging) remains challenging in robotic manipulation.
Method: We propose the first generalized placement model trained exclusively on synthetic data, using a two-stage cascaded architecture: (1) a vision-language model (VLM) to localize coarse feasible regions, followed by (2) a local pose regression network to predict high-accuracy, physically valid 6-DoF poses.
Contribution/Results: Our key innovation is a VLM-guided region-focusing mechanism that efficiently models dense, multimodal, shape-agnostic feasible pose distributions. In simulation, the method achieves state-of-the-art performance, supporting broader placement modalities and yielding higher pose accuracy. Crucially, without any fine-tuning, it transfers directly to real robots, performing delicate tasks such as precision insertion and multi-object stacking and demonstrating significantly improved cross-geometry and cross-task generalization.
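To make the two-stage cascade concrete, here is a minimal sketch of the inference flow, assuming hypothetical `vlm.localize` and `pose_net.predict` interfaces (the names, signatures, and region representation are illustrative, not the paper's actual API): the VLM first proposes a coarse feasible region, the scene geometry is cropped to that region, and only then does the local network regress a precise 6-DoF pose.

```python
import math

def predict_placement(scene_rgb, scene_points, object_points, task_prompt,
                      vlm, pose_net):
    """Two-stage placement sketch (illustrative interfaces, not the paper's API).

    Stage 1: a vision-language model proposes a coarse feasible region.
    Stage 2: a local network regresses a precise 6-DoF pose from the
             cropped geometry around that region.
    """
    # Stage 1: coarse region as a (center, radius) ball -- an assumed
    # simplification of whatever region format the VLM actually outputs.
    center, radius = vlm.localize(scene_rgb, task_prompt)

    # Focus on the relevant region: keep only scene points near the
    # proposed location, so the low-level model sees local geometry only.
    local_points = [p for p in scene_points if math.dist(p, center) < radius]

    # Stage 2: regress rotation R and translation t for the object.
    R, t = pose_net.predict(local_points, object_points)
    return R, t
```

The design point the sketch illustrates is that cropping to the VLM-proposed region keeps the pose-regression problem local and small, which is what lets a single low-level model cover diverse placement modes (insertion, stacking, hanging) efficiently.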
📝 Abstract
Object placement in robotic tasks is inherently challenging due to the diversity of object geometries and placement configurations. To address this, we propose AnyPlace, a two-stage method trained entirely on synthetic data, capable of predicting a wide range of feasible placement poses for real-world tasks. Our key insight is that by leveraging a Vision-Language Model (VLM) to identify rough placement locations, we focus only on the relevant regions for local placement, which enables us to train the low-level placement-pose-prediction model to capture diverse placements efficiently. For training, we generate a fully synthetic dataset of randomly generated objects in different placement configurations (insertion, stacking, hanging) and train local placement-prediction models. We conduct extensive evaluations in simulation, demonstrating that our method outperforms baselines in terms of success rate, coverage of possible placement modes, and precision. In real-world experiments, we show how our approach directly transfers models trained purely on synthetic data to the real world, where it successfully performs placements in scenarios where other models struggle -- such as with varying object geometries, diverse placement modes, and achieving high precision for fine placement. More at: https://any-place.github.io.