AeroPlace-Flow: Language-Grounded Object Placement for Aerial Manipulators via Visual Foresight and Object Flow

📅 2026-03-08
🤖 AI Summary
Existing aerial manipulation methods rely on predefined target coordinates and cannot flexibly specify placement targets through natural language. The authors propose a training-free framework that combines language guidance, image editing, and object-centric 3D flow to generate executable placement targets from RGB-D observations and a natural language instruction alone. Using depth alignment, object-centric 3D geometric reasoning, and collision-aware object flow, the method is validated in simulation and real-world experiments and reaches an average success rate of 75% on hardware across diverse language-conditioned aerial placement scenarios.

📝 Abstract
Precise object placement remains underexplored in aerial manipulation, where most systems rely on predefined target coordinates and focus primarily on grasping and control. Specifying exact placement poses, however, is cumbersome in real-world settings, where users naturally communicate goals through language. In this work, we present AeroPlace-Flow, a training-free framework for language-grounded aerial object placement that unifies visual foresight with explicit 3D geometric reasoning and object flow. Given RGB-D observations of the object and the placement scene, along with a natural language instruction, AeroPlace-Flow first synthesizes a task-complete goal image using image editing models. The imagined configuration is then grounded into metric 3D space through depth alignment and object-centric reasoning, enabling the inference of a collision-aware object flow that transports the grasped object to a language- and contact-consistent placement configuration. The resulting motion is executed via standard trajectory tracking for an aerial manipulator. AeroPlace-Flow produces executable placement targets without requiring predefined poses or task-specific training. We validate our approach through extensive simulation and real-world experiments, demonstrating reliable language-conditioned placement across diverse aerial scenarios with an average success rate of 75% on hardware.
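The grounding step described in the abstract (depth alignment followed by object flow) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the goal image's depth is a relative estimate that maps to metric depth by a single scale and shift, that the camera follows a pinhole model, and that point correspondences between the current and goal object are already known. The function names `fit_scale_shift`, `backproject`, and `object_flow` are hypothetical, and collision awareness is not modeled.

```python
from statistics import fmean


def fit_scale_shift(relative_depth, metric_depth):
    """Least-squares (scale, shift) so that scale * relative + shift ~ metric.

    Assumption: pixels with metric depth <= 0 are invalid sensor readings
    and are skipped.
    """
    pairs = [(r, m) for r, m in zip(relative_depth, metric_depth) if m > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two valid depth pairs")
    r_mean = fmean(r for r, _ in pairs)
    m_mean = fmean(m for _, m in pairs)
    variance = sum((r - r_mean) ** 2 for r, _ in pairs)
    if variance == 0:
        raise ValueError("relative depth is constant; scale is undetermined")
    covariance = sum((r - r_mean) * (m - m_mean) for r, m in pairs)
    scale = covariance / variance
    shift = m_mean - scale * r_mean
    return scale, shift


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at metric depth to camera-frame XYZ."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def object_flow(current_points, goal_points):
    """Per-point 3D displacement from the current object to its goal placement.

    Assumption: the two lists are in one-to-one correspondence.
    """
    return [
        tuple(g - c for c, g in zip(p, q))
        for p, q in zip(current_points, goal_points)
    ]


# Hypothetical usage: align imagined depth to sensor depth, then compute flow.
relative = [0.1, 0.2, 0.3, 0.4]
metric = [0.7, 0.9, 0.0, 1.3]  # third reading is invalid (0) and is ignored
scale, shift = fit_scale_shift(relative, metric)
goal_depth = scale * 0.3 + shift  # metric depth for the missing pixel
goal_point = backproject(320, 240, goal_depth, 500.0, 500.0, 320.0, 240.0)
flow = object_flow([(0.0, 0.0, 0.5)], [goal_point])
```

In this sketch the flow is a plain displacement field; a full system would additionally need to resolve correspondences and check the resulting motion against scene geometry before handing it to the trajectory tracker.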
Problem

Research questions and friction points this paper is trying to address.

aerial manipulation
object placement
language grounding
visual foresight
object flow
Innovation

Methods, ideas, or system contributions that make the work stand out.

language-grounded placement
visual foresight
object flow
aerial manipulation
3D geometric reasoning