🤖 AI Summary
This work addresses the challenge of enabling robots to reason jointly about high-level goals, material details, and efficient planning in long-horizon clay sculpting tasks. The authors formulate the problem as shape matching and introduce a visually aligned representation that incorporates lighting and texture cues for dynamics modeling and long-horizon planning with deformable objects. Their approach combines a parameterized pressing action space with a vision-aligned dynamics model, which removes the need to retrain a policy for each new target shape. Experiments with three deformable materials and several end-effectors show that the dynamics model performs on par with state-of-the-art methods, and the system achieves vision-guided, long-horizon robotic relief sculpting (>100 actions).
📝 Abstract
Clay sculpting is a nuanced, artistic task that requires dexterous manipulation and long-horizon planning to achieve high-level goals. We formulate robotic clay sculpting as a shape-to-shape matching problem. Prior work on deformable object manipulation either requires retraining a policy for each goal or relies on dynamics models that represent state as sparse point clouds, which poorly capture important clay features such as texture. We present a method for modeling the dynamics of deformable materials and planning robotic sculpting in a visually aligned representation that captures lighting and texture features. Using three different deformable materials and various end-effectors, we show that our dynamics model performs comparably to the state of the art while also being compatible with visual planning. Our actions are parameterized pushes into the clay with a single end-effector, an action space that proved suitable for long-horizon (>100 actions) clay relief sculptures. Lastly, we demonstrate the benefits of planning in a visually aligned representation, and we also provide evidence for why this representation is more challenging to plan in than 3D representations.
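To make the "shape matching with parameterized pushes" framing concrete, here is a toy sketch. It is not the authors' method, and every name in it is hypothetical. It assumes that the state is a 2D height map standing in for the visually aligned representation. It also assumes that a push is parameterized by an (x, y) center, a depth, and a radius. `apply_push` is an analytic Gaussian indentation standing in for the learned dynamics model. The planner is plain greedy random shooting that minimizes pixel-wise mean squared error to the target.

```python
import numpy as np
from dataclasses import dataclass


@dataclass(frozen=True)
class Push:
    """Hypothetical push parameters: center (x, y) in pixels, press depth, and tool radius."""
    x: float
    y: float
    depth: float
    radius: float


def apply_push(height: np.ndarray, push: Push) -> np.ndarray:
    """Toy dynamics stand-in (assumption): press a Gaussian-shaped dent into the height map."""
    h, w = height.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sq_dist = (xs - push.x) ** 2 + (ys - push.y) ** 2
    dent = push.depth * np.exp(-sq_dist / (2.0 * push.radius ** 2))
    return height - dent


def shape_cost(height: np.ndarray, target: np.ndarray) -> float:
    """Shape-matching objective: pixel-wise mean squared error to the target."""
    return float(np.mean((height - target) ** 2))


def greedy_plan(start, target, steps, samples=64, rng=None):
    """Greedy random shooting: at each step, sample pushes and keep the one that most lowers the cost."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = start.shape
    state = start.copy()
    plan = []
    for _ in range(steps):
        best_push, best_state, best_cost = None, state, shape_cost(state, target)
        for _ in range(samples):
            push = Push(
                x=rng.uniform(0, w - 1),
                y=rng.uniform(0, h - 1),
                depth=rng.uniform(0.05, 0.3),
                radius=rng.uniform(1.0, 3.0),
            )
            candidate = apply_push(state, push)
            cost = shape_cost(candidate, target)
            if cost < best_cost:
                best_push, best_state, best_cost = push, candidate, cost
        if best_push is None:
            break  # no sampled push improves the match, so stop early
        plan.append(best_push)
        state = best_state
    return plan, state
```

For example, starting from a flat slab `np.zeros((16, 16))` with a target made by one deep press at the center, `greedy_plan(start, target, steps=20)` returns up to 20 shallower pushes. By construction, these never increase the shape-matching cost. A learned, image-aligned dynamics model would replace `apply_push`. Because each step accepts only the locally best push, this toy also hints at why long-horizon planning in pixel space can get stuck.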