🤖 AI Summary
This work addresses the challenge of evaluating how robust robotic manipulation policies are to variations in object geometry. We propose Geometric Red-Teaming (GRT), an object-centric framework that integrates a Jacobian field-based deformation model with gradient-free, simulator-in-the-loop optimization to automatically generate CrashShapes: structurally valid, user-constrained mesh deformations that induce failures in pre-trained policies and expose brittle failure modes missed by static benchmarks. Across insertion, articulation, and grasping tasks, GRT consistently finds deformations that collapse policy performance. In real-robot validation, CrashShapes reduce task success from 90% to as low as 22.5%. Blue-teaming, i.e., fine-tuning on individual CrashShapes, improves task success on those shapes by up to 60 percentage points while preserving performance on the original object, and recovers real-world performance to up to 90%. To our knowledge, this is the first application of red-teaming principles to geometric robustness assessment in robotics. The framework offers an interpretable, reproducible evaluation paradigm for robotic manipulation and a principled path to improving robustness through adversarial stress testing.
📝 Abstract
Standard evaluation protocols in robotic manipulation typically assess policy performance over curated, in-distribution test sets, offering limited insight into how systems fail under plausible variation. We introduce Geometric Red-Teaming (GRT), a red-teaming framework that probes robustness through object-centric geometric perturbations, automatically generating CrashShapes -- structurally valid, user-constrained mesh deformations that trigger catastrophic failures in pre-trained manipulation policies. The method integrates a Jacobian field-based deformation model with a gradient-free, simulator-in-the-loop optimization strategy. Across insertion, articulation, and grasping tasks, GRT consistently discovers deformations that collapse policy performance, revealing brittle failure modes missed by static benchmarks. By combining task-level policy rollouts with constraint-aware shape exploration, we aim to build a general-purpose framework for structured, object-centric robustness evaluation in robotic manipulation. We additionally show that fine-tuning on individual CrashShapes, a process we refer to as blue-teaming, improves task success by up to 60 percentage points on those shapes while preserving performance on the original object, demonstrating the utility of red-teamed geometries for targeted policy refinement. Finally, we validate both red-teaming and blue-teaming results with a real robotic arm, observing that simulated CrashShapes reduce task success from 90% to as low as 22.5%, and that blue-teaming recovers performance to up to 90% on the corresponding real-world geometry -- closely matching simulation outcomes. Videos and code can be found on our project website: https://georedteam.github.io/.
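The gradient-free, simulator-in-the-loop search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Jacobian-field deformation model is replaced by a plain parameter vector, the simulator rollouts by a hypothetical closed-form function `toy_success_rate`, and the user constraints by a hypothetical box bound `is_valid`. The search itself is a simple random-perturbation hill climb, chosen only because it needs no gradients through the simulator.

```python
import random


def toy_success_rate(params):
    """Hypothetical stand-in for policy rollouts in a simulator.

    Returns a success rate that starts at 0.9 for the undeformed shape
    (all parameters zero) and drops as the deformation grows.
    """
    return max(0.0, 0.9 - 0.5 * abs(params[0]) - 0.1 * abs(params[1]))


def is_valid(params, max_magnitude=1.0):
    """Hypothetical user constraint: every deformation parameter is bounded."""
    return all(abs(p) <= max_magnitude for p in params)


def red_team_search(evaluate, valid, dim=2, population=16,
                    generations=20, sigma=0.2, seed=0):
    """Gradient-free search for a deformation that minimizes task success.

    Each generation samples Gaussian perturbations around the current best
    parameters, discards candidates that violate the constraint, and keeps
    any candidate with a lower success rate.
    """
    rng = random.Random(seed)
    best = [0.0] * dim              # start from the undeformed shape
    best_score = evaluate(best)
    for _ in range(generations):
        parent = best
        for _ in range(population):
            candidate = [p + rng.gauss(0.0, sigma) for p in parent]
            if not valid(candidate):
                continue            # constraint-aware: skip invalid shapes
            score = evaluate(candidate)
            if score < best_score:
                best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    shape, score = red_team_search(toy_success_rate, is_valid)
    print("deformation parameters:", shape)
    print("success rate on this shape:", score)
```

In a real setting, `evaluate` would run the pre-trained policy on the deformed mesh for several episodes, so each call is expensive and the population and generation counts become the main budget knobs.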