CRAVES: Controlling Robotic Arm With a Vision-Based Economic System

📅 2018-12-03
🏛️ Computer Vision and Pattern Recognition
📈 Citations: 57
Influential: 2
🤖 AI Summary
To address the challenge of achieving high-precision 3D pose estimation and closed-loop control for low-cost, sensorless robotic manipulators in real-world settings, this paper proposes an end-to-end solution that relies on monocular vision alone. Methodologically, the authors synthesize training data from a 3D model of the target arm and employ domain-adaptive, geometry-constrained iterative semi-supervised learning to bypass manual annotation of real images; they further train a deep reinforcement learning agent in simulation and deploy it on the physical arm, enabling cross-domain control transfer. Key contributions include: (1) a systematic vision-based control paradigm for such low-cost ("economic") arms; (2) a novel semi-supervised pose estimation method requiring no real-image annotations; and (3) state-of-the-art pose accuracy on two real-world datasets, with successful deployment in physical grasping and manipulation tasks, demonstrating potential generalization to other multi-rigid-body systems.
📝 Abstract
Training a robotic arm to accomplish real-world tasks has been attracting increasing attention in both academia and industry. This work discusses the role of computer vision algorithms in this field. We focus on low-cost arms on which no sensors are equipped and thus all decisions are made upon visual recognition, e.g., real-time 3D pose estimation. This requires annotating a lot of training data, which is not only time-consuming but also laborious. In this paper, we present an alternative solution, which uses a 3D model to create a large number of synthetic data, trains a vision model in this virtual domain, and applies it to real-world images after domain adaptation. To this end, we design a semi-supervised approach, which fully leverages the geometric constraints among keypoints. We apply an iterative algorithm for optimization. Without any annotations on real images, our algorithm generalizes well and produces satisfying results on 3D pose estimation, which is evaluated on two real-world datasets. We also construct a vision-based control system for task accomplishment, for which we train a reinforcement learning agent in a virtual environment and apply it to the real-world. Moreover, our approach, with merely a 3D model being required, has the potential to generalize to other types of multi-rigid-body dynamic systems.
Problem

Research questions and friction points this paper is trying to address.

Training robotic arms without sensors using vision algorithms
Reducing annotation effort for 3D pose estimation with synthetic data
Developing vision-based control for real-world task accomplishment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D model for synthetic data generation
Applies semi-supervised domain adaptation technique
Trains RL agent in virtual environment
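The core idea behind the semi-supervised domain adaptation is that the arm is a rigid-body system, so any valid 3D keypoint prediction must respect known inter-joint distances; predictions on unlabeled real images that satisfy these constraints can be trusted as pseudo-labels for retraining. The paper does not publish this as an API, so the sketch below is purely illustrative: the skeleton (`BONES`, `BONE_LENGTHS`), the tolerance, and all function names are assumptions, and the retraining step is elided.

```python
# Illustrative sketch of iterative, geometry-constrained pseudo-labeling.
# BONES, BONE_LENGTHS, and all names here are hypothetical, not the
# paper's actual code or API.
import numpy as np

# Known rigid-body skeleton: keypoint index pairs and their fixed distances.
BONES = [(0, 1), (1, 2), (2, 3)]
BONE_LENGTHS = np.array([10.0, 8.0, 6.0])

def satisfies_geometry(kps_3d, tol=0.1):
    """Accept a 3D keypoint prediction only if every bone length matches
    the known rigid-body model within a relative tolerance."""
    lengths = np.array(
        [np.linalg.norm(kps_3d[i] - kps_3d[j]) for i, j in BONES]
    )
    return bool(np.all(np.abs(lengths - BONE_LENGTHS) / BONE_LENGTHS < tol))

def iterate_pseudo_labels(model_predict, real_images, rounds=3):
    """One possible shape of the iterative algorithm: predict keypoints on
    real images, keep only geometrically consistent predictions as
    pseudo-labels, retrain, and repeat."""
    pseudo = {}
    for _ in range(rounds):
        for idx, img in enumerate(real_images):
            kps = model_predict(img)
            if satisfies_geometry(kps):
                pseudo[idx] = kps  # trusted pseudo-label for this image
        # retrain(model, synthetic_data + list(pseudo.values()))  # elided
    return pseudo
```

In this view, the geometric constraints act as a free, annotation-less filter: the model trained on synthetic data only has to be right often enough for consistent predictions to accumulate, and each retraining round narrows the synthetic-to-real domain gap.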