🤖 AI Summary
This work addresses the challenges of motion coordination, singularity avoidance, and safe exploration in a high-precision peg-in-hole task performed collaboratively by a Delta parallel robot and a 3-RRS parallel robot. The authors propose a co-design framework that combines kinematic design optimization with deep reinforcement learning. First, the geometry of the 3-RRS mechanism is optimized to enlarge its singularity-free workspace. Then, within the six-degree-of-freedom controllable subspace of the combined two-robot system, a Rainbow DQN agent (whose components include double Q-learning, prioritized experience replay, and multi-step returns) is trained with a two-stage curriculum and a tailored reward function to learn insertion policies on a five-dimensional task manifold. In a high-fidelity kinematic simulation, the approach achieves stable policy convergence and reliable insertions, with fewer constraint violations than the baseline methods.
📝 Abstract
This paper presents a kinematics-aware deep reinforcement learning framework based on Rainbow Deep Q-Networks (DQN) for cooperative peg-in-hole manipulation by a Delta parallel robot and a 3-RRS (Revolute-Revolute-Spherical) parallel manipulator. A key contribution is a geometric design-optimization stage that precedes learning: the 3-RRS geometry is tuned to maximize the singularity-free workspace and improve conditioning, which in turn enlarges the safe region in which the reinforcement learning policy can explore. Together, the two manipulators expose a 6-degree-of-freedom (DoF) controllable subspace (three Delta translations, two 3-RRS rotations, and one 3-RRS vertical translation); the peg-in-hole task is invariant to rotation about the peg axis, so the task-relevant manifold is five-dimensional. The cooperative insertion problem is cast as a Markov Decision Process with a 12-dimensional state vector and a discrete action set of $6 \times 2 = 12$ incremental commands (one positive and one negative per controlled DoF). A shaped reward combines dense proximity guidance, penalties for kinematic and workspace violations, and sparse bonuses for successful insertions. The Rainbow DQN agent, which integrates double Q-learning, a dueling architecture, prioritized replay, multi-step returns, noisy linear layers for exploration, and a distributional value head, is trained with a two-stage curriculum. The co-designed framework is validated in a high-fidelity kinematic simulator, where it achieves stable policy convergence, reliable insertions, and fewer constraint violations than a vanilla DQN agent and a classical sampling-based planner.
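As an illustration of the action set and reward structure described above, the following minimal Python sketch maps each of the 12 discrete actions to a signed increment on one controlled DoF and combines the three reward terms. It is not the authors' implementation: the DoF names, step sizes, reward weights, and function names are all hypothetical and chosen only to make the structure concrete.

```python
from __future__ import annotations

# Hypothetical labels for the six controlled DoFs, in the order the abstract
# lists them: three Delta translations, two 3-RRS rotations, one 3-RRS
# vertical translation.
DOF_NAMES = ("delta_x", "delta_y", "delta_z", "rrs_roll", "rrs_pitch", "rrs_z")

# Assumed increment per step (metres for translations, radians for rotations).
# These values are illustrative and do not come from the paper.
STEP_SIZES = (0.001, 0.001, 0.001, 0.005, 0.005, 0.001)


def decode_action(action: int) -> tuple[int, float]:
    """Map an action index in [0, 12) to (dof_index, signed increment).

    Even indices are positive steps, odd indices are negative steps, which
    gives one positive and one negative command per controlled DoF.
    """
    n_actions = 2 * len(DOF_NAMES)
    if not 0 <= action < n_actions:
        raise ValueError(f"action must be in [0, {n_actions}), got {action}")
    dof = action // 2
    sign = 1.0 if action % 2 == 0 else -1.0
    return dof, sign * STEP_SIZES[dof]


def apply_action(q: list[float], action: int) -> list[float]:
    """Return a copy of the 6-DoF command vector with one increment applied."""
    dof, delta = decode_action(action)
    q_next = list(q)
    q_next[dof] += delta
    return q_next


def shaped_reward(
    distance: float,
    violated: bool,
    inserted: bool,
    w_dist: float = 1.0,              # assumed weight on the dense term
    violation_penalty: float = 5.0,   # assumed penalty magnitude
    success_bonus: float = 100.0,     # assumed sparse bonus
) -> float:
    """Dense proximity term, constraint penalty, and sparse success bonus."""
    reward = -w_dist * distance       # dense guidance: closer is better
    if violated:                      # kinematic or workspace violation
        reward -= violation_penalty
    if inserted:                      # sparse bonus on successful insertion
        reward += success_bonus
    return reward
```

For example, `decode_action(6)` selects the fourth DoF with a positive step, and `apply_action([0.0] * 6, 6)` returns a command vector in which only that entry has changed. The even/odd encoding is one of several equivalent ways to lay out a $6 \times 2$ action table.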