ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a gap in reinforcement learning for legged locomotion. Existing benchmarks use morphologies derived from real commercial robot hardware and ignore the stylistic constraints that game characters must satisfy, so they are ill-suited to stylized, non-realistic creature designs such as game NPCs. To bridge this gap, the authors introduce ARC-RL, a suite of four MuJoCo-based continuous-control environments inspired by *ARC Raiders*. The environments share a standardized observation template, action convention, and reward structure, which brings stylized, non-realistic morphologies into a reinforcement learning benchmark. All four robots are trained against a single closed-form, multi-component reward function whose only per-morphology variation is a small set of weights and parameters. Each morphology also comes with demonstration data generated by hand-crafted Central Pattern Generators (CPGs). This setup enables a systematic comparison of online algorithms (SAC, SPEQ, SOPE-EO) with variants that exploit prior data (SACfD, SPEQ-O2O, SOPE). Experiments show that incorporating prior demonstration data substantially improves policy learning efficiency and stability, supporting the framework's usefulness under diverse morphological and stylistic constraints.
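The summary mentions CPG-generated demonstrations but does not describe the oscillators. As a rough illustration of what a hand-crafted CPG demonstrator can look like, the sketch below drives an 18-DoF hexapod (6 legs x 3 joints, the DoF count of Queen and Tick) with one shared phase oscillator and tripod-gait leg offsets. The class name `TripodCPG`, the hip/knee/ankle joint ordering, the amplitudes, the stride frequency, and the control period are all assumptions made for this example. It is not the ARC-RL demonstrator, and the 12-DoF Bastion and Leaper would need a different joint layout.

```python
import math
from dataclasses import dataclass


@dataclass
class TripodCPG:
    """Hypothetical open-loop CPG demonstrator for an 18-DoF hexapod.

    Assumptions: 6 legs x 3 joints (hip, knee, ankle), joint targets in radians,
    and a tripod gait. Illustrative only; not the ARC-RL demonstrator.
    """

    frequency_hz: float = 1.0    # stride frequency (hypothetical)
    hip_amplitude: float = 0.3   # rad, fore/aft hip swing (hypothetical)
    knee_amplitude: float = 0.4  # rad, knee lift during swing (hypothetical)
    dt: float = 0.02             # control period in seconds (hypothetical)
    phase: float = 0.0           # shared oscillator phase in radians
    # Tripod gait: legs 0, 2, 4 move together; legs 1, 3, 5 lag by half a cycle.
    leg_offsets: tuple = (0.0, math.pi, 0.0, math.pi, 0.0, math.pi)

    def step(self) -> list[float]:
        """Advance the oscillator by one control period and return 18 joint targets."""
        self.phase = (self.phase + 2.0 * math.pi * self.frequency_hz * self.dt) % (2.0 * math.pi)
        targets: list[float] = []
        for offset in self.leg_offsets:
            leg_phase = self.phase + offset
            hip = self.hip_amplitude * math.sin(leg_phase)
            # Lift the knee only on the half of the cycle where cos(leg_phase) > 0.
            knee = self.knee_amplitude * max(0.0, math.cos(leg_phase))
            ankle = -0.5 * knee  # hypothetical coupling that keeps the foot roughly level
            targets.extend([hip, knee, ankle])
        return targets


# Usage: 5 s of joint targets at dt = 0.02.
cpg = TripodCPG()
demo_actions = [cpg.step() for _ in range(250)]
```

Because the oscillator is open-loop, turning it into prior data would mean stepping the environment with these targets and logging the resulting (observation, action, reward, next observation) tuples. The source does not say whether the shared action convention uses joint-position targets in radians or normalised commands, so a rescaling step may be needed.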

📝 Abstract

Reinforcement learning for legged locomotion has matured into a stack of multi-component reward functions and physics-engine benchmarks whose morphologies are uniformly derived from real commercial hardware. Game NPCs, however, are bound by stylistic constraints absent from sim-to-real robotics and routinely take the form of creatures with no real-robot counterpart. We introduce ARC-RL, a suite of four MuJoCo continuous-control environments featuring robotic morphologies inspired by the bestiary of ARC Raiders: the 18-DoF tall hexapod Queen, the 12-DoF armoured hexapod Bastion, the 18-DoF compact hexapod Tick, and the 12-DoF quadruped Leaper. All four robots share a unified observation template, action convention, simulation cadence, and a single closed-form multi-component reward function whose only per-morphology variation lives in a small set of weights and parameters. The reward fuses a velocity-tracking tent, a healthy survive bonus, a phase-locked gait-compliance bonus/cost pair, action regularisers, three safety penalties, and a posture anchor; no motion-capture data enters the reward at any point. We additionally provide hand-crafted Central Pattern Generator demonstrators per morphology, which serve both as fixed expert references and as sources of prior data for offline-to-online training. On this playground, we conduct a controlled empirical study comparing standard online algorithms (SAC, SPEQ, SOPE-EO) and methods augmented with prior data (SACfD, SPEQ-O2O, SOPE), and characterise how each paradigm copes with the playground's morphological diversity and animation-style stylistic constraints.
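The abstract lists the reward's components but not their exact formulas. The sketch below shows one way a single closed-form reward with these components could be assembled. It assumes a triangular tent on forward speed, a bonus/cost pair that compares measured foot contacts against a phase-derived stance schedule, squared-magnitude and squared-rate action regularisers, three generic non-negative safety-violation signals, and a squared-error posture anchor to a nominal joint pose. `RewardParams`, `compute_reward`, `contact_schedule`, and every numeric value are hypothetical and do not represent ARC-RL's actual API or weights. Only the `RewardParams` values would change between morphologies, mirroring the paper's per-morphology design.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardParams:
    """Per-morphology weights and parameters. All values are hypothetical placeholders."""

    target_speed: float = 0.5          # m/s, commanded forward speed
    tent_width: float = 0.5            # m/s, speed error at which the tent reaches zero
    w_velocity: float = 1.0
    w_survive: float = 0.1
    w_gait_bonus: float = 0.2
    w_gait_cost: float = 0.2
    w_action: float = 0.01             # action-magnitude regulariser
    w_action_rate: float = 0.01        # action-smoothness regulariser
    w_safety: tuple = (1.0, 1.0, 1.0)  # one weight per safety penalty
    w_posture: float = 0.1


def velocity_tent(v: float, target: float, width: float) -> float:
    """Tent (triangular) term: 1 at the target speed, falling linearly to 0 at +/- width."""
    return max(0.0, 1.0 - abs(v - target) / width)


def contact_schedule(phase: float, leg_offsets: Sequence[float], duty: float = 0.5) -> list[bool]:
    """Phase-locked stance schedule: True where a foot is expected to be on the ground."""
    return [((phase + off) % (2.0 * math.pi)) / (2.0 * math.pi) < duty for off in leg_offsets]


def compute_reward(
    forward_velocity: float,
    healthy: bool,
    foot_contacts: Sequence[bool],
    scheduled_contacts: Sequence[bool],
    action: Sequence[float],
    prev_action: Sequence[float],
    safety_violations: Sequence[float],  # three non-negative magnitudes (hypothetical)
    joint_pos: Sequence[float],
    nominal_pos: Sequence[float],
    p: RewardParams,
) -> tuple[float, dict[str, float]]:
    """Sum of weighted terms; returns the total and a per-term breakdown for logging."""
    n_feet = len(foot_contacts)
    matches = sum(c == s for c, s in zip(foot_contacts, scheduled_contacts))
    terms = {
        "velocity": p.w_velocity * velocity_tent(forward_velocity, p.target_speed, p.tent_width),
        "survive": p.w_survive * float(healthy),
        # Bonus/cost pair: reward feet that follow the schedule, penalise those that do not.
        "gait_bonus": p.w_gait_bonus * matches / n_feet,
        "gait_cost": -p.w_gait_cost * (n_feet - matches) / n_feet,
        "action": -p.w_action * sum(a * a for a in action),
        "action_rate": -p.w_action_rate * sum((a - b) ** 2 for a, b in zip(action, prev_action)),
        "safety": -sum(w * v for w, v in zip(p.w_safety, safety_violations)),
        # Posture anchor: squared distance from a nominal standing pose.
        "posture": -p.w_posture * sum((q - q0) ** 2 for q, q0 in zip(joint_pos, nominal_pos)),
    }
    return sum(terms.values()), terms
```

Returning the per-term breakdown alongside the total makes it easier to check which component dominates during training, which matters when the same formula is shared across very different bodies.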

Problem

legged locomotion

morphological diversity

stylistic constraints

reinforcement learning

game NPCs

Innovation

ARC-RL

morphological diversity

closed-form reward function