IL-SOAR: Imitation Learning with Soft Optimistic Actor cRitic

📅 2025-02-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the low sample efficiency of imitation learning from expert demonstrations. It proposes SOAR, an algorithmic template that learns a policy with a primal-dual style algorithm alternating cost and policy updates. Within the policy updates, an actor-critic method maintains multiple critics, uses them to estimate critic uncertainty, and builds an optimistic critic that drives exploration. In the tabular setting, the resulting algorithm comes with guarantees that match the best known results in ε. In practice, SOAR can be applied to imitation learning algorithms built on Soft Actor-Critic, including f-IRL, ML-IRL, and CSIL. On several MuJoCo environments it consistently improves their performance, roughly halving the number of episodes needed to reach the same performance.

📝 Abstract

This paper introduces the SOAR framework for imitation learning. SOAR is an algorithmic template that learns a policy from expert demonstrations with a primal-dual style algorithm that alternates cost and policy updates. Within the policy updates, the SOAR framework uses an actor-critic method with multiple critics to estimate the critic uncertainty and build an optimistic critic, which is fundamental to driving exploration. When instantiated in the tabular setting, we get a provable algorithm with guarantees that match the best known results in $\epsilon$. Practically, the SOAR template is shown to consistently boost the performance of imitation learning algorithms based on Soft Actor-Critic, such as f-IRL, ML-IRL and CSIL, in several MuJoCo environments. Overall, thanks to SOAR, the number of episodes required to achieve the same performance is reduced by half.
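The idea of building an optimistic critic from several critics can be illustrated with a minimal sketch. It assumes the uncertainty is measured by the standard deviation across the critics and that optimism means adding a multiple of it to their mean; this page does not state the exact rule used in the paper. The names `optimistic_value` and `beta` are hypothetical, and values are treated as rewards (higher is better); under a cost convention the bonus would be subtracted instead.

```python
from statistics import fmean, pstdev

def optimistic_value(critic_values, beta=1.0):
    """Combine several critics' estimates for one state-action pair.

    Assumption for illustration: optimism = mean + beta * std, where the
    spread across critics stands in for the critic uncertainty.
    """
    return fmean(critic_values) + beta * pstdev(critic_values)

# Hypothetical estimates from three critics for two actions in one state.
critics_per_action = {
    "left": [1.0, 1.0, 1.0],   # critics agree: no uncertainty bonus
    "right": [0.5, 1.0, 1.5],  # same mean, but the critics disagree
}

# Acting greedily on the optimistic value favours the uncertain action.
best_action = max(critics_per_action,
                  key=lambda a: optimistic_value(critics_per_action[a]))
```

Both actions have the same mean estimate here, so the choice is decided purely by the disagreement between critics, which is how an optimistic critic can steer exploration toward poorly understood state-action pairs.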

Problem

Research questions and friction points this paper is trying to address.

SOAR framework for imitation learning

Primal-dual style algorithm

Reduces required episodes by half

Innovation

Methods, ideas, or system contributions that make the work stand out.

Imitation Learning with SOAR

Primal-dual algorithm alternates cost and policy updates

Actor-critic with optimistic exploration

🔎 Similar Papers

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL