🤖 AI Summary
This study addresses the difficulty reinforcement learning (RL) has with complex 3D visuospatial reasoning, specifically the Same-Different judgment task. Traditional RL methods suffer from poor convergence and limited generalization in unstructured 3D environments. To overcome this, we propose the first curriculum learning framework grounded in empirical human cognitive data: a progressively structured task sequence derived directly from human behavioral performance, trained via phased integration of Proximal Policy Optimization (PPO), behavioral cloning, and imitation learning. Our key contribution is the use of human experimental results to explicitly guide and interpret agent policy learning. Experiments demonstrate substantial improvements in training convergence speed and cross-task generalization; notably, our approach achieves the first stable learning performance on 3D Same-Different tasks and replicates strategy patterns closely aligned with human behavior.
📝 Abstract
Reinforcement Learning (RL) is a mature technology, often suggested as a potential route towards Artificial General Intelligence, with the ambitious goal of replicating the wide range of abilities found in natural and artificial intelligence, including the complexities of human cognition. While RL has shown successes in relatively constrained environments, such as the classic Atari games and specific continuous control problems, recent years have seen efforts to expand its applicability. This work investigates the potential of RL to demonstrate intelligent behaviour and its progress in addressing more complex and less structured problem domains.
We present an investigation into the capacity of modern RL frameworks to address a seemingly straightforward 3D Same-Different visuospatial task. While initial applications of state-of-the-art methods, including PPO, behavioural cloning and imitation learning, revealed challenges in directly learning optimal strategies, the successful implementation of curriculum learning offers a promising avenue. Effective learning was achieved by strategically designing the lesson plan based on the findings of a real-world human experiment.
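As an illustration only, a human-informed lesson plan could be sketched as below. The abstract does not say how lessons were ordered or when the agent moved on, so everything here is an assumption: the names `Lesson` and `Curriculum`, the use of a human error rate as the difficulty proxy, and the sliding-window promotion rule with its `promote_at` and `window` parameters are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Lesson:
    name: str
    human_error_rate: float  # hypothetical difficulty proxy taken from human data


@dataclass
class Curriculum:
    lessons: list
    promote_at: float = 0.8  # hypothetical success rate required to advance
    window: int = 20         # number of recent episodes averaged over
    index: int = 0
    _recent: deque = field(default_factory=deque)

    def __post_init__(self):
        # Assumption: lessons humans found easiest (lowest error rate) come first.
        self.lessons = sorted(self.lessons, key=lambda l: l.human_error_rate)
        self._recent = deque(maxlen=self.window)

    @property
    def current(self) -> Lesson:
        return self.lessons[self.index]

    def record(self, success: bool) -> bool:
        """Log one episode outcome; return True if the agent was promoted."""
        self._recent.append(1.0 if success else 0.0)
        window_full = len(self._recent) == self.window
        if window_full and sum(self._recent) / self.window >= self.promote_at:
            if self.index < len(self.lessons) - 1:
                self.index += 1
                self._recent.clear()  # start fresh statistics for the new lesson
                return True
        return False
```

In a training loop, one would call `record` after each episode and build the next environment from `curriculum.current`; the policy update itself (for example PPO) is independent of this scheduling logic.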