🤖 AI Summary
This work addresses inefficient exploration and susceptibility to local optima in reinforcement learning on tasks with high-dimensional state spaces and long horizons, particularly under sparse or fixed rewards. The authors propose an adaptive reward shaping method grounded in fuzzy logic, which integrates human prior knowledge into the reward design through interpretable fuzzy rules that modulate the reward signal according to the agent state. The framework reduces sensitivity to hyperparameters and offers both interpretability and robustness, allowing smoother transitions between high-speed maneuvering and precise control. Evaluated in simulation on autonomous drone racing benchmarks, the method shows faster convergence and lower variability across training seeds in the more challenging environments, with success rates improving by up to approximately 5% over non-fuzzy reward formulations.
📝 Abstract
Reinforcement learning (RL) often struggles in real-world tasks with high-dimensional state spaces and long horizons, where sparse or fixed rewards severely slow exploration and cause agents to become trapped in local optima. This paper presents a fuzzy-logic-based reward shaping method that integrates human intuition into RL reward design. By encoding expert knowledge into adaptive and interpretable terms, fuzzy rules promote stable learning and reduce sensitivity to hyperparameters. The proposed method leverages these properties to adapt reward contributions based on the agent state, enabling smoother transitions between fast motion and precise control in challenging navigation tasks. Extensive simulation results on autonomous drone racing benchmarks show stable learning behavior and consistent task performance across scenarios of increasing difficulty. In the more challenging environments, the proposed method achieves faster convergence and reduced performance variability across training seeds, with success rates improving by up to approximately 5 percent compared to non-fuzzy reward formulations.
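To make the idea of state-dependent reward contributions concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a single fuzzy input (distance to the next gate), two hypothetical rules ("if the gate is far, reward progress" and "if the gate is near, penalize alignment error"), and linear membership functions. All names, thresholds, and reward terms here are assumptions chosen for illustration; the abstract does not specify them.

```python
def ramp_up(x, lo, hi):
    """Linear membership: 0 at or below lo, 1 at or above hi."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def fuzzy_weights(distance_to_gate, near=1.0, far=5.0):
    """Return (near_mu, far_mu) membership degrees for the gate distance.

    The thresholds `near` and `far` (in meters) are hypothetical values.
    The two degrees are complementary, so they always sum to 1.
    """
    far_mu = ramp_up(distance_to_gate, near, far)
    return 1.0 - far_mu, far_mu


def shaped_reward(distance_to_gate, progress, alignment_error):
    """Blend two reward terms using fuzzy rule activations.

    Rule 1 (assumed): IF gate is far  THEN reward forward progress.
    Rule 2 (assumed): IF gate is near THEN penalize alignment error.
    """
    near_mu, far_mu = fuzzy_weights(distance_to_gate)
    speed_term = progress               # encourages fast motion
    precision_term = -alignment_error   # encourages precise control
    # Weighted sum of rule outputs; weights already sum to 1.
    return far_mu * speed_term + near_mu * precision_term


if __name__ == "__main__":
    # Sweep the distance to see the reward shift from speed to precision.
    for d in (6.0, 3.0, 0.5):
        print(d, shaped_reward(d, progress=2.0, alignment_error=0.5))
```

Because the memberships overlap, the reward moves continuously from the speed term to the precision term as the agent approaches the gate, rather than switching abruptly at a fixed threshold. This is one simple way to obtain the kind of smooth transition between fast motion and precise control that the abstract describes.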