Q-learning with temporal memory to navigate turbulence

📅 2024-04-26

🤖 AI Summary
This work addresses odor source localization by agents that have no spatial perception and rely solely on olfactory cues in turbulent odor plumes. We propose a Q-learning algorithm augmented with temporal memory. Odor signals are reduced to a small set of discrete, interpretable olfactory states built from two features of the odor trace. A temporal memory separates blanks within the plume, which are ignored, from being outside the plume, where a recovery strategy is activated. Training uses realistic turbulent odor cues. The main findings are threefold: (i) a few discrete olfactory states combined with a temporal memory are sufficient to learn navigation in a realistic odor plume; (ii) agents that learn their own recovery strategy perform best, and the learned recovery is mostly crosswind casting, similar to behavior observed in flying insects; and (iii) the optimal strategy is robust to substantial changes in the odor plumes, suggesting that minor parameter tuning may be enough to adapt to different environments.

📝 Abstract
We consider the problem of olfactory searches in a turbulent environment. We focus on agents that respond solely to odor stimuli, with no access to spatial perception or prior information about the odor. We ask whether navigation to a target can be learned robustly within a sequential decision-making framework. We develop a reinforcement learning algorithm using a small set of interpretable olfactory states and train it with realistic turbulent odor cues. By introducing a temporal memory, we demonstrate that two salient features of odor traces, discretized into a few olfactory states, are sufficient to learn navigation in a realistic odor plume. Performance is dictated by the sparse nature of turbulent odors. An optimal memory exists that ignores blanks within the plume and activates a recovery strategy outside the plume. We obtain the best performance by letting agents learn their recovery strategy and show that it is mostly crosswind casting, similar to behavior observed in flying insects. The optimal strategy is robust to substantial changes in the odor plumes, suggesting that minor parameter tuning may be sufficient to adapt to different environments.
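The idea in the abstract, a tabular Q-learning agent whose state is built from a short memory of odor readings, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the two odor features, the action set, or any parameter values, so everything below is an assumption made for illustration: the features (mean intensity of detections and fraction of the window with odor), the detection threshold, the bin count, the four actions, and the learning rate and discount factor.

```python
from collections import deque

# Hypothetical action set; the abstract does not list the agent's actions.
ACTIONS = ("upwind", "downwind", "crosswind_left", "crosswind_right")
VOID = 0  # state used when the whole memory window contains no detection


def olfactory_state(memory, threshold=0.1, n_bins=2, max_intensity=1.0):
    """Map a window of recent odor readings to one discrete state index.

    Assumed features (not taken from the source): mean intensity of the
    detections and the fraction of the window in which odor was detected.
    """
    detections = [c for c in memory if c > threshold]
    if not detections:
        return VOID  # nothing sensed within the memory: treated as outside the plume
    intensity = sum(detections) / len(detections)
    intermittency = len(detections) / len(memory)
    i_bin = min(int(intensity / max_intensity * n_bins), n_bins - 1)
    t_bin = min(int(intermittency * n_bins), n_bins - 1)
    return 1 + i_bin * n_bins + t_bin


def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One standard tabular Q-learning step on the table Q[state][action]."""
    target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


n_states = 1 + 2 * 2  # the void state plus a 2 x 2 grid of feature bins
Q = [[0.0] * len(ACTIONS) for _ in range(n_states)]

memory = deque([0.0, 0.8, 0.0, 0.9], maxlen=4)  # made-up readings
s = olfactory_state(memory)
memory.append(0.0)  # a blank inside the plume: the window still holds detections
s_next = olfactory_state(memory)
q_update(Q, s, ACTIONS.index("upwind"), reward=1.0, s_next=s_next)
```

In this sketch the memory length plays the role described in the abstract: a single blank reading does not push the agent into the void state as long as an earlier detection is still inside the window, while a window with no detections at all selects the void state, whose learned action values would act as the recovery strategy.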
Problem

Research questions and friction points this paper is trying to address.

Olfactory Learning
Turbulent Wind Conditions
Optimization of Sourcing Paths
Innovation

Methods, ideas, or system contributions that make the work stand out.

Q-learning with memory
Optimal foraging strategy
Adaptive to dynamic olfactory environments
Marco Rando
MaLGa, Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genova, Genova, Italy
Martin James
MaLGa, Department of Civil, Chemical and Environmental Engineering, University of Genoa, Genoa, Italy
A. Verri
MaLGa, Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genova, Genova, Italy
L. Rosasco
MaLGa, Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genova, Genova, Italy
A. Seminara
MaLGa, Department of Civil, Chemical and Environmental Engineering, University of Genoa, Genoa, Italy