🤖 AI Summary
This work addresses crop damage monitoring for precision agriculture as an active exploration problem, in which autonomous agents sense a partially observable, grid-based field sequentially. To preserve the Markov property under partial observability, we include a POV visibility mask in the agents' input alongside an LSTM-based belief map. We design a training-free exploration strategy driven by entropy-based information gain (IG), which makes exploration uncertainty-aware and efficient. We further introduce a Double-CNN DQN architecture that exploits a wider spatial context around the agent. Evaluated on 20×20 simulated farmland, this architecture achieves the best exploration efficiency, particularly in larger environments, while the training-free IG strategy performs robustly despite its simplicity. These results empirically support uncertainty-guided exploration as a paradigm for agricultural monitoring.
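The training-free IG agent can be pictured as a greedy rule over the belief map. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each cell's belief is a categorical distribution over target counts (array shape H×W×(K+1)) and that the agent chooses among the cells in its 3×3 neighbourhood and their not-yet-used POVs. It also uses a cell's current entropy as the expected IG of a new view, since the entropy upper-bounds what a single observation can reveal about that cell. The names `belief_entropy` and `greedy_ig_action` are hypothetical.

```python
import numpy as np


def belief_entropy(belief: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) per cell of a belief map shaped (H, W, K+1)."""
    p = np.clip(belief, 1e-12, 1.0)  # avoid log(0); zero-probability terms still contribute 0
    return -(belief * np.log(p)).sum(axis=-1)


def greedy_ig_action(belief: np.ndarray, pov_mask: np.ndarray, pos: tuple):
    """Hypothetical greedy IG rule: among cells in the 3x3 neighbourhood of `pos`,
    pick an unused POV of the cell with the highest entropy.

    pov_mask[r, c, v] is True if cell (r, c) was already observed from POV v.
    Returns ((row, col, pov), score), or (None, -1.0) if every POV is used.
    """
    h, w, _ = belief.shape
    ent = belief_entropy(belief)
    best, best_score = None, -1.0
    r, c = pos
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < h and 0 <= cc < w):
                continue  # skip off-grid neighbours
            for v in range(pov_mask.shape[-1]):
                if pov_mask[rr, cc, v]:
                    continue  # visibility mask: never revisit a view
                # Assumed proxy: entropy is an upper bound on one view's IG.
                score = ent[rr, cc]
                if score > best_score:
                    best, best_score = (rr, cc, v), score
    return best, best_score
```

Because the mask removes used views, the rule cannot loop on the same POV. A learned agent could replace the entropy proxy with the belief model's predicted posterior.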
📝 Abstract
Precision agriculture requires efficient autonomous systems for crop monitoring, in which agents must explore large-scale environments while minimizing resource consumption. This work frames the problem as an active exploration task in a grid environment representing an agricultural field. Each cell may contain targets (e.g., damaged crops) observable from nine predefined points of view (POVs), and agents must infer the number of targets per cell from partial, sequential observations. We propose a two-stage deep learning framework. A pre-trained LSTM serves as a belief model, updating a probabilistic map of the environment and its associated entropy, which defines the expected information gain (IG). This allows agents to prioritize informative regions. A key contribution is the inclusion of a POV visibility mask in the input, which preserves the Markov property under partial observability and prevents revisits to already explored views. We compare three agent architectures: an untrained IG-based agent that selects actions maximizing entropy reduction; a DQN agent that applies CNNs to local 3×3 inputs containing the belief, entropy, and POV mask; and a Double-CNN DQN agent with a wider spatial context. Simulations on 20×20 maps show that the untrained agent performs well despite its simplicity. The DQN agent matches this performance when the POV mask is included, while the Double-CNN agent consistently achieves superior exploration efficiency, especially in larger environments. These results show that uncertainty-aware policies leveraging entropy, belief states, and visibility tracking yield robust and scalable exploration. Future work includes curriculum learning, multi-agent cooperation with shared rewards, transformer-based models, and intrinsic motivation mechanisms to further improve learning efficiency and policy generalization.
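The local input of the CNN-based DQN agent can similarly be assembled by stacking belief, entropy, and POV-mask channels around the agent's position. This is a hypothetical sketch and not the paper's code. The channel order, the padding convention (off-grid cells are marked as already observed, so they offer nothing to gain), and the `radius` parameter are all assumptions. With `radius=1` it yields the 3×3 input described above. A larger radius only illustrates the kind of wider-context patch a second CNN branch might consume.

```python
import numpy as np


def local_observation(belief: np.ndarray, pov_mask: np.ndarray,
                      pos: tuple, radius: int = 1) -> np.ndarray:
    """Hypothetical DQN input: a (C, 2r+1, 2r+1) patch centred on `pos`.

    Channels: belief (K+1) + entropy (1) + POV mask (n_pov), channels-first.
    """
    p = np.clip(belief, 1e-12, 1.0)
    entropy = -(belief * np.log(p)).sum(axis=-1, keepdims=True)
    # Full map in channels-last layout: (H, W, (K+1) + 1 + n_pov)
    full = np.concatenate([belief, entropy, pov_mask.astype(belief.dtype)], axis=-1)
    h, w, ch = full.shape
    n_info = belief.shape[-1] + 1  # number of belief + entropy channels
    # Assumed padding: zeros for belief/entropy, ones for the mask (off-grid = seen)
    padded = np.zeros((h + 2 * radius, w + 2 * radius, ch), dtype=full.dtype)
    padded[..., n_info:] = 1.0
    padded[radius:radius + h, radius:radius + w] = full
    r, c = pos  # cell (r, c) sits at (r + radius, c + radius) in padded coordinates
    patch = padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]
    return np.transpose(patch, (2, 0, 1))  # (C, H, W), the layout PyTorch Conv2d uses
```

For a 20×20 map with four count classes and nine POVs, `local_observation(belief, pov_mask, (r, c))` returns a 14×3×3 array. Adding a batch dimension makes it a valid input to a convolutional Q-network.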