Task-Relevant and Irrelevant Region-Aware Augmentation for Generalizable Vision-Based Imitation Learning in Agricultural Manipulation

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of visual imitation learning in agriculture, which stems from scarce demonstration data and significant visual domain discrepancies between crops and backgrounds. To overcome this, the authors propose DRAIL, a novel framework that introduces a dual-region augmentation mechanism distinguishing task-relevant from task-irrelevant regions. Task-relevant regions are enhanced with domain-informed, feature-preserving augmentations, while task-irrelevant regions undergo strong randomization to suppress distracting visual cues. This strategy effectively decouples essential task features from environmental noise and integrates with a diffusion-based policy for robust visuomotor control. Evaluated on simulated vegetable harvesting and real-world lettuce defect-leaf handling tasks, DRAIL significantly improves policy success rates, focuses attention on critical visual features, and demonstrates superior generalization and robustness.

📝 Abstract
Vision-based imitation learning has shown promise for robotic manipulation; however, its generalization remains limited in practical agricultural tasks. This limitation stems from scarce demonstration data and substantial visual domain gaps caused by i) crop-specific appearance diversity and ii) background variations. To address this limitation, we propose Dual-Region Augmentation for Imitation Learning (DRAIL), a region-aware augmentation framework designed for generalizable vision-based imitation learning in agricultural manipulation. DRAIL explicitly separates visual observations into task-relevant and task-irrelevant regions. The task-relevant region is augmented in a domain-knowledge-driven manner to preserve essential visual characteristics, while the task-irrelevant region is aggressively randomized to suppress spurious background correlations. By jointly handling both sources of visual variation, DRAIL promotes learning policies that rely on task-essential features rather than incidental visual cues. We evaluate DRAIL on diffusion policy-based visuomotor controllers through robot experiments on an artificial vegetable harvesting task and a real-world lettuce defective-leaf picking preparation task. The results show consistent improvements in success rates under unseen visual conditions compared to baseline methods. Further attention analysis and representation generalization metrics indicate that the learned policies rely more on task-essential visual features, resulting in enhanced robustness and generalization.
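The dual-region mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation — the function name, the choice of photometric jitter for the task-relevant region, and noise replacement for the task-irrelevant region are all illustrative assumptions; the paper's actual augmentations are domain-knowledge-driven and crop-specific.

```python
import numpy as np

def dual_region_augment(image, task_mask, rng=None):
    """Illustrative dual-region augmentation in the spirit of DRAIL.

    image:     (H, W, 3) float array in [0, 1]
    task_mask: (H, W) boolean array, True where a pixel is task-relevant

    Hypothetical choices: mild brightness/contrast jitter stands in for the
    feature-preserving augmentation; uniform noise stands in for the strong
    background randomization.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Task-relevant region: mild, feature-preserving photometric jitter
    # (small brightness/contrast changes keep the crop's appearance intact).
    brightness = rng.uniform(-0.05, 0.05)
    contrast = rng.uniform(0.9, 1.1)
    relevant = np.clip((image - 0.5) * contrast + 0.5 + brightness, 0.0, 1.0)

    # Task-irrelevant region: aggressive randomization to suppress spurious
    # background correlations (here: replace pixels with uniform noise).
    noise = rng.uniform(0.0, 1.0, size=image.shape)

    # Composite the two regions with the mask broadcast over channels.
    return np.where(task_mask[..., None], relevant, noise)

# Usage on a toy 4x4 gray image with a 2x2 task-relevant patch.
img = np.full((4, 4, 3), 0.5)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
aug = dual_region_augment(img, mask, rng=np.random.default_rng(0))
```

In practice the mask would come from a segmentation of the crop versus the background; the key design point is that the two regions receive different augmentation strengths so the policy cannot latch onto background cues.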
Problem

Research questions and friction points this paper is trying to address.

generalization
vision-based imitation learning
agricultural manipulation
visual domain gap
demonstration scarcity
Innovation

Methods, ideas, or system contributions that make the work stand out.

region-aware augmentation
task-relevant region
visual generalization
imitation learning
agricultural robotics
Shun Hattori
Division of Information Science, Graduate School of Information Science, Nara Institute of Science and Technology (NAIST), Nara, Japan
Hikaru Sasaki
Nara Institute of Science and Technology (NAIST), Nara, Japan
Takumi Hachimine
Division of Information Science, Graduate School of Information Science, Nara Institute of Science and Technology (NAIST), Nara, Japan
Yusuke Mizutani
DX·IT·Research & Development Center, TSUBAKIMOTO CHAIN CO., Kyoto, Japan
Takamitsu Matsubara
Nara Institute of Science and Technology (NAIST), Nara, Japan
Robot Learning · Machine Learning · Reinforcement Learning · Robotics