Cost-Aware Diffusion Active Search

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling efficient online adaptive search for agents in environments with an unknown number of targets, while balancing exploration, exploitation, and decision-making costs. It introduces diffusion models to active search tasks for the first time, leveraging their sequence modeling capability to directly sample forward-looking action sequences—thereby circumventing the need for computationally expensive search trees and supporting both single-agent and multi-agent collaborative decision-making. To mitigate the optimistic bias inherent in existing diffusion-based reinforcement learning approaches for non-myopic planning, the authors propose a correction mechanism and design a cost-aware, efficient search strategy. Experiments demonstrate that the proposed method achieves higher target recovery rates than standard offline reinforcement learning baselines and offers substantially improved computational efficiency compared to conventional tree-search methods.

📝 Abstract
Active search for recovering objects of interest through online, adaptive decision making with autonomous agents requires trading off exploration of unknown environments with exploitation of prior observations in the search space. Prior work has proposed myopic, greedy approaches based on information gain and Thompson sampling for agents to actively decide query or search locations when the number of targets is unknown. Decision-making algorithms in such partially observable environments have also shown that agents capable of lookahead over a finite horizon outperform myopic policies for active search. Unfortunately, lookahead algorithms typically rely on building a computationally expensive search tree that is simulated and updated based on the agent's observations and a model of the environment dynamics. Instead, in this work, we leverage the sequence modeling abilities of diffusion models to sample lookahead action sequences that balance the exploration-exploitation trade-off for active search without building an exhaustive search tree. We identify the optimism bias in prior diffusion-based reinforcement learning approaches when applied to the active search setting and propose mitigating solutions for efficient cost-aware decision making with both single- and multi-agent teams. Our proposed algorithm outperforms standard offline reinforcement learning baselines in terms of full recovery rate and is computationally more efficient than tree search in cost-aware active decision making.
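The core idea in the abstract, sampling lookahead action sequences from a diffusion model and selecting among them with a cost-aware criterion instead of expanding a search tree, can be illustrated with a minimal sketch. This is not the paper's algorithm: `sample_action_sequence` is a hypothetical stand-in for a trained diffusion policy (a real model would use a learned denoising network), and `value_fn`, `cost_fn`, and the trade-off weight `lam` are illustrative assumptions.

```python
import numpy as np

def sample_action_sequence(rng, horizon, n_actions, steps=20):
    """Hypothetical stand-in for a trained diffusion policy: starts from
    Gaussian noise and iteratively "denoises" an action-sequence tensor.
    Here the update is a toy annealing step, not a learned model."""
    x = rng.standard_normal((horizon, n_actions))
    for t in range(steps, 0, -1):
        noise_scale = t / steps
        x = (1 - 1 / steps) * x + 0.1 * noise_scale * rng.standard_normal(x.shape)
    # Discretize: one search location index per lookahead step.
    return x.argmax(axis=1)

def cost_aware_plan(rng, value_fn, cost_fn, horizon=5, n_actions=16, k=32, lam=0.5):
    """Sample k candidate lookahead sequences and keep the one maximizing
    estimated value minus lam-weighted cost -- no search tree is built."""
    best_seq, best_score = None, -np.inf
    for _ in range(k):
        seq = sample_action_sequence(rng, horizon, n_actions)
        score = value_fn(seq) - lam * cost_fn(seq)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score
```

A toy usage: score a sequence by how many distinct locations it covers (exploration proxy) and charge travel distance between consecutive locations as the cost, so the planner prefers diverse but compact routes.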
Problem

Research questions and friction points this paper is trying to address.

active search
exploration-exploitation trade-off
cost-aware decision making
partially observable environments
multi-agent systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models
active search
cost-aware decision making
lookahead planning
multi-agent systems