Training a Generally Curious Agent

📅 2025-02-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited ability of language models to explore and make decisions in unseen environments. It presents PAPRIKA, a fine-tuning approach trained on synthetic interaction data from multiple tasks that require diverse strategies, paired with a curriculum learning strategy that prioritizes sampling trajectories from tasks with high learning potential. Fine-tuned models adapt their behavior on a new task in context, based on environment feedback and without further gradient updates. The reported experiments show that the learned decision-making capabilities transfer to entirely unseen tasks without additional training. Because the approach's primary bottleneck is sampling useful interaction data rather than performing model updates, the curriculum is aimed at improving sample efficiency.

📝 Abstract

Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. In this paper, we present PAPRIKA, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. By training on synthetic interaction data from different tasks that require diverse strategies, PAPRIKA teaches models to explore and adapt their behavior on a new task based on environment feedback in-context without more gradient updates. Experimental results show that models fine-tuned with PAPRIKA can effectively transfer their learned decision-making capabilities to entirely unseen tasks without additional training. Unlike traditional training, our approach's primary bottleneck lies in sampling useful interaction data instead of model updates. To improve sample efficiency, we propose a curriculum learning strategy that prioritizes sampling trajectories from tasks with high learning potential. These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems that require interactions with the external world.
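
The curriculum idea in the abstract (prioritize sampling trajectories from tasks with high learning potential) can be illustrated with a small sketch. The abstract does not say how learning potential is measured or how tasks are chosen, so everything below is an assumption made for illustration: learning potential is approximated by the spread of episode returns relative to their mean, and tasks are picked with a UCB-style bandit rule. The names `learning_potential`, `CurriculumSampler`, and `simulate` are hypothetical and do not come from the paper.

```python
import math
import random
from collections import defaultdict


def learning_potential(returns):
    """Assumed proxy: std of episode returns divided by their mean.

    Tasks the model always solves or always fails score 0, since more
    samples from them carry little new training signal.
    """
    if len(returns) < 2:
        return 0.0
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    return math.sqrt(var) / (abs(mean) + 1e-8)


class CurriculumSampler:
    """Hypothetical UCB-style task selector (not the paper's exact method)."""

    def __init__(self, tasks, exploration=1.0):
        self.tasks = list(tasks)
        self.exploration = exploration
        self.returns = defaultdict(list)  # task -> observed episode returns
        self.total_pulls = 0

    def score(self, task):
        n = len(self.returns[task])
        if n == 0:
            return math.inf  # try every task at least once
        bonus = self.exploration * math.sqrt(math.log(self.total_pulls) / n)
        return learning_potential(self.returns[task]) + bonus

    def select(self):
        # Pick the task with the highest potential plus exploration bonus.
        return max(self.tasks, key=self.score)

    def update(self, task, episode_return):
        self.returns[task].append(episode_return)
        self.total_pulls += 1


def simulate(rounds=300, seed=0):
    """Toy run: each task is a coin flip with a fixed success probability."""
    rng = random.Random(seed)
    success_prob = {"easy": 1.0, "medium": 0.5, "hard": 0.0}
    sampler = CurriculumSampler(success_prob)
    counts = {task: 0 for task in success_prob}
    for _ in range(rounds):
        task = sampler.select()
        # Stand-in for rolling out the model on the task and scoring it.
        episode_return = 1.0 if rng.random() < success_prob[task] else 0.0
        sampler.update(task, episode_return)
        counts[task] += 1
    return counts


if __name__ == "__main__":
    print(simulate())
```

In this toy setup the sampler spends most of its budget on the task with mixed outcomes, which is the behavior a learning-potential curriculum aims for. A real pipeline would replace the coin flip with actual model rollouts and task-specific scoring.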

Problem

Research questions and friction points this paper is trying to address.

Enhancing exploration in intelligent systems

Transferring decision-making across tasks

Improving sample efficiency in training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning for general decision-making

Synthetic interaction data training

Curriculum learning for sample efficiency

🔎 Similar Papers

A Survey on Large Language Model based Autonomous Agents