Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance

📅 2025-07-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing LLM agents struggle to adapt in real time when rules and domain knowledge change frequently, as they do in compliance auditing and risk screening, because they rely on static fine-tuning or prompt engineering. Method: We propose ARIA (Adaptive Reflective Interactive Agent), an LLM agent that uses structured self-dialogue to assess its own uncertainty and identify knowledge gaps, requests targeted explanations or corrections from human experts when a gap is found, and writes the resulting guidance into a timestamped knowledge repository in which conflicting or outdated entries are detected and resolved. Contribution/Results: Evaluated on the customer due diligence name screening task on TikTok Pay and on publicly available dynamic knowledge tasks, ARIA shows significant improvements in adaptability and accuracy over baselines that use standard offline fine-tuning and existing self-improving agents. ARIA is deployed within TikTok Pay serving over 150 million monthly active users, which supports its practicality in rapidly evolving operational settings.

📝 Abstract

Large language model (LLM) agents often struggle in environments where rules and required domain knowledge frequently change, such as regulatory compliance and user risk screening. Current approaches, like offline fine-tuning and standard prompting, are insufficient because they cannot effectively adapt to new knowledge during actual operation. To address this limitation, we propose the Adaptive Reflective Interactive Agent (ARIA), an LLM agent framework designed specifically to continuously learn updated domain knowledge at test time. ARIA assesses its own uncertainty through structured self-dialogue, proactively identifying knowledge gaps and requesting targeted explanations or corrections from human experts. It then systematically updates an internal, timestamped knowledge repository with provided human guidance, detecting and resolving conflicting or outdated knowledge through comparisons and clarification queries. We evaluate ARIA on the realistic customer due diligence name screening task on TikTok Pay, alongside publicly available dynamic knowledge tasks. Results demonstrate significant improvements in adaptability and accuracy compared to baselines using standard offline fine-tuning and existing self-improving agents. ARIA is deployed within TikTok Pay serving over 150 million monthly active users, confirming its practicality and effectiveness for operational use in rapidly evolving environments.
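To make the idea of a timestamped knowledge repository with conflict handling concrete, here is a minimal sketch. It is not the paper's implementation: the names `Rule`, `KnowledgeStore`, `find_conflict`, and `update` are hypothetical, and the "newest timestamp wins" policy is a simple stand-in for the comparisons and clarification queries described in the abstract.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Rule:
    key: str             # hypothetical topic identifier for a piece of guidance
    guidance: str        # expert-provided guidance text
    added_at: datetime   # when the guidance was provided


@dataclass
class KnowledgeStore:
    """Illustrative store: one active rule per key, plus an archive."""
    active: Dict[str, Rule] = field(default_factory=dict)
    history: List[Rule] = field(default_factory=list)

    def find_conflict(self, key: str, guidance: str) -> Optional[Rule]:
        # A conflict here means: same key, different guidance text.
        current = self.active.get(key)
        if current is not None and current.guidance != guidance:
            return current
        return None

    def update(self, key: str, guidance: str,
               added_at: Optional[datetime] = None) -> Optional[Rule]:
        """Store guidance; return the rule it superseded, if any."""
        added_at = added_at or datetime.now(timezone.utc)
        conflict = self.find_conflict(key, guidance)
        if conflict is not None and conflict.added_at > added_at:
            # Incoming guidance is older than what is active: archive it only.
            self.history.append(Rule(key, guidance, added_at))
            return None
        if conflict is not None:
            # Assumed policy: the newer guidance replaces the older one.
            self.history.append(conflict)
        self.active[key] = Rule(key, guidance, added_at)
        return conflict
```

In a fuller system, the rule returned by `update` could be used to phrase a clarification question to the expert instead of silently replacing the older entry.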

Problem

Research questions and friction points this paper is trying to address.

Enabling LLM agents to adapt to dynamic environments with human guidance

Addressing knowledge gaps in real-time through self-assessment and expert input

Improving accuracy in rapidly changing domains like regulatory compliance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous learning at test time

Human-in-the-loop guidance integration

Timestamped knowledge repository updates
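The interplay between self-assessed uncertainty and human guidance can be illustrated with a small sketch. ARIA's structured self-dialogue is not specified in this summary, so the code below substitutes a simpler technique: agreement among several sampled answers is used as the uncertainty signal. The function `decide_or_escalate`, the `ask_expert` callback, and the `min_agreement` threshold are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Tuple


def decide_or_escalate(
    samples: List[str],
    ask_expert: Callable[[], str],
    min_agreement: float = 0.8,
) -> Tuple[str, bool]:
    """Return (answer, escalated).

    `samples` are candidate answers the agent produced for one case.
    If the most common answer does not reach `min_agreement`, the case
    is treated as uncertain and handed to a human expert.
    """
    if not samples:
        # Nothing to vote on: always defer to the expert.
        return ask_expert(), True

    answer, votes = Counter(samples).most_common(1)[0]
    if votes / len(samples) >= min_agreement:
        return answer, False

    # Low agreement is treated as a knowledge gap.
    return ask_expert(), True
```

For example, five identical samples are accepted without escalation, while a three-to-two split falls below the default threshold and triggers the expert callback. The expert's answer would then be a natural candidate for storage in the knowledge repository.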
