TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training

📅 2026-03-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes TopoCurate, a framework for curating training data for tool-use agents. Existing agent-training approaches filter trajectories primarily by final success rate, which overlooks the dynamic structure of the interaction process and conflates effective strategies with accidental successes. TopoCurate uses semantic quotient topology modeling to map multi-turn trajectories onto a structured manifold, merging semantically equivalent states and identifying error-recovery paths so that strategy bifurcations induced by tool invocations are captured explicitly. Building on this representation, the framework implements a dual curation mechanism tailored to supervised fine-tuning (SFT) and reinforcement learning (RL), enabling structure-aware evaluation of trajectory quality and informativeness. Evaluated on BFCLv3 and Tau2 Bench, TopoCurate achieves gains of 4.2% (SFT) and 6.9% (RL) over current state-of-the-art baselines, improving agent generalization and robustness.

📝 Abstract

Training tool-use agents typically relies on outcome-based filtering: Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks. However, this paradigm ignores interaction dynamics: successful trajectories may lack error recovery or exhibit redundancy, while pass rates fail to distinguish structurally informative tasks from trivial ones. We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology. By merging equivalent action-observation states, this projection transforms scattered linear trajectories into a structured manifold that explicitly captures how tool invocations and environmental responses drive the divergence between effective strategies and failure modes. Leveraging this representation, we introduce a dual-selection mechanism: for SFT, we prioritize trajectories demonstrating reflective recovery, semantic efficiency, and strategic diversity to mitigate covariate shift and mode collapse; for RL, we select tasks with high error branch ratios and strategic heterogeneity, maximizing gradient Signal-to-Noise Ratio to address vanishing signals in sparse-reward settings. Evaluations on BFCLv3 and Tau2 Bench show that TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines. We will release the code and data soon for further investigations.
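
The idea of merging equivalent action-observation states across rollouts of one task can be illustrated with a small sketch. The abstract does not say how semantic equivalence is decided or how the error branch ratio is defined, so everything here is an assumption made for illustration: `canonical_state`, `build_quotient_graph`, and `error_branch_ratio` are hypothetical names, equivalence is approximated by simple text normalization, and the ratio is taken to be the share of branching edges that lead to states reached only by failed rollouts. This is not the authors' implementation.

```python
from collections import defaultdict


def canonical_state(action, observation):
    """Hypothetical equivalence key: lowercase and collapse whitespace.

    A real system would likely use a semantic comparison; this is a stand-in.
    """
    def norm(text):
        return " ".join(text.lower().split())
    return (norm(action), norm(observation))


def build_quotient_graph(rollouts):
    """Merge rollouts of one task into a graph over equivalence classes.

    rollouts: list of (steps, success), where steps is a list of
    (action, observation) string pairs and success is a bool.
    Returns (edges, outcomes): successor sets per node, and per-node
    [success_count, failure_count] over the rollouts that visit it.
    """
    edges = defaultdict(set)
    outcomes = defaultdict(lambda: [0, 0])
    for steps, success in rollouts:
        prev = "START"
        for action, observation in steps:
            node = canonical_state(action, observation)
            edges[prev].add(node)
            outcomes[node][0 if success else 1] += 1
            prev = node
    return edges, outcomes


def error_branch_ratio(edges, outcomes):
    """Assumed definition: among edges leaving branching nodes, the
    fraction that lead to nodes visited only by failed rollouts."""
    branch_edges = [(u, v) for u, vs in edges.items() if len(vs) > 1 for v in vs]
    if not branch_edges:
        return 0.0
    failing = sum(1 for _, v in branch_edges if outcomes[v][0] == 0)
    return failing / len(branch_edges)


# Two rollouts of the same (made-up) task: they share the first step up to
# formatting, then diverge into a successful and a failed tool call.
rollouts = [
    ([("search(flight)", "ok"), ("book(id=1)", "confirmed")], True),
    ([("Search(flight)", "OK"), ("book(id=9)", "error: not found")], False),
]
edges, outcomes = build_quotient_graph(rollouts)
ratio = error_branch_ratio(edges, outcomes)
```

In this toy example the two first steps collapse into a single node, so the divergence between the successful and the failed strategy shows up as a branch at that node rather than as two unrelated linear trajectories.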

Problem

Research questions and friction points this paper is trying to address.

tool-use agents

interaction dynamics

trajectory filtering

task selection

sparse-reward

Innovation

Methods, ideas, or system contributions that make the work stand out.

interaction topology

trajectory curation

semantic quotient space