CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

📅 2026-03-02
🤖 AI Summary
This work addresses the difficulty that multi-turn interactive tool-use agents have in producing correct, deterministic action sequences from complex and ambiguous user requests. The authors propose CoVe, a framework in which explicit task constraints both guide trajectory generation and verify trajectory quality. This enables efficient synthesis of high-quality training trajectories for supervised fine-tuning (SFT) and yields accurate reward signals for reinforcement learning (RL). The CoVe-4B model trained under this framework achieves success rates of 43.0% and 59.4% on the Airline and Retail domains of τ²-bench, respectively, significantly outperforming strong baselines of comparable scale and remaining competitive with models up to 17 times its size.

📝 Abstract
Developing multi-turn interactive tool-use agents is challenging because real-world user needs are often complex and ambiguous, yet agents must execute deterministic actions to satisfy them. To address this gap, we introduce CoVe (Constraint-Verification), a post-training data synthesis framework designed for training interactive tool-use agents while ensuring both data complexity and correctness. CoVe begins by defining explicit task constraints, which serve a dual role: they guide the generation of complex trajectories and act as deterministic verifiers for assessing trajectory quality. This enables the creation of high-quality training trajectories for supervised fine-tuning (SFT) and the derivation of accurate reward signals for reinforcement learning (RL). Our evaluation on the challenging τ²-bench benchmark demonstrates the effectiveness of the framework. Notably, our compact CoVe-4B model achieves success rates of 43.0% and 59.4% in the Airline and Retail domains, respectively; its overall performance significantly outperforms strong baselines of similar scale and remains competitive with models up to 17× its size. These results indicate that CoVe provides an effective and efficient pathway for synthesizing training data for state-of-the-art interactive tool-use agents. To support future research, we open-source our code, trained model, and the full set of 12K high-quality trajectories used for training.
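
The dual role of constraints described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `ToolCall` and `Constraint` types, the `verify` and `reward` functions, and the tool names are all hypothetical, and the sketch assumes a constraint is simply "a tool call with these argument values must appear in the trajectory". It shows only how one explicit constraint list could act as a deterministic pass/fail check and, from that, a binary reward.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    """One agent action in a trajectory (hypothetical representation)."""
    name: str
    args: dict[str, Any]


@dataclass
class Constraint:
    """Hypothetical constraint: a tool that must be called with given argument values."""
    tool: str
    required_args: dict[str, Any]

    def satisfied_by(self, call: ToolCall) -> bool:
        # The call matches if the tool name agrees and every required
        # argument is present with exactly the required value.
        return call.name == self.tool and all(
            key in call.args and call.args[key] == value
            for key, value in self.required_args.items()
        )


def verify(trajectory: list[ToolCall], constraints: list[Constraint]) -> bool:
    """Deterministic check: every constraint is met by at least one call."""
    return all(
        any(constraint.satisfied_by(call) for call in trajectory)
        for constraint in constraints
    )


def reward(trajectory: list[ToolCall], constraints: list[Constraint]) -> float:
    """Binary reward derived from the same verifier (an assumed reward shape)."""
    return 1.0 if verify(trajectory, constraints) else 0.0


# Usage with made-up tool names and IDs.
constraints = [Constraint("cancel_reservation", {"reservation_id": "R1"})]
trajectory = [
    ToolCall("get_user_details", {"user_id": "u1"}),
    ToolCall("cancel_reservation", {"reservation_id": "R1"}),
]
assert verify(trajectory, constraints)
```

Under these assumptions, the same constraint list that seeds a synthetic task can filter generated trajectories for SFT (keep only those that pass `verify`) and score rollouts for RL (via `reward`), without a learned judge.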
Problem

Research questions and friction points this paper is trying to address.

interactive tool-use agents
complex user needs
deterministic actions
training data synthesis
multi-turn interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constraint-Guided Verification
Interactive Tool-Use Agents
Data Synthesis
Supervised Fine-Tuning
Reinforcement Learning
Jinpeng Chen
City University of Hong Kong
Continual Learning, Multimodal Large Language Model
Cheng Gong
Huawei Research
Hanbo Li
Independent Researcher
Ziru Liu
Huawei Research
Zichen Tian
CVML Lab@SMU
Computer Vision, Deep Learning
Xinyu Fu
Hong Kong Research Center, Huawei
Large Language Models, MLLM, Agents, Heterogeneous Graphs
Shi Wu
Huawei Research
Chenyang Zhang
Huawei Research
Wu Zhang
Huawei Research
Suiyun Zhang
Huawei Research
Dandan Tu
Huawei Research
Rui Liu
Huawei Research