Data-Model Co-Evolution: Growing Test Sets to Refine LLM Behavior

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Precisely encoding nuanced, domain-specific policies into prompt instructions for large language models (LLMs) remains a fundamental challenge. Method: This paper proposes a data-model co-evolution paradigm built around an iterative, human-feedback-driven closed loop that grows the test set and refines the prompt instructions in tandem, combining structured human-AI collaboration, rationale-based behavioral attribution analysis, and iterative instruction evaluation. Contribution/Results: The approach turns ambiguous policies into concrete, testable cases, systematically uncovers edge cases, and strengthens policy verifiability. A user study shows that the framework significantly improves the systematicity and consistency of instruction refinement while enhancing LLM adherence to localized policies and complex semantic rules.
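
The closed loop at the heart of this paradigm can be summarized in code. The sketch below is illustrative only: `TestCase`, `run_llm`, `elicit_rationale`, `derive_edge_cases`, and `revise_instruction` are hypothetical stand-ins for the system's components, not the paper's actual API.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    input_text: str
    expected: str        # human-specified desired behavior
    rationale: str = ""  # why this behavior is desired (for attribution analysis)

def run_llm(instruction: str, input_text: str) -> str:
    """Stand-in for a call to the governed LLM with the current instruction."""
    raise NotImplementedError

def elicit_rationale(case: TestCase, output: str) -> str:
    """Stand-in for the human explaining why `output` violates the policy."""
    raise NotImplementedError

def derive_edge_cases(case: TestCase) -> list[TestCase]:
    """Stand-in for turning a rationale into new edge-case tests."""
    raise NotImplementedError

def revise_instruction(instruction: str, failures: list[TestCase]) -> str:
    """Stand-in for rewriting the instruction to cover the failing cases."""
    raise NotImplementedError

def co_evolve(instruction: str, tests: list[TestCase], rounds: int = 5):
    """Data and model evolve together: each round surfaces failures,
    the human explains them, new edge cases join the test set,
    and the instruction is revised against the grown set."""
    for _ in range(rounds):
        failures = []
        for case in tests:
            output = run_llm(instruction, case.input_text)
            if output != case.expected:
                case.rationale = elicit_rationale(case, output)
                failures.append(case)
        if not failures:
            break  # instruction satisfies the whole living test set
        for case in failures:
            tests.extend(derive_edge_cases(case))          # test set grows
        instruction = revise_instruction(instruction, failures)  # instruction evolves
    return instruction, tests
```

Note the design choice this sketch encodes: the test set is mutated inside the loop, so every later revision is checked against all previously discovered edge cases, not just the most recent failures.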

📝 Abstract
A long-standing challenge in machine learning has been the rigid separation between data work and model refinement, enforced by slow fine-tuning cycles. The rise of Large Language Models (LLMs) overcomes this historical barrier, allowing application developers to instantly govern model behavior by editing prompt instructions. This shift enables a new paradigm: data-model co-evolution, where a living test set and a model's instructions evolve in tandem. We operationalize this paradigm in an interactive system designed to address the critical challenge of encoding subtle, domain-specific policies into prompt instructions. The system's structured workflow guides people to discover edge cases, articulate rationales for desired behavior, and iteratively evaluate instruction revisions against a growing test set. A user study shows our workflow helps participants refine instructions systematically and specify ambiguous policies more concretely. This work points toward more robust and responsible LLM applications through human-in-the-loop development aligned with local preferences and policies.
Problem

Research questions and friction points this paper is trying to address.

Overcoming rigid separation between data work and model refinement
Encoding subtle domain-specific policies into prompt instructions
Systematically refining LLM behavior through human-in-the-loop development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-model co-evolution refines LLM behavior dynamically
Interactive system encodes domain policies into prompts
Growing test set evaluates instruction revisions iteratively (see the sketch below)
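
A hedged sketch of that last point: comparing a revised instruction against the current one over the growing test set, regression-test style. `compare_revisions` is an assumed helper name, and it reuses the hypothetical `TestCase` and `run_llm` from the sketch above; the paper's actual evaluation interface may differ.

```python
def compare_revisions(old_instruction: str, new_instruction: str,
                      tests: "list[TestCase]") -> dict:
    """Count which test cases a revision fixes and which it regresses."""
    fixed = regressed = 0
    for case in tests:
        old_ok = run_llm(old_instruction, case.input_text) == case.expected
        new_ok = run_llm(new_instruction, case.input_text) == case.expected
        fixed += int(new_ok and not old_ok)
        regressed += int(old_ok and not new_ok)
    return {"fixed": fixed, "regressed": regressed, "total": len(tests)}
```

Because the test set only grows, a report like `{"fixed": 4, "regressed": 1, "total": 32}` lets the developer judge each revision against every edge case discovered so far, which is what makes the refinement systematic rather than ad hoc.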