🤖 AI Summary
Precisely encoding nuanced, domain-specific policies into prompt instructions for large language models (LLMs) remains a fundamental challenge. Method: This paper proposes a data–model co-evolution paradigm featuring an iterative, human-feedback-driven closed loop that dynamically expands the test set while refining prompt instructions—integrated with structured human–AI collaboration, rationale-based behavioral attribution analysis, and iterative instruction evaluation. Contribution/Results: The approach mechanizes the concretization of ambiguous policies, systematically uncovers edge cases, and strengthens policy verifiability. A user study demonstrates that the framework significantly improves the systematicity and consistency of instruction refinement, while enhancing LLM adherence to localized policies and complex semantic rules.
📝 Abstract
A long-standing challenge in machine learning has been the rigid separation between data work and model refinement, enforced by slow fine-tuning cycles. The rise of Large Language Models (LLMs) overcomes this historical barrier, allowing application developers to instantly govern model behavior by editing prompt instructions. This shift enables a new paradigm: data–model co-evolution, where a living test set and a model's instructions evolve in tandem. We operationalize this paradigm in an interactive system designed to address the critical challenge of encoding subtle, domain-specific policies into prompt instructions. The system's structured workflow guides users to discover edge cases, articulate rationales for desired behavior, and iteratively evaluate instruction revisions against a growing test set. A user study shows our workflow helps participants refine instructions systematically and specify ambiguous policies more concretely. This work points toward more robust and responsible LLM applications through human-in-the-loop development aligned with local preferences and policies.
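The loop the abstract describes—grow a living test set with human-judged edge cases, revise the instructions, and re-evaluate every case—can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names (`TestCase`, `LivingTestSet`, `evaluate`), the toy refund policy, and the stand-in `toy_model` are all hypothetical, and a real system would call an actual LLM in place of `toy_model`.

```python
from dataclasses import dataclass, field

@dataclass
class TestCase:
    """An input paired with the behavior a human judged correct."""
    prompt_input: str
    expected: str
    rationale: str = ""  # human-articulated reason for the desired behavior

@dataclass
class LivingTestSet:
    """Test set that grows as new edge cases are discovered."""
    cases: list = field(default_factory=list)

    def add_edge_case(self, case: TestCase) -> None:
        self.cases.append(case)

def evaluate(instructions: str, test_set: LivingTestSet, model) -> list:
    """Run every test case under the current instructions; return the failures."""
    failures = []
    for case in test_set.cases:
        output = model(instructions, case.prompt_input)
        if output != case.expected:
            failures.append(case)
    return failures

# Toy stand-in for an LLM call (assumption, for illustration only):
# it "follows" the 30-day refund rule only if the instructions state it.
def toy_model(instructions: str, prompt_input: str) -> str:
    if "30 days" in instructions and "day 45" in prompt_input:
        return "deny"
    return "approve"

# A discovered edge case, with its rationale recorded alongside.
test_set = LivingTestSet()
test_set.add_edge_case(TestCase(
    prompt_input="refund request on day 45",
    expected="deny",
    rationale="Policy: refunds only within 30 days of purchase."))

v1 = "Handle refund requests politely."
v2 = "Handle refund requests politely. Approve only within 30 days of purchase."

print(len(evaluate(v1, test_set, toy_model)))  # → 1 (v1 misses the edge case)
print(len(evaluate(v2, test_set, toy_model)))  # → 0 (revised instructions pass)
```

Each failure surfaces an ambiguity in the instructions; the human concretizes the policy in the next revision, and the enlarged test set guards against regressions on earlier cases.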