VERBA: Verbalizing Model Differences Using Large Language Models

📅 2025-07-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
In machine learning, the "model lake" phenomenon, in which numerous trained models exhibit comparable performance yet divergent behavior, hampers systematic model comparison. Method: This paper introduces VERBA, a framework for automated, fine-grained, natural-language description of behavioral differences between model pairs. VERBA combines output sampling, a simulation-based evaluation protocol, and structured prompting to elicit behavioral analyses from large language models (LLMs), addressing the O(N²) cost of manual pairwise comparison. Contribution/Results: On decision-tree model pairs with up to 5% performance difference but 20-25% behavioral difference, VERBA verbalizes the variations with up to 80% overall accuracy; incorporating the models' structural information, including decision paths and error patterns, raises accuracy to 90%. VERBA provides a scalable, post-hoc approach to improving model transparency, interpretability assessment, and trustworthy model selection.
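The summary's distinction between performance difference and behavioral difference is the crux of the paper: two models can score almost identically while disagreeing on many individual inputs. A minimal sketch of that sampling step, using hypothetical stand-in classifiers rather than the paper's actual benchmark models:

```python
# Hedged sketch (hypothetical models, not the paper's setup): query both
# models on the same sampled inputs and measure how often their predictions
# disagree, separately from their accuracy gap.
import random

random.seed(0)
inputs = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)]
truth = [1 if x + y > 0 else 0 for x, y in inputs]

# Two stand-in classifiers with similar accuracy but different decision rules.
model_a = lambda x, y: 1 if x + y > 0.05 else 0
model_b = lambda x, y: 1 if 0.9 * x + 1.1 * y > -0.05 else 0

pred_a = [model_a(x, y) for x, y in inputs]
pred_b = [model_b(x, y) for x, y in inputs]

acc_a = sum(p == t for p, t in zip(pred_a, truth)) / len(truth)
acc_b = sum(p == t for p, t in zip(pred_b, truth)) / len(truth)
behav_diff = sum(pa != pb for pa, pb in zip(pred_a, pred_b)) / len(inputs)

print(f"accuracy gap: {abs(acc_a - acc_b):.1%}, disagreement rate: {behav_diff:.1%}")
```

In VERBA, the disagreement samples collected this way are what the LLM is prompted to describe in natural language.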

📝 Abstract
In the current machine learning landscape, we face a "model lake" phenomenon: Given a task, there is a proliferation of trained models with similar performances despite different behavior. For model users attempting to navigate and select from the models, documentation comparing model pairs is helpful. However, for every $N$ models there could be $O(N^2)$ pairwise comparisons, a number prohibitive for the model developers to manually perform pairwise comparisons and prepare documentation. To facilitate fine-grained pairwise comparisons among models, we introduced $\textbf{VERBA}$. Our approach leverages a large language model (LLM) to generate verbalizations of model differences by sampling from the two models. We established a protocol that evaluates the informativeness of the verbalizations via simulation. We also assembled a suite with a diverse set of commonly used machine learning models as a benchmark. For a pair of decision tree models with up to 5% performance difference but 20-25% behavioral differences, $\textbf{VERBA}$ effectively verbalizes their variations with up to 80% overall accuracy. When we included the models' structural information, the verbalization's accuracy further improved to 90%. $\textbf{VERBA}$ opens up new research avenues for improving the transparency and comparability of machine learning models in a post-hoc manner.
Problem

Research questions and friction points this paper is trying to address.

Automating pairwise comparisons of ML models efficiently
Verbalizing model differences using LLMs for transparency
Reducing manual effort in model documentation generation
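The $O(N^2)$ friction point above is easy to make concrete: the number of unordered model pairs is $\binom{N}{2}$, which grows quadratically and quickly outpaces what developers can document by hand.

```python
# Number of pairwise comparisons for N models: C(N, 2) = N * (N - 1) / 2.
from math import comb

for n in (10, 100, 1000):
    print(f"{n} models -> {comb(n, 2)} pairwise comparisons")
# 10 models -> 45; 100 models -> 4950; 1000 models -> 499500
```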
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages an LLM to verbalize model differences from sampled outputs
Establishes protocol for evaluating verbalization informativeness
Improves accuracy by including structural information
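The second innovation, evaluating informativeness via simulation, can be illustrated with a toy version of the idea: a verbalization is informative to the extent that a reader who has absorbed it can predict model B's output given model A's output on the same input. The rule-based "reader" below is a hypothetical stand-in for the LLM-based simulation in the paper.

```python
# Hedged sketch of simulation-based evaluation: compare a reader who knows
# where model B flips relative to model A (a perfect verbalization) against
# an uninformed reader who simply copies A's prediction.
import random

random.seed(0)
n = 500
a_out = [random.randint(0, 1) for _ in range(n)]    # model A's labels
flips = [random.random() < 0.2 for _ in range(n)]   # B disagrees on ~20% of inputs
b_out = [1 - a if f else a for a, f in zip(a_out, flips)]

informed = [1 - a if f else a for a, f in zip(a_out, flips)]  # perfect verbalization
uninformed = a_out                                            # no verbalization

acc_informed = sum(p == b for p, b in zip(informed, b_out)) / n
acc_uninformed = sum(p == b for p, b in zip(uninformed, b_out)) / n
print("informed:", acc_informed, "uninformed:", round(acc_uninformed, 2))
```

The gap between the two accuracies is what the protocol attributes to the verbalization's informativeness.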