Investigating More Explainable and Partition-Free Compositionality Estimation for LLMs: A Rule-Generation Perspective

📅 2026-04-29

🤖 AI Summary
Current approaches to evaluating compositional generalization in large language models lack interpretability and often suffer from combination leakage because they rely on dataset splits. This work proposes a paradigm that avoids data partitioning altogether: models must generate programmatic rules that map inputs to outputs, and their compositional capabilities are then analyzed through the lens of complexity theory. The framework is instantiated on a string-to-grid task, combining program synthesis with theoretical analysis to examine how models capture compositional structure. Experiments across several state-of-the-art large language models reveal varied compositional behaviors as well as a range of compositionality deficiencies, illustrating the analytical value of the proposed approach.
📝 Abstract
Compositional generalization tests are often used to estimate the compositionality of LLMs. However, such tests have two limitations: (1) they focus only on output results without considering whether LLMs understand the compositional structure of the samples, which limits explainability; (2) they rely on dataset partitioning to build a test set of combinations unseen in the training set, which makes them susceptible to combination leakage. In this work, we propose a novel rule-generation perspective for estimating the compositionality of LLMs. It requires an LLM to generate a program that serves as the rule mapping the dataset's inputs to its outputs, and it estimates the LLM's compositionality using complexity-based theory. This perspective addresses the limitations of compositional generalization tests and offers a new way to analyze how LLMs characterize compositionality. Using it, we conduct experiments with and analyses of existing advanced LLMs on a string-to-grid task, and find that they exhibit various compositionality characterizations as well as compositionality deficiencies.
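To make the rule-generation idea concrete, here is a minimal sketch under stated assumptions. The string-to-grid task below (tokens like `a2` meaning "symbol A repeated twice") is hypothetical, not the paper's dataset, and compressed source length is swapped in as a generic computable proxy for description complexity; the paper's actual complexity-based estimator is not reproduced here. The names `DATASET`, `COMPOSITIONAL_RULE`, `LOOKUP_RULE`, `accuracy`, and `complexity_bits` are illustrative. The sketch shows why scoring the generated rule, rather than held-out outputs, needs no train/test split: a rule that composes primitives and a rule that merely memorizes the table both fit every example, but they differ sharply in complexity.

```python
import itertools
import zlib

# Hypothetical string-to-grid task (illustrative assumption): each token is a
# symbol a-d followed by a repeat count 1-3, so "a2 c1" maps to [["A", "A", "C"]].
SYMBOLS = "abcd"
COUNTS = "123"


def reference_map(s: str) -> list[list[str]]:
    row = []
    for tok in s.split():
        row.extend([tok[0].upper()] * int(tok[1]))
    return [row]


# The whole dataset is shown to the model; no train/test partition is made.
TOKENS = [sym + n for sym in SYMBOLS for n in COUNTS]
DATASET = {
    f"{t1} {t2}": reference_map(f"{t1} {t2}")
    for t1, t2 in itertools.product(TOKENS, repeat=2)
}

# Two candidate "generated rules", kept as source text as a model would emit them.
COMPOSITIONAL_RULE = """
def rule(s):
    row = []
    for tok in s.split():
        row.extend([tok[0].upper()] * int(tok[1]))
    return [row]
"""
LOOKUP_RULE = "TABLE = " + repr(DATASET) + "\ndef rule(s):\n    return TABLE[s]\n"


def load_rule(source: str):
    namespace: dict = {}
    exec(source, namespace)  # only execute trusted or sandboxed code in practice
    return namespace["rule"]


def accuracy(source: str) -> float:
    # Fraction of all dataset pairs the rule reproduces exactly.
    rule = load_rule(source)
    hits = sum(rule(x) == y for x, y in DATASET.items())
    return hits / len(DATASET)


def complexity_bits(source: str) -> int:
    # Compressed length: a computable upper-bound proxy for description length.
    return 8 * len(zlib.compress(source.encode("utf-8"), 9))


for name, src in [("compositional", COMPOSITIONAL_RULE), ("lookup", LOOKUP_RULE)]:
    print(f"{name}: accuracy={accuracy(src):.2f}, bits={complexity_bits(src)}")
```

Both rules reach full accuracy on the 144 examples, so an output-only score cannot tell them apart; the compression proxy assigns the memorizing lookup table a far larger size than the compositional rule. This is the kind of distinction that inspecting a generated rule can expose.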
Problem

Research questions and friction points this paper is trying to address.

compositional generalization
explainability
combination leakage
compositionality estimation
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional generalization
rule generation
explainability
compositionality estimation
large language models