Investigating More Explainable and Partition-Free Compositionality Estimation for LLMs: A Rule-Generation Perspective

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current approaches to evaluating compositional generalization in large language models rely on dataset splits, which exposes them to combination leakage, and they offer little interpretability because they score only the model's outputs. This work proposes a paradigm that avoids data partitioning: the model must generate a program whose rules map inputs to outputs, and its compositional capability is then estimated with complexity-based theory. The framework is instantiated on a string-to-grid task, using program synthesis and theoretical analysis to examine how models internalize compositional structure. Experiments across multiple state-of-the-art large language models reveal a range of compositionality characterizations and systematic deficiencies in their compositional representations.
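As an illustration of what combination leakage can look like in a split-based test, here is a minimal sketch under one simple reading of the term: a test sample leaks if it contains a pair of primitives that already co-occurs in some training sample. The sample representation (a list of primitive names) and the function names `primitive_pairs` and `leaked_combinations` are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def primitive_pairs(primitives):
    """Return every unordered pair of distinct primitives in one sample."""
    return set(combinations(sorted(set(primitives)), 2))

def leaked_combinations(train, test):
    """Return primitive pairs that occur in both the train and the test split.

    Assumption: a sample is a list of primitive names, and "leakage" means
    that a combination intended to be unseen at test time was already seen.
    """
    seen = set()
    for sample in train:
        seen |= primitive_pairs(sample)
    tested = set()
    for sample in test:
        tested |= primitive_pairs(sample)
    return tested & seen

# Toy splits: the second test sample repeats a training combination.
train_split = [["red", "square"], ["blue", "circle"]]
test_split = [["red", "circle"], ["blue", "circle"]]
leaks = leaked_combinations(train_split, test_split)
```

For an LLM, the effective training set includes pretraining data that usually cannot be inspected, so a check of this kind cannot be run reliably in practice. That is the motivation for an estimate that does not depend on a partition at all.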

📝 Abstract

Compositional generalization tests are often used to estimate the compositionality of LLMs. However, such tests have two limitations: (1) they consider only the output results, not the LLM's understanding of how a sample is composed, which limits explainability; (2) they rely on dataset partitioning to build a test set whose combinations are unseen in the training set, and therefore suffer from combination leakage. In this work, we propose a novel rule-generation perspective on compositionality estimation for LLMs. It requires an LLM to generate a program that serves as the rules for the dataset mapping, and it estimates the LLM's compositionality from that program using complexity-based theory. The perspective addresses the limitations of compositional generalization tests and provides a new way to characterize the compositionality of LLMs. We apply this perspective to existing advanced LLMs on a string-to-grid task and find that they exhibit a variety of compositionality characterizations and compositionality deficiencies.
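The rule-generation idea can be sketched as follows. This is a minimal illustration, not the paper's task or metric: the `<symbol><width>x<height>` string format, the two candidate programs, and the use of compressed source length as a stand-in for a complexity-based score are all assumptions made for the example.

```python
import zlib

# Hypothetical toy dataset: a command string maps to a grid (a list of rows).
# The "<symbol><width>x<height>" format is an assumption for illustration.
DATASET = {
    "a2x1": [["a", "a"]],
    "b1x2": [["b"], ["b"]],
    "a3x2": [["a", "a", "a"], ["a", "a", "a"]],
}

# A candidate rule program of the kind an LLM might return (hypothetical).
COMPOSITIONAL_RULE = '''
def rule(s):
    symbol, dims = s[0], s[1:]
    width, height = (int(n) for n in dims.split("x"))
    return [[symbol] * width for _ in range(height)]
'''

# A memorizing program: a lookup table that covers only the given samples.
LOOKUP_RULE = "def rule(s):\n    return " + repr(DATASET) + "[s]\n"

def load_rule(source):
    """Execute program text and return its `rule` function.

    Real model output is untrusted and should run in a sandbox; plain
    exec is used here only to keep the sketch short.
    """
    namespace = {}
    exec(source, namespace)
    return namespace["rule"]

def accuracy(source, dataset):
    """Fraction of samples that the program maps to the expected grid."""
    rule = load_rule(source)
    correct = 0
    for text, grid in dataset.items():
        try:
            correct += rule(text) == grid
        except Exception:
            pass  # a crashing program simply gets no credit for this sample
    return correct / len(dataset)

def complexity_proxy(source):
    """Crude complexity stand-in: byte length of the compressed source."""
    return len(zlib.compress(source.encode("utf-8")))

UNSEEN = {"c2x2": [["c", "c"], ["c", "c"]]}
```

Both programs reproduce `DATASET` exactly, so output accuracy alone cannot tell them apart. The generated program itself can be inspected: only the compositional rule handles the `UNSEEN` sample, and the lookup table's size grows with the number of samples while the compositional rule's size stays fixed. A complexity-based estimate over the generated program is meant to capture this kind of difference without holding out a test split.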

Problem

Research questions and friction points this paper is trying to address.

compositional generalization

explainability

combination leakage

compositionality estimation

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional generalization

rule generation

explainability