🤖 AI Summary
The internal organization of functional modules in large language models (LLMs) remains poorly understood, and effective methods for disentangling neurons and linking them to semantic concepts are lacking. To address this gap, this work proposes ULCMOD, an unsupervised cross-layer module discovery framework that introduces a novel objective function and an Iterative Decoupling (IterD) algorithm. ULCMOD enables, for the first time, a comprehensive functional partitioning of neurons across the entire LLM architecture and aligns these partitions with the thematic semantics of input samples. Experimental results demonstrate that the discovered modules exhibit clear semantic coherence, a hierarchical spatial structure, and task specialization, leading to strong performance on downstream tasks. This approach significantly enhances model interpretability and fills a critical void in the study of functional disentanglement in LLMs.
📝 Abstract
Understanding the internal functional organization of Large Language Models (LLMs) is crucial for improving their trustworthiness and performance. However, how LLMs organize different functions into modules remains largely unexplored. To bridge this gap, we formulate a functional module discovery problem and propose an Unsupervised LLM Cross-layer MOdule Discovery (ULCMOD) framework that simultaneously disentangles the large set of neurons in the entire LLM into modules while discovering the topics of input samples related to these modules. Our framework introduces a novel objective function and an efficient Iterative Decoupling (IterD) algorithm. Extensive experiments show that our method discovers high-quality, disentangled modules that capture more meaningful semantic information and achieve superior performance on various downstream tasks. Moreover, our qualitative analysis reveals that the discovered modules exhibit semantic coherence, interpretable specializations, and a clear spatial and hierarchical organization within the LLM. Our work provides a novel tool for interpreting the functional modules of LLMs, filling a critical gap in LLM interpretability research.
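To make the problem setup concrete, the sketch below illustrates the general flavor of the task the abstract describes: jointly grouping neurons into modules and input samples into topics from an activation matrix. This is a minimal toy demonstration, assuming a planted block structure and a simple alternating hard-assignment scheme; it is NOT the ULCMOD objective or the IterD algorithm, and all names and data here are hypothetical.

```python
import numpy as np

# Illustrative sketch only -- not the paper's method. Given neuron
# activations over input samples, alternately assign samples to "topics"
# and neurons to "modules" so that each converges with the other.

rng = np.random.default_rng(0)

# Toy activation matrix (40 samples x 30 neurons) with two planted blocks:
# topic-0 samples drive module-0 neurons, topic-1 samples drive module-1.
A = np.zeros((40, 30))
A[:20, :15] = 1.0
A[20:, 15:] = 1.0
A += 0.05 * rng.standard_normal(A.shape)

# Seed neuron-module labels from one sample's activation pattern
# (a convenience for this demo; random init with more iterations also works).
modules = (A[0] > A[0].mean()).astype(int)          # neuron -> module id

for _ in range(10):
    # Describe each sample by its per-module mean activation,
    # then assign the sample to the module/topic it excites most.
    S = np.stack([A[:, modules == m].mean(axis=1) for m in (0, 1)], axis=1)
    topics = S.argmax(axis=1)                       # sample -> topic id
    # Describe each neuron by its per-topic mean activation,
    # then assign the neuron to the topic/module that excites it most.
    N = np.stack([A[topics == t].mean(axis=0) for t in (0, 1)], axis=1)
    modules = N.argmax(axis=1)                      # neuron -> module id

# The planted structure is recovered: samples 0-19 share one topic label and
# samples 20-39 the other; likewise neurons 0-14 vs. neurons 15-29.
```

The alternating structure is the key point: sample grouping and neuron grouping inform each other, which mirrors (in a very simplified form) the abstract's claim of discovering modules and input topics simultaneously.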