🤖 AI Summary
This work proposes the Discrete Transformer, a novel architecture that overcomes the inherent difficulty of existing Transformers in extracting interpretable symbolic programs due to entangled feature representations. By functionally decoupling components—constraining the attention mechanism to serve solely as information routing and restricting the MLP to element-wise operations—and incorporating temperature-annealed sampling, the model enables a reliable mapping from continuous representations to discrete symbolic logic. This approach uniquely facilitates zero-shot algorithmic discovery over continuous variable domains within a Transformer framework, leveraging structured inductive biases and an annealing-driven phase transition mechanism to achieve fine-grained programmatic control. Experiments demonstrate that the framework matches the performance of RNNs while successfully recovering human-readable, executable programs, thereby validating its efficacy in both interpretability and algorithmic discovery.
📝 Abstract
Algorithm extraction aims to synthesize executable programs directly from models trained on specific algorithmic tasks, enabling de novo algorithm discovery without relying on human-written code. However, extending this paradigm to Transformers is hindered by superposition, where entangled features encoded in overlapping directions obstruct the extraction of symbolic expressions. In this work, we propose the Discrete Transformer, an architecture explicitly engineered to bridge the gap between continuous representations and discrete symbolic logic. By enforcing a strict functional disentanglement, which constrains the Numerical Attention to information routing and the Numerical MLP to element-wise arithmetic, and by employing temperature-annealed sampling, our method enables the extraction of human-readable programs. Empirically, the Discrete Transformer not only achieves performance comparable to RNN-based baselines but crucially extends interpretability to continuous variable domains. Moreover, our analysis of the annealing process shows that the discrete search undergoes a clear phase transition from exploration to exploitation. We further demonstrate that our method enables fine-grained control over synthesized programs by imposing inductive biases. Collectively, these findings establish the Discrete Transformer as a robust framework for demonstration-free algorithm discovery, offering a rigorous pathway toward Transformer interpretability.
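The paper does not specify its sampling mechanism beyond "temperature-annealed sampling," but a common way to realize an exploration-to-exploitation phase transition over discrete choices is Gumbel-softmax relaxation with a decaying temperature. The sketch below is an illustrative assumption, not the authors' implementation: `logits` are hypothetical scores over candidate symbolic operations, and lowering `tau` sharpens the relaxed one-hot sample toward a discrete selection.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample; high tau explores, low tau commits to one choice."""
    # Gumbel(0, 1) noise via the inverse-CDF trick (small epsilon guards log(0)).
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-10) + 1e-10)
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max())          # numerically stable softmax
    return y / y.sum()

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, -1.0])  # hypothetical scores over symbolic ops

# Anneal from exploration (tau=5.0) toward exploitation (tau=0.1).
for tau in [5.0, 1.0, 0.1]:
    probs = gumbel_softmax(logits, tau, rng)
    print(f"tau={tau}: {np.round(probs, 3)}")
```

As `tau` shrinks, the sampled distribution concentrates on a single operation, which is the "phase transition" behavior the abstract describes: early training averages over many candidate programs, while late training locks in one discrete program that can be read off directly.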