ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers

📅 2025-04-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the conflation of "task retrieval" (recalling patterns memorized during pretraining) and "task learning" (inference-time learning from demonstrations) in in-context learning (ICL). To disentangle these components, the authors introduce ICL CIPHERS, a diagnostic task family based on bijective substitution ciphers that enables controlled ablation via reversible versus irreversible cipher variants. Experiments across four datasets and six state-of-the-art LLMs show that models consistently, if modestly, outperform the irreversible baseline on reversible ciphers. Attention and logit probing further reveal implicit inverse-mapping structure in model representations. ICL CIPHERS thus offers a diagnostic framework that quantitatively separates memory from learning in ICL, and an empirical measure of LLMs' capacity to recover latent structure through invertible transformations. The results challenge prevailing assumptions about ICL mechanisms and offer a principled methodology for probing the interplay between memorization and generalization in foundation models.

📝 Abstract
Recent works have suggested that In-Context Learning (ICL) operates in dual modes, i.e. task retrieval (remembering learned patterns from pre-training) and task learning (inference-time "learning" from demonstrations). However, disentangling these two modes remains a challenging goal. We introduce ICL CIPHERS, a class of task reformulations based on substitution ciphers borrowed from classic cryptography. In this approach, a subset of tokens in the in-context inputs is substituted with other (irrelevant) tokens, rendering English sentences less comprehensible to the human eye. However, by design, there is a latent, fixed pattern to this substitution, making it reversible. This bijective (reversible) cipher ensures that the task remains well-defined in some abstract sense, despite the transformations. It is a curious question whether LLMs can solve ICL CIPHERS with a BIJECTIVE mapping, which requires deciphering the latent cipher. We show that LLMs are better at solving ICL CIPHERS with BIJECTIVE mappings than the NON-BIJECTIVE (irreversible) baseline, providing a novel approach to quantify "learning" in ICL. While this gap is small, it is consistent across the board on four datasets and six models. Finally, we examine LLMs' internal representations and identify evidence of their ability to decode the ciphered inputs.
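The bijective/non-bijective contrast in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the word-level substitution over a toy vocabulary, the function names, and the use of a random permutation versus sampling with replacement are all assumptions made for the example.

```python
import random

def make_bijective_cipher(vocab, seed=0):
    # Bijective case: a random permutation of the vocabulary, so every
    # ciphered token maps back to exactly one original token (reversible).
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def make_non_bijective_cipher(vocab, seed=0):
    # Non-bijective baseline: each token is replaced by a token drawn with
    # replacement, so collisions generally occur and the mapping cannot
    # be inverted (irreversible).
    rng = random.Random(seed)
    return {w: rng.choice(vocab) for w in vocab}

def apply_cipher(tokens, cipher):
    return [cipher.get(t, t) for t in tokens]

vocab = ["the", "movie", "was", "great", "terrible", "plot", "acting"]
sentence = ["the", "movie", "was", "great"]

bij = make_bijective_cipher(vocab)
ciphered = apply_cipher(sentence, bij)

# Only the bijective cipher admits an exact inverse, which is the latent
# structure the paper asks LLMs to recover in-context.
inverse = {v: k for k, v in bij.items()}
assert apply_cipher(ciphered, inverse) == sentence
```

In the paper's setup, the in-context demonstrations (and the query) are transformed this way; the model is never shown the mapping and must exploit its latent, fixed pattern to solve the task.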
Problem

Research questions and friction points this paper is trying to address.

Disentangling task retrieval and task learning in ICL
Quantifying learning via reversible substitution ciphers
Assessing LLMs' ability to decode ciphered inputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses substitution ciphers for task reformulation
Reversible bijective mapping to quantify learning
Analyzes LLMs' internal cipher-decoding representations