Steered Generation via Gradient Descent on Sparse Features

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of precisely and continuously controlling the cognitive complexity of feedback generated by large language models (LLMs) in educational settings. We propose a fine-tuning-free, non-intrusive controllable generation method: a sparse autoencoder is trained on intermediate LLM layers to map query embeddings into interpretable, sparse latent representations; gradient-based optimization is then performed in this layer-specific latent space to steer feature activation, enabling fine-grained control over output style and cognitive difficulty. Our approach uniquely integrates sparse feature guidance with layer-wise latent-space gradient optimization, supporting both attention distribution modulation and query embedding reparameterization. Experiments demonstrate that the method systematically and continuously adjusts cognitive complexity across predefined levels while preserving semantic fidelity and textual fluency. Both human and automated evaluations confirm significantly improved control accuracy compared to baselines.
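The summary's first ingredient, training a sparse autoencoder on intermediate-layer activations to obtain interpretable sparse codes, can be sketched as below. This is a minimal illustration, not the authors' implementation; the class name, dimensions, and the L1 sparsity coefficient are all assumptions.

```python
# Minimal sketch: a sparse autoencoder (SAE) over intermediate LLM
# activations. Names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        # Overcomplete latent (d_latent > d_model) is typical for SAEs.
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, h: torch.Tensor):
        z = torch.relu(self.encoder(h))   # non-negative, sparsity-friendly codes
        h_hat = self.decoder(z)
        return h_hat, z

def sae_loss(h, h_hat, z, l1_coeff: float = 1e-3):
    # Reconstruction keeps the code faithful to the original activation;
    # the L1 penalty on z pushes most features toward zero (interpretability).
    recon = (h - h_hat).pow(2).mean()
    sparsity = z.abs().mean()
    return recon + l1_coeff * sparsity
```

Trained this way, individual latent dimensions tend to align with distinct stylistic or semantic features, which is what makes the later steering step targetable.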

📝 Abstract
Large language models (LLMs) encode a diverse range of linguistic features within their latent representations, which can be harnessed to steer their output toward specific target characteristics. In this paper, we modify the internal structure of LLMs by training sparse autoencoders to learn a sparse representation of the query embedding, allowing precise control over the model's attention distribution. We demonstrate that manipulating this sparse representation effectively transforms the output toward different stylistic and cognitive targets. Specifically, in an educational setting, we show that the cognitive complexity of LLM-generated feedback can be systematically adjusted by modifying the encoded query representation at a specific layer. To achieve this, we guide the learned sparse embedding toward the representation of samples from the desired cognitive complexity level, using gradient-based optimization in the latent space.
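The abstract's steering step, guiding the learned sparse embedding toward representations of samples at the desired cognitive-complexity level via gradient descent in latent space, might look roughly like the sketch below. The `sae` object, the `target_mean` code (e.g. the mean SAE code of exemplars at the target level), and all step sizes are assumptions for illustration, not details from the paper.

```python
# Hedged sketch of latent-space steering: optimize the sparse code of the
# current query embedding toward a target-level code, then decode the
# steered code back into an activation. `sae` is any object exposing
# encoder/decoder linear maps (e.g. the SAE above); `target_mean` is assumed
# to summarize exemplars of the desired cognitive-complexity level.
import torch

def steer_latent(sae, h, target_mean, steps: int = 50, lr: float = 0.1):
    """Move the sparse code of activation h toward target_mean by gradient descent."""
    z = torch.relu(sae.encoder(h)).detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Pull the code toward the target-level feature profile.
        loss = (z - target_mean).pow(2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        # Re-impose non-negativity, then decode to a steerable activation
        # that replaces the original one at the chosen layer.
        return sae.decoder(z.clamp_min(0.0))
```

Because the loss is defined on the sparse code rather than on raw activations, the step size directly trades off semantic fidelity against how far the output moves toward the target complexity level, which is what enables continuous control.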
Problem

Research questions and friction points this paper is trying to address.

Control LLM output via sparse autoencoders
Adjust cognitive complexity in feedback
Manipulate attention distribution using gradient descent
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse autoencoders modify LLMs
Gradient descent guides embeddings
Precise control over attention distribution
Sumanta Bhattacharyya
Department of Computer Science, University of Illinois Chicago
Pedram Rooshenas
University of Illinois Chicago
Deep Generative Models · Machine Learning