ADAPT: Hybrid Prompt Optimization for LLM Feature Visualization

📅 2026-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of effectively visualizing internal activation directions of large language models (LLMs) in discrete text space, where existing prompt optimization methods often converge to suboptimal solutions. To overcome this limitation, the authors propose ADAPT, a novel approach that uniquely integrates beam search initialization with adaptive gradient-guided mutation, specifically designed for LLM feature visualization. By optimizing input tokens to maximize activation along target directions, ADAPT substantially enhances both activation strength and semantic interpretability of generated samples. Experiments on the Gemma-2 2B model demonstrate that ADAPT consistently outperforms current methods across diverse network layers and types of sparse autoencoder latent variables, establishing the feasibility and efficacy of feature visualization for LLMs in discrete input spaces.

📝 Abstract
Understanding what features are encoded by learned directions in LLM activation space requires identifying inputs that strongly activate them. Feature visualization, which optimizes inputs to maximally activate a target direction, offers an alternative to costly dataset search approaches, but remains underexplored for LLMs due to the discrete nature of text. Furthermore, existing prompt optimization techniques are poorly suited to this domain, which is highly prone to local minima. To overcome these limitations, we introduce ADAPT, a hybrid method combining beam search initialization with adaptive gradient-guided mutation, designed around these failure modes. We evaluate on Sparse Autoencoder latents from Gemma 2 2B, proposing metrics grounded in dataset activation statistics to enable rigorous comparison, and show that ADAPT consistently outperforms prior methods across layers and latent types. Our results establish that feature visualization for LLMs is tractable, but requires design assumptions tailored to the domain.
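The two-stage scheme described in the abstract can be illustrated on a toy objective. The sketch below is not the paper's implementation: it replaces the LLM with a random embedding table and a tanh readout as the "feature activation", and uses a first-order, gradient-proposed token swap (verified against the exact score) as a stand-in for adaptive gradient-guided mutation. All names and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Toy stand-in for an LLM: a token embedding table and a target activation
# direction. Everything here is illustrative, not taken from the paper.
rng = np.random.default_rng(0)
VOCAB, DIM, LEN = 50, 16, 6
E = rng.normal(size=(VOCAB, DIM))       # token embedding table
direction = rng.normal(size=(DIM,))     # direction whose activation we maximize

def score(tokens):
    """Toy 'feature activation': target direction projected onto a
    tanh readout of the mean token embedding."""
    return float(np.tanh(E[tokens].mean(axis=0)) @ direction)

def beam_init(beam_width=4):
    """Stage 1: beam search builds a prompt token by token,
    keeping the top-k scoring prefixes at each step."""
    beams = [[]]
    for _ in range(LEN):
        cands = [b + [t] for b in beams for t in range(VOCAB)]
        cands.sort(key=score, reverse=True)
        beams = cands[:beam_width]
    return beams[0]

def gradient_mutate(tokens, steps=20):
    """Stage 2: gradient-guided mutation. The gradient of the score w.r.t.
    each position's embedding proposes the swap with the largest first-order
    gain; the exact score accepts or rejects it, so it never decreases."""
    tokens = list(tokens)
    for _ in range(steps):
        m = E[tokens].mean(axis=0)
        g = (1.0 - np.tanh(m) ** 2) * direction / LEN   # d score / d E[token_i]
        best = None
        for i, t_old in enumerate(tokens):
            gains = (E - E[t_old]) @ g                  # first-order swap gains
            t_new = int(np.argmax(gains))
            if best is None or gains[t_new] > best[0]:
                best = (gains[t_new], i, t_new)
        _, i, t_new = best
        trial = tokens[:i] + [t_new] + tokens[i + 1:]
        if score(trial) <= score(tokens):
            break                                       # no true improvement: stop
        tokens = trial
    return tokens

init = beam_init()
final = gradient_mutate(init)
print(score(init), score(final))
```

On the real task, `score` would be the activation of a chosen SAE latent or residual-stream direction, and the swap proposals would come from backpropagated embedding gradients rather than this closed-form toy gradient.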
Problem

Research questions and friction points this paper is trying to address.

feature visualization
large language models
prompt optimization
discrete text optimization
activation space
Innovation

Methods, ideas, or system contributions that make the work stand out.

feature visualization
prompt optimization
large language models
hybrid optimization
sparse autoencoders
João N. Cardoso
Instituto Superior Técnico, INESC-ID
Arlindo L. Oliveira
Instituto Superior Técnico, INESC-ID
Bruno Martins
Instituto Superior Técnico and INESC-ID, University of Lisbon
Data Science · Language Technologies · Information Retrieval · Geospatial A.I.