AI Summary
This work addresses the lack of efficient, real-time interpretability and behavioral control mechanisms for large language models deployed across multiple GPUs. We propose a scalable, activation-level interpretability and steering system designed for multi-GPU environments, which, for the first time, enables full-layer activation trajectory capture and position-tagged steering vector injection without requiring fine-tuning or additional forward passes. Leveraging distributed activation caching, post-LayerNorm vector injection, and logit lens-based trajectory tracking, our system supports LLaMA-3.1 and Qwen-3 series models, reducing activation memory usage by up to 7× and increasing throughput by 41× under identical hardware constraints. It maintains processing speeds of 20–100 tokens per second on sequences up to 1,500 tokens, achieving an average steering efficacy (measured by steering slope) of 0.702.
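As a rough illustration of the logit-lens trajectory tracking mentioned above (a minimal numpy sketch, not the repository's actual API; the function name, tensor shapes, and the RMSNorm epsilon are assumptions), each layer's hidden state is normalized and projected through the unembedding matrix to read off an intermediate next-token prediction:

```python
import numpy as np

def logit_lens_trajectory(hidden_states, W_U, gamma, eps=1e-6):
    """Project every layer's hidden state through the unembedding
    to recover the model's intermediate next-token predictions.

    hidden_states: (num_layers, d_model) activations at one position
    W_U:           (d_model, vocab_size) unembedding matrix
    gamma:         (d_model,) final RMSNorm scale (LLaMA/Qwen style)
    Returns a list of argmax token ids, one per layer.
    """
    preds = []
    for h in hidden_states:
        # Apply the final RMSNorm before unembedding, as in
        # LLaMA-3.1 / Qwen-3 style architectures.
        h_norm = h / np.sqrt(np.mean(h ** 2) + eps) * gamma
        logits = h_norm @ W_U
        preds.append(int(np.argmax(logits)))
    return preds

# Toy example with random weights.
rng = np.random.default_rng(0)
num_layers, d_model, vocab = 4, 8, 16
trajectory = logit_lens_trajectory(
    rng.standard_normal((num_layers, d_model)),
    rng.standard_normal((d_model, vocab)),
    np.ones(d_model),
)
```

Capturing this trajectory at every layer and position is what drives the activation-memory cost that the distributed caching design is meant to contain.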
Abstract
The most capable large language models typically require multiple GPUs to host, yet existing tooling for understanding and steering these models supports the multi-GPU setting far less well than the single-GPU setting. We present a practical implementation of activation-level interpretability (logit lens) and steering (steering vectors) that scales to multi-GPU language models. Our system's design choices reduce activation memory by up to 7× and increase throughput by up to 41× over a baseline on identical hardware. We demonstrate the method on LLaMA-3.1 (8B, 70B) and Qwen-3 (4B, 14B, 32B), sustaining 20–100 tokens/s while collecting full layer-wise activation trajectories for sequences of 1,500 tokens. Using label-position steering vectors injected post-LayerNorm, we show controllable, monotonic shifts in model outputs with a mean steerability slope of 0.702 across the evaluated datasets, without fine-tuning or additional forward passes. We release detailed benchmarks, ablations, and a reproducible instrumentation recipe to enable practical interpretability and real-time behavioral control for frontier LLMs at https://github.com/Devdesai1901/LogitLense.
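The label-position injection described above can be sketched as follows (a minimal numpy illustration under assumed shapes, not the repository's implementation; the function name and the idea of passing explicit position indices are assumptions). A scaled steering vector is added to the post-LayerNorm activations only at the tagged positions, leaving the rest of the sequence untouched:

```python
import numpy as np

def inject_steering_vector(h_post_ln, v, alpha, positions):
    """Add a scaled steering vector to post-LayerNorm activations
    at tagged (e.g. label) positions only.

    h_post_ln: (seq_len, d_model) activations after a LayerNorm
    v:         (d_model,) steering vector
    alpha:     scalar steering strength
    positions: iterable of sequence indices to steer
    """
    out = h_post_ln.copy()
    for p in positions:
        # Only the tagged positions receive the shift; all other
        # positions pass through unchanged.
        out[p] = out[p] + alpha * v
    return out

# Toy example: steer positions 1 and 3 of a 5-token sequence.
rng = np.random.default_rng(1)
h = rng.standard_normal((5, 4))
v = rng.standard_normal(4)
steered = inject_steering_vector(h, v, 2.0, [1, 3])
```

Because the shift is linear in `alpha`, sweeping `alpha` is what produces the monotonic output changes summarized by the steerability slope.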