🤖 AI Summary
This study investigates whether scientific code generation can be controllably steered toward a target programming language, such as C++, via targeted intervention in the latent subspaces of large language models. To this end, we propose G-ACT, a gradient-refined adaptive activation steering framework: it identifies critical subspaces through MLP neuron attribution and clustering analysis, then applies fine-grained latent-space intervention in Transformers using lightweight, online-trained probes and layer-wise activation injection. Crucially, G-ACT enables dynamic, prompt-conditioned selection of steering directions, improving both the controllability and the interpretability of code generation. Evaluated on LLaMA-3.2 3B, the method improves average probe classification accuracy by 15%, reaching 61.5% in the early layers; it remains effective on the 70B variant, demonstrating scalability and generalization across model scales.
📝 Abstract
This work examines whether activating latent subspaces in large language models (LLMs) can steer scientific code generation toward a specific programming language. Five causal LLMs were first evaluated on scientific coding prompts to quantify their baseline bias among four programming languages. A static neuron-attribution method, which perturbs the most highly activated MLP weight for a C++ (CPP) token, proved brittle and generalized poorly across prompt styles and model scales. To address these limitations, a gradient-refined adaptive activation steering framework (G-ACT) was developed: per-prompt activation differences are clustered into a small set of steering directions, and lightweight per-layer probes are trained and refined online to select the appropriate steering vector. On LLaMA-3.2 3B, this approach reliably biases generation toward C++, raising average probe classification accuracy by 15% over the standard ACT framework, with the early layers (0-6) improving probe classification accuracy by 61.5%. For LLaMA-3.3 70B, where attention-head signals become more diffuse, targeted injections at key layers still improve language selection. Although per-layer probing introduces a modest inference overhead, the method remains practical by steering only a subset of layers, and it yields reproducible model behavior. These results demonstrate a scalable, interpretable, and efficient mechanism for concept-level control in practical agentic systems.
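The core steering loop described above (cluster per-prompt activation differences into a few directions, then let a lightweight probe pick which direction to inject at a layer) can be sketched in miniature. The following is an illustrative toy, not the paper's implementation: the activation data is synthetic, the dimensionality is tiny, and the probe is a simple nearest-centroid selector standing in for the trained per-layer probes.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size; real residual streams are far larger

# Synthetic per-prompt activation differences (target-language run minus
# baseline run). Two artificial prompt families pull in opposite directions.
diffs = np.vstack([
    rng.normal(loc=+1.0, scale=0.3, size=(20, d)),
    rng.normal(loc=-1.0, scale=0.3, size=(20, d)),
])

# Step 1: cluster the differences into a small set of steering directions.
def kmeans(x, init_idx, iters=10):
    centroids = x[init_idx]  # deterministic init, one seed per toy family
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        centroids = np.stack([x[labels == j].mean(0) for j in range(len(init_idx))])
    return centroids

directions = kmeans(diffs, init_idx=[0, 20])
directions /= np.linalg.norm(directions, axis=1, keepdims=True)  # unit vectors

# Step 2: a lightweight "probe" selects the steering vector for this prompt.
# Here: nearest centroid by dot product with the current hidden state.
def select_direction(activation):
    return directions[np.argmax(directions @ activation)]

# Step 3: inject the chosen direction into the layer's hidden state.
def steer(hidden, alpha=2.0):
    return hidden + alpha * select_direction(hidden)

h = rng.normal(size=d)          # stand-in for one layer's activation
h_steered = steer(h)            # activation nudged along a steering direction
```

In the real framework the injection would happen inside the forward pass (e.g. via a hook on selected Transformer layers), and the probes are trained and refined online rather than fixed centroids; this sketch only shows the data flow among the three steps.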