The GRADIEND Python Package: An End-to-End System for Gradient-Based Feature Learning

📅 2026-02-27
🤖 AI Summary
This work presents gradiend, an open-source Python package for learning and manipulating interpretable feature directions in language models, with the goal of controllable, persistent model editing. The method builds factual–counterfactual sample pairs and uses the resulting gradients from masked language models (MLMs) and causal language models (CLMs) to identify feature directions. The package combines feature-related data creation, training, evaluation, visualization, model rewriting through controlled weight updates, and multi-feature comparison in a single workflow. The authors demonstrate it on an English pronoun paradigm and on a large-scale feature comparison that reproduces prior use cases.

📝 Abstract
We present gradiend, an open-source Python package that operationalizes the GRADIEND method for learning feature directions from factual-counterfactual MLM and CLM gradients in language models. The package provides a unified workflow for feature-related data creation, training, evaluation, visualization, persistent model rewriting via controlled weight updates, and multi-feature comparison. We demonstrate GRADIEND on an English pronoun paradigm and on a large-scale feature comparison that reproduces prior use cases.
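To make the idea of factual-counterfactual gradients and controlled weight updates concrete, here is a minimal sketch on a toy softmax classifier in plain Python. It is not the gradiend API and not the GRADIEND training procedure. Every name in it (`loss_gradient`, `gradient_difference`, `apply_update`, the three-token vocabulary, the step size `alpha`) is a hypothetical chosen for illustration. It shows only that the difference between the loss gradient for a factual target and the loss gradient for a counterfactual target yields a direction in weight space, and that stepping along that direction shifts the prediction toward the counterfactual token.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, x):
    # Toy "language model": one linear layer followed by softmax.
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def loss_gradient(W, x, target):
    # Gradient of cross-entropy loss w.r.t. W: (p - onehot(target)) x^T.
    p = predict(W, x)
    return [[(p[k] - (1.0 if k == target else 0.0)) * xi for xi in x]
            for k in range(len(W))]

def gradient_difference(W, x, factual, counterfactual):
    # Factual gradient minus counterfactual gradient (illustrative direction).
    g_f = loss_gradient(W, x, factual)
    g_c = loss_gradient(W, x, counterfactual)
    return [[a - b for a, b in zip(rf, rc)] for rf, rc in zip(g_f, g_c)]

def apply_update(W, direction, alpha):
    # Controlled weight update: W' = W + alpha * direction.
    return [[w + alpha * d for w, d in zip(rw, rd)]
            for rw, rd in zip(W, direction)]

# Hypothetical toy setup: token 0 = "he", 1 = "she", 2 = "they".
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
x = [1.0, 2.0]  # made-up context representation

direction = gradient_difference(W, x, factual=0, counterfactual=1)
W_edited = apply_update(W, direction, alpha=0.5)

p_before = predict(W, x)
p_after = predict(W_edited, x)
print("before:", [round(p, 3) for p in p_before])
print("after: ", [round(p, 3) for p in p_after])
```

In this toy model the predicted probabilities cancel in the subtraction, so the direction reduces to `(onehot(counterfactual) - onehot(factual)) x^T`. That is why the update raises the counterfactual logit and lowers the factual one. Real MLM and CLM gradients span many layers and do not simplify this way.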
Problem

Research questions and friction points this paper is trying to address.

feature learning
gradient-based methods
language models
factual-counterfactual gradients
model editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

GRADIEND
gradient-based feature learning
factual-counterfactual gradients
model editing
interpretable NLP