The GRADIEND Python Package: An End-to-End System for Gradient-Based Feature Learning

📅 2026-02-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work presents gradiend, an open-source Python package that implements the GRADIEND method, a gradient-based approach for learning interpretable feature directions in language models and using them for controllable, persistent model editing. From factual–counterfactual sample pairs, the method extracts gradients from masked language models (MLMs) and causal language models (CLMs) to identify feature directions. The package combines feature-related data creation, training, evaluation, visualization, model rewriting via controlled weight updates, and multi-feature comparison in a single workflow. The authors demonstrate it on an English pronoun paradigm and on a large-scale feature comparison that reproduces prior use cases.

📝 Abstract

We present gradiend, an open-source Python package that operationalizes the GRADIEND method for learning feature directions from factual-counterfactual MLM and CLM gradients in language models. The package provides a unified workflow for feature-related data creation, training, evaluation, visualization, persistent model rewriting via controlled weight updates, and multi-feature comparison. We demonstrate GRADIEND on an English pronoun paradigm and on a large-scale feature comparison that reproduces prior use cases.
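To make the idea of a factual-counterfactual gradient concrete, the following is a minimal sketch on a toy two-token softmax model. It is not the gradiend package API and not the GRADIEND training procedure: it uses the raw difference between the factual and counterfactual gradients as a stand-in for a learned feature direction, and every name in it (`ce_gradient`, `gradient_difference`, `apply_update`, the toy weights, the scale `alpha`) is a hypothetical chosen for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    # Toy "language model": one linear layer, one row of weights per token.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)

def ce_gradient(weights, x, target):
    # Gradient of the cross-entropy loss w.r.t. the weights for one target
    # token: (p_k - 1[k == target]) * x_i for row k, column i.
    probs = predict(weights, x)
    return [[(p - (1.0 if k == target else 0.0)) * xi for xi in x]
            for k, p in enumerate(probs)]

def gradient_difference(weights, x, factual, counterfactual):
    # Factual gradient minus counterfactual gradient (illustrative stand-in
    # for a feature direction; an assumption, not the learned GRADIEND one).
    g_f = ce_gradient(weights, x, factual)
    g_c = ce_gradient(weights, x, counterfactual)
    return [[a - b for a, b in zip(rf, rc)] for rf, rc in zip(g_f, g_c)]

def apply_update(weights, direction, alpha):
    # Controlled weight update: W' = W + alpha * direction.
    return [[w + alpha * d for w, d in zip(rw, rd)]
            for rw, rd in zip(weights, direction)]

VOCAB = ["he", "she"]                       # hypothetical pronoun pair
weights = [[0.5, -0.2, 0.1], [0.1, 0.3, -0.4]]
x = [1.0, 0.5, -1.0]                        # hypothetical context features

direction = gradient_difference(weights, x, factual=0, counterfactual=1)
edited = apply_update(weights, direction, alpha=0.5)

before = predict(weights, x)
after = predict(edited, x)
print(f"P({VOCAB[1]}) before: {before[1]:.3f}, after: {after[1]:.3f}")
```

In this toy setting the difference reduces to adding the context vector to the counterfactual token's row and subtracting it from the factual token's row, so a positive `alpha` shifts probability toward the counterfactual token and a negative `alpha` shifts it back. The edit is persistent in the sense that it changes the weights themselves rather than the input.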

Problem

Research questions and friction points this paper is trying to address.

feature learning

gradient-based methods

language models

factual-counterfactual gradients

model editing

Innovation

Methods, ideas, or system contributions that make the work stand out.

GRADIEND

gradient-based feature learning

factual-counterfactual gradients