AI Summary
This work addresses the limited interpretability of current protein language models. Existing interpretability methods analyze each layer independently, so they cannot reveal cross-layer computational mechanisms or reconstruct the model's internal reasoning pathways. To overcome this, the authors propose ProtoMech, a framework that uses a cross-layer transcoder to jointly learn sparse latent representations across all layers, enabling end-to-end tracing of the model's full computational circuitry. Applied to ESM2, ProtoMech identifies compressed circuits that align closely with protein structural and functional motifs and can be steered for protein design. Experiments show that ProtoMech recovers 82-89% of the original model's performance on protein family classification and function prediction tasks, retains up to 79% of the model's accuracy using less than 1% of the latent space, and outperforms existing protein design baselines in more than 70% of cases.
Abstract
Protein language models (pLMs) have emerged as powerful predictors of protein structure and function. However, the computational circuits underlying their predictions remain poorly understood. Recent mechanistic interpretability methods decompose pLM representations into interpretable features, but they treat each layer independently and thus fail to capture cross-layer computation, limiting their ability to approximate the full model. We introduce ProtoMech, a framework for discovering computational circuits in pLMs using cross-layer transcoders that learn sparse latent representations jointly across layers to capture the model's full computational circuitry. Applied to the pLM ESM2, ProtoMech recovers 82-89% of the original model's performance on protein family classification and function prediction tasks. ProtoMech then identifies compressed circuits that use <1% of the latent space while retaining up to 79% of model accuracy, revealing correspondence with structural and functional motifs, including binding, signaling, and stability. Steering along these circuits enables high-fitness protein design, surpassing baseline methods in more than 70% of cases. These results establish ProtoMech as a principled framework for protein circuit tracing.
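The abstract does not spell out ProtoMech's architecture, so the following is only a minimal NumPy sketch of the general cross-layer transcoder idea, assuming the commonly used formulation. At each layer, an encoder reads that layer's residual-stream input and produces sparse ReLU latents. Those latents are then decoded into the MLP output of that layer and every later layer, which is what lets a single feature participate in computation that spans several layers. The class `CrossLayerTranscoder`, its `loss` method (MSE reconstruction plus an assumed L1 sparsity penalty with coefficient `l1_coef`), and `top_fraction_mask` are all hypothetical illustrations with random, untrained weights. In particular, keeping the largest-magnitude latents is a crude stand-in for restricting the model to a small fraction of the latent space, not ProtoMech's circuit-discovery procedure.

```python
import numpy as np


class CrossLayerTranscoder:
    """Hypothetical minimal cross-layer transcoder (single token, untrained).

    Layer l's encoder reads the residual stream resid[l]; its latents are
    decoded into the MLP output of every layer l2 >= l.
    """

    def __init__(self, n_layers, d_model, d_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.n_layers = n_layers
        # One encoder per layer: (n_layers, d_model, d_latent)
        self.W_enc = rng.normal(0.0, 0.1, (n_layers, d_model, d_latent))
        self.b_enc = np.zeros((n_layers, d_latent))
        # W_dec[l, l2] maps layer-l latents to layer-l2 MLP output (used only for l2 >= l)
        self.W_dec = rng.normal(0.0, 0.1, (n_layers, n_layers, d_latent, d_model))

    def encode(self, resid):
        # resid: (n_layers, d_model) -> latents: (n_layers, d_latent)
        pre = np.einsum("ld,ldk->lk", resid, self.W_enc) + self.b_enc
        return np.maximum(pre, 0.0)  # ReLU yields a sparse, non-negative code

    def decode(self, latents, mask=None):
        # mask: optional boolean array shaped like latents; False entries are ablated
        if mask is not None:
            latents = latents * mask
        out = np.zeros((self.n_layers, self.W_dec.shape[-1]))
        for l2 in range(self.n_layers):
            for l in range(l2 + 1):  # only the same or earlier layers write to layer l2
                out[l2] += latents[l] @ self.W_dec[l, l2]
        return out

    def loss(self, resid, mlp_out, l1_coef=1e-3):
        # Assumed objective: reconstruct every layer's MLP output, penalize latent activity
        z = self.encode(resid)
        recon = self.decode(z)
        return float(np.mean((recon - mlp_out) ** 2) + l1_coef * np.abs(z).sum())


def top_fraction_mask(latents, frac):
    """Keep only the largest-magnitude `frac` of all latents (crude circuit stand-in)."""
    k = max(1, int(frac * latents.size))
    keep = np.argsort(np.abs(latents).ravel())[-k:]
    mask = np.zeros(latents.size, dtype=bool)
    mask[keep] = True
    return mask.reshape(latents.shape)


clt = CrossLayerTranscoder(n_layers=4, d_model=16, d_latent=64)
resid = np.random.default_rng(1).normal(size=(4, 16))
z = clt.encode(resid)
full_recon = clt.decode(z)
mask = top_fraction_mask(z, 0.01)  # 1% of the 4 * 64 = 256 latents
sparse_recon = clt.decode(z, mask)
print(mask.sum(), full_recon.shape)
```

The nested loop in `decode` makes the cross-layer structure explicit. Latents at layer `l` never write to layers before `l`, so ablating everything except the last layer's latents leaves the earlier layers' reconstructions at zero. A real implementation would batch over tokens and vectorize this sum.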