🤖 AI Summary
Current RNA language models (e.g., RiNALMo) lack transparency in how they encode mRNA versus non-coding RNA (ncRNA) family information, and no systematic interpretability framework exists for dissecting their learned representations.
Method: We propose SAE-RNA, the first interpretability method applying sparse autoencoders (SAEs) to discover biologically meaningful concepts in pre-trained RNA models without retraining; it maps frozen embeddings to interpretable biological features via alignment with authoritative sequence annotations.
Results: Applying SAE-RNA to RiNALMo, we systematically decode and visualize latent representations, identifying neuron-level concepts strongly associated with canonical ncRNA families (e.g., snoRNAs, miRNAs). This enables fine-grained, cross-RNA-type functional comparison. Our work reveals how ncRNA families are encoded in large RNA models and establishes a new paradigm for RNA model interpretability, providing reliable, hypothesis-generating computational insights into RNA biology.
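The core idea above, training a sparse autoencoder on frozen model embeddings so that individual latent units can be aligned with biological annotations, can be sketched minimally. The snippet below is an illustrative NumPy sketch, not the paper's implementation: all dimensions, parameter names, and the random stand-in for RiNALMo embeddings are assumptions, and in practice the SAE weights would be learned by minimizing the loss shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: model embeddings (d_model) mapped to an
# overcomplete sparse latent space (d_latent). Values are illustrative.
d_model, d_latent, n_tokens = 64, 256, 10

# Stand-in for frozen per-token embeddings (in practice, RiNALMo activations).
X = rng.normal(size=(n_tokens, d_model))

# SAE parameters (randomly initialized here; normally learned).
W_enc = rng.normal(scale=0.1, size=(d_model, d_latent))
b_enc = np.zeros(d_latent)
W_dec = rng.normal(scale=0.1, size=(d_latent, d_model))
b_dec = np.zeros(d_model)

def sae_forward(X):
    """Encode embeddings to non-negative sparse latents, then reconstruct."""
    z = np.maximum(X @ W_enc + b_enc, 0.0)  # ReLU yields sparse, non-negative codes
    x_hat = z @ W_dec + b_dec               # linear decoder reconstructs embeddings
    return z, x_hat

z, x_hat = sae_forward(X)

# Standard SAE objective: reconstruction error plus an L1 sparsity penalty.
l1_coeff = 1e-3
loss = np.mean((X - x_hat) ** 2) + l1_coeff * np.abs(z).mean()
print(z.shape, x_hat.shape)
```

Once trained, each latent unit's activation pattern across annotated sequences (e.g., Rfam family labels) can be inspected to find units that fire selectively for particular ncRNA families; that alignment step is what turns the sparse codes into candidate "concepts."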
📄 Abstract
Deep learning, particularly with the advancement of Large Language Models, has transformed biomolecular modeling, with protein advances (e.g., ESM) inspiring emerging RNA language models such as RiNALMo. Yet how and what these RNA language models internally encode about messenger RNA (mRNA) or non-coding RNA (ncRNA) families remains unclear. We present SAE-RNA, an interpretability method that analyzes RiNALMo representations and maps them to known, human-level biological features. Our work frames RNA interpretability as concept discovery in pretrained embeddings, without end-to-end retraining, and provides practical tools to probe what RNA LMs may encode about ncRNA families. The method can be extended to close comparisons between RNA groups, supporting hypothesis generation about previously unrecognized relationships.