Do Protein Transformers Have Biological Intelligence?

📅 2025-06-07

🤖 AI Summary

Problem: It remains unclear whether protein Transformers exhibit biologically interpretable intelligence. Method: We introduce Protein-FN, a benchmark dataset tailored for protein function prediction; propose Sequence Protein Transformer (SPT), a lightweight architecture (e.g., SPT-Tiny with only 5.4M parameters); and design Sequence Score, a novel interpretability method that systematically decodes biologically relevant sequence patterns captured by the model. Contribution/Results: Trained from scratch, SPT achieves state-of-the-art accuracy (94.3% on the Antibiotic Resistance (AR) dataset and 99.6% on Protein-FN), outperforming comparable models. Sequence Score identifies critical residues that align closely with known functional sites and evolutionarily conserved motifs, which supports the biological plausibility of the model's decisions. Together, the dataset, architecture, and interpretability method offer a framework for interpretable protein language models and data-driven discovery of molecular mechanisms.

📝 Abstract

Deep neural networks, particularly Transformers, have been widely adopted for predicting the functional properties of proteins. In this work, we explore whether Protein Transformers can capture biological intelligence among protein sequences. To this end, we first introduce a protein function dataset, Protein-FN, which provides over 9,000 protein samples with meaningful labels. Second, we devise a new Transformer architecture, Sequence Protein Transformers (SPT), for computationally efficient protein function prediction. Third, we develop a novel Explainable Artificial Intelligence (XAI) technique called Sequence Score, which can efficiently interpret the decision-making processes of protein models, thereby overcoming the difficulty of deciphering the biological intelligence hidden in Protein Transformers. Remarkably, even our smallest SPT-Tiny model, which contains only 5.4M parameters, demonstrates impressive predictive accuracy, achieving 94.3% on the Antibiotic Resistance (AR) dataset and 99.6% on the Protein-FN dataset, all when trained from scratch. In addition, our Sequence Score technique reveals that our SPT models can discover several meaningful patterns underlying the sequence structures of protein data, and that these patterns align closely with domain knowledge in the biology community. We have officially released our Protein-FN dataset on Hugging Face Datasets at https://huggingface.co/datasets/Protein-FN/Protein-FN. Our code is available at https://github.com/fudong03/BioIntelligence.
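
To make the input side of such a model concrete, the sketch below shows one common way to turn a protein sequence into integer tokens before it reaches a Transformer. The abstract does not describe SPT's tokenization, so the vocabulary layout, the padding and unknown ids, and the function name `encode_sequence` are all assumptions for illustration.

```python
# Hypothetical tokenizer sketch; NOT taken from the SPT code base.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD, UNK = 0, 1                       # assumed ids for padding / unknown residues
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq: str, max_len: int) -> list[int]:
    """Map residues to ids, truncate to max_len, and right-pad with PAD."""
    ids = [VOCAB.get(aa, UNK) for aa in seq.upper()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    # 'X' is not a standard residue, so it maps to UNK.
    print(encode_sequence("MKTX", max_len=6))
```

A fixed `max_len` keeps every example the same length, so sequences can be batched; a real pipeline would also pass a padding mask so the model ignores the `PAD` positions.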

Problem

Research questions and friction points this paper is trying to address.

Exploring whether Protein Transformers capture biological intelligence in sequences

Developing efficient Transformer architecture for protein function prediction

Creating explainable AI to interpret protein model decisions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the Protein-FN dataset with over 9,000 labeled proteins

Develops Sequence Protein Transformers for efficient predictions

Creates Sequence Score XAI to interpret model decisions
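
The summary does not say how Sequence Score is computed. To illustrate what a per-residue interpretation looks like, the sketch below uses a different, generic technique called occlusion: mask one residue at a time and record how much the model's output drops. The names `occlusion_scores` and `toy_model` and the motif `CWC` are hypothetical stand-ins, not part of the paper.

```python
from typing import Callable, List


def occlusion_scores(
    seq: str, model: Callable[[str], float], mask: str = "X"
) -> List[float]:
    """Score each residue by how much masking it lowers the model output.

    This is plain occlusion, NOT the paper's Sequence Score method.
    """
    base = model(seq)
    scores = []
    for i in range(len(seq)):
        masked = seq[:i] + mask + seq[i + 1:]
        scores.append(base - model(masked))
    return scores


def toy_model(seq: str) -> float:
    """Stand-in classifier: 1.0 if the made-up motif 'CWC' is present."""
    return 1.0 if "CWC" in seq else 0.0


if __name__ == "__main__":
    # Only the three motif positions should receive a non-zero score.
    print(occlusion_scores("AACWCAA", toy_model))
```

With a trained model in place of `toy_model`, high-scoring positions could then be compared against annotated functional sites, which is the kind of check the summary describes.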
