RXNRECer Enables Fine-grained Enzymatic Function Annotation through Active Learning and Protein Language Models

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional enzyme function annotation relies on EC numbers, which suffer from ambiguous many-to-many mappings among proteins, EC numbers, and reactions, as well as database inconsistencies and frequent updates. This work proposes RXNRECer, an end-to-end framework that bypasses EC numbers entirely by predicting catalyzed reactions directly from protein sequences. Combining protein language models with a Transformer-based ensemble architecture and an active learning strategy, RXNRECer captures both sequence semantics and reaction transformation patterns, and uses large language models to provide interpretable rationales for its predictions. Evaluated on both cross-validation and temporally held-out test sets, the method improves F1 score by 16.54% and accuracy by 15.43% over six EC-based baselines. RXNRECer also supports proteome-wide reaction annotation, refinement of generic reaction schemas, annotation of previously uncurated proteins, and identification of enzyme promiscuity.

📝 Abstract
A key challenge in enzyme annotation is identifying the biochemical reactions catalyzed by proteins. Most existing methods rely on Enzyme Commission (EC) numbers as intermediaries: they first predict an EC number and then retrieve the associated reactions. This indirect strategy introduces ambiguity due to the complex many-to-many mappings among proteins, EC numbers, and reactions, and is further complicated by frequent updates to EC numbers and inconsistencies across databases. To address these challenges, we present RXNRECer, a transformer-based ensemble framework that directly predicts enzyme-catalyzed reactions without relying on EC numbers. It integrates protein language modeling and active learning to capture both high-level sequence semantics and fine-grained transformation patterns. Evaluations on curated cross-validation and temporal test sets demonstrate consistent improvements over six EC-based baselines, with gains of 16.54% in F1 score and 15.43% in accuracy. Beyond accuracy gains, the framework offers clear advantages for downstream applications, including scalable proteome-wide reaction annotation, enhanced specificity in refining generic reaction schemas, systematic annotation of previously uncurated proteins, and reliable identification of enzyme promiscuity. By incorporating large language models, it also provides interpretable rationales for predictions. These capabilities make RXNRECer a robust and versatile solution for EC-free, fine-grained enzyme function prediction, with potential uses across multiple areas of enzyme research and industry.
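The ambiguity of the indirect, EC-mediated route described in the abstract can be illustrated with a toy lookup. This is a minimal sketch, not the authors' method or data: every identifier (`protein_X`, `EC_A`, `rxn_1`, and so on) and both helper functions are hypothetical placeholders chosen for illustration. It shows how taking the union of all reactions linked to a protein's EC numbers can over-assign reactions, and how that affects precision, recall, and F1.

```python
# Toy, hypothetical data: none of these identifiers are real database entries.
EC_TO_REACTIONS = {
    "EC_A": {"rxn_1", "rxn_2", "rxn_3"},  # one EC number maps to many reactions
    "EC_B": {"rxn_3", "rxn_4"},           # and one reaction can appear under several ECs
}
PROTEIN_TO_ECS = {"protein_X": {"EC_A", "EC_B"}}
TRUE_REACTIONS = {"protein_X": {"rxn_1", "rxn_4"}}  # assumed ground truth


def reactions_via_ec(protein):
    """Indirect route: union of every reaction linked to each assigned EC number."""
    reactions = set()
    for ec in PROTEIN_TO_ECS.get(protein, set()):
        reactions |= EC_TO_REACTIONS.get(ec, set())
    return reactions


def precision_recall_f1(predicted, truth):
    """Set-based precision, recall, and F1 for one protein's reaction labels."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


predicted = reactions_via_ec("protein_X")
precision, recall, f1 = precision_recall_f1(predicted, TRUE_REACTIONS["protein_X"])
print(sorted(predicted), precision, recall, round(f1, 3))
```

In this toy case the EC route recovers both true reactions but also returns two that the protein does not catalyze, so recall is perfect while precision is halved. A direct sequence-to-reaction predictor would aim to output the reaction set itself and so avoid the lookup step.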
Problem

Research questions and friction points this paper is trying to address.

enzyme annotation
EC number
reaction prediction
functional ambiguity
protein-reaction mapping
Innovation

Methods, ideas, or system contributions that make the work stand out.

enzyme function prediction
protein language models
active learning
EC-free annotation
reaction prediction
Authors
Zhenkun Shi
Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 300308, Tianjin, China.
Jun Zhu
College of Computer Science and Technology, Harbin Engineering University, Harbin, 10587, China.
Dehang Wang
Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 300308, Tianjin, China.
BoYu Chen
College of Computer Science and Technology, Harbin Engineering University, Harbin, 10587, China.
Qianqian Yuan
Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 300308, Tianjin, China.
Zhitao Mao
Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 300308, Tianjin, China.
Fan Wei
Department of Mathematics, Princeton University
Analysis, Combinatorics, Probability
Weining Wu
College of Computer Science and Technology, Harbin Engineering University, Harbin, 10587, China.
Xiaoping Liao
Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 300308, Tianjin, China.
Hongwu Ma
Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 300308, Tianjin, China.