TIGER: Text-Informed Generalized Enzyme-Reaction Retrieval

📅 2026-05-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limitations of existing enzyme-reaction bidirectional retrieval methods, which suffer from poor generalization, sensitivity to data splits, and directional asymmetry. To overcome these challenges, the authors propose a novel approach that integrates protein sequence information with textual semantic knowledge. Specifically, they employ a protein-to-text generation model to extract semantic representations of enzymes and introduce a dynamic gating mechanism to adaptively fuse sequence-derived and text-derived features. A unified representation space with shared architecture is then constructed to enable efficient bidirectional retrieval. This method, the first to leverage textual semantics for enhanced enzyme representation, consistently outperforms current models across diverse data distributions, demonstrating superior robustness, transferability, and consistency in bidirectional retrieval performance.
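The adaptive fusion of sequence-derived and text-derived features can be illustrated with a generic gated-fusion sketch. This is not the authors' implementation: the summary does not specify the gate's form, so the per-dimension sigmoid gate, the function names `sigmoid` and `gated_fusion`, and the parameters `gate_weights` and `gate_bias` are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(seq_feat, text_feat, gate_weights, gate_bias):
    """Fuse two equal-length feature vectors with a per-dimension gate.

    Assumed (hypothetical) form: g = sigmoid(W [seq; text] + b),
    fused = g * seq + (1 - g) * text.
    gate_weights holds one row of length 2*d per output dimension.
    """
    if len(seq_feat) != len(text_feat):
        raise ValueError("sequence and text features must have equal length")
    concat = list(seq_feat) + list(text_feat)
    fused = []
    for i, (s, t) in enumerate(zip(seq_feat, text_feat)):
        logit = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(logit)  # g near 1 trusts the sequence, near 0 trusts the text
        fused.append(g * s + (1.0 - g) * t)
    return fused


# Toy usage: all-zero gate parameters give g = 0.5, a plain average.
seq = [1.0, 2.0]
text = [3.0, 6.0]
zero_w = [[0.0] * 4, [0.0] * 4]
print(gated_fusion(seq, text, zero_w, [0.0, 0.0]))
```

In a trained model the gate parameters would be learned, so a gate of this kind could down-weight generated text when it is unreliable for a given enzyme. Whether TIGER's gate behaves this way in detail is not stated in the summary.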
๐Ÿ“ Abstract
Enzyme-reaction retrieval is a fundamental problem in computational biology, underpinning enzyme characterization, reaction mechanism elucidation, and the rational design of metabolic pathways and biocatalysts. As a bidirectional task, it entails both enzyme-to-reaction and reaction-to-enzyme mapping. However, existing approaches suffer from poor generalization across tasks and distributions, with performance highly sensitive to dataset splits and substantial asymmetry between retrieval directions. To address these challenges, we present TIGER, a Text-Informed Generalized Enzyme-Reaction Retrieval framework that leverages protein-to-text generation models to distill textual semantic knowledge from enzyme sequences, providing a generalized representation that bridges enzymes and biochemical reactions. To ensure the quality and reliability of textual semantics, we design a Dynamic Gating Network that adaptively fuses text-derived knowledge with sequence features, enabling more consistent and informative enzyme representations, while a Structure-Shared Feature Projector aligns enzyme and reaction representations within a unified latent space. Extensive experiments demonstrate that, under bidirectional retrieval supervision, TIGER significantly outperforms state-of-the-art baselines across diverse distributions and exhibits strong robustness and transferability across tasks.
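Once enzymes and reactions share a latent space, both retrieval directions can be read off a single similarity matrix. The sketch below illustrates that idea only. It assumes the embeddings have already been projected into the shared space, and it uses cosine similarity as the scoring function, which the abstract does not confirm. The helper names (`l2_normalize`, `similarity_matrix`, `rank_reactions`, `rank_enzymes`) are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity_matrix(enzymes, reactions):
    """Cosine similarity: rows index enzymes, columns index reactions."""
    e = [l2_normalize(v) for v in enzymes]
    r = [l2_normalize(v) for v in reactions]
    return [[sum(a * b for a, b in zip(ev, rv)) for rv in r] for ev in e]


def rank_reactions(sim, enzyme_idx):
    """Enzyme-to-reaction: reaction indices sorted by descending similarity."""
    row = sim[enzyme_idx]
    return sorted(range(len(row)), key=lambda j: -row[j])


def rank_enzymes(sim, reaction_idx):
    """Reaction-to-enzyme: enzyme indices sorted by descending similarity."""
    col = [row[reaction_idx] for row in sim]
    return sorted(range(len(col)), key=lambda i: -col[i])


# Toy embeddings, assumed to be already projected into the shared space.
enzymes = [[1.0, 0.0], [0.0, 1.0]]
reactions = [[0.9, 0.1], [0.2, 0.8]]
sim = similarity_matrix(enzymes, reactions)
print(rank_reactions(sim, 0))  # ranking of reactions for enzyme 0
print(rank_enzymes(sim, 1))    # ranking of enzymes for reaction 1
```

Because the two directions are a row lookup and a column lookup on the same matrix, any asymmetry between them comes from the embeddings and the candidate pools, not from the scoring step.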
Problem

Research questions and friction points this paper is trying to address.

enzyme-reaction retrieval
generalization
bidirectional retrieval
asymmetry
computational biology
Innovation

Methods, ideas, or system contributions that make the work stand out.

enzyme-reaction retrieval
text-informed representation
dynamic gating network
structure-shared feature projector
bidirectional retrieval