RiboSphere: Learning Unified and Efficient Representations of RNA Structures

📅 2026-03-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Modeling RNA three-dimensional structure is difficult because the backbone is highly flexible, non-canonical interactions are prevalent, and experimentally determined structures are scarce. This work combines vector quantization with flow matching: a geometric Transformer encoder extracts SE(3)-invariant features, which finite scalar quantization (FSQ) discretizes into a finite vocabulary of latent codes, and a flow-matching decoder then reconstructs atomic coordinates from those codes. The learned code indices are enriched for specific RNA structural motifs, consistent with the modular organization of RNA. The method reports strong structure reconstruction performance (RMSD 1.25 Å, TM-score 0.84), and its pretrained representations transfer to inverse folding and RNA–ligand binding prediction, generalizing robustly in data-scarce regimes.
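To make the quantization step concrete, here is a minimal NumPy sketch of finite scalar quantization as it is generally formulated: each latent channel is bounded, rounded to a small set of integer levels, and the per-channel levels are combined into a single code index. The function names, the level counts, and the restriction to odd level counts are illustrative assumptions, not details taken from the paper; the straight-through gradient used during training is omitted.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Round each latent channel to one of `levels[i]` integer values.

    Assumption: every level count is odd, so the grid is symmetric around 0.
    """
    levels = np.asarray(levels)
    assert np.all(levels % 2 == 1), "this sketch only handles odd level counts"
    half = (levels - 1) // 2
    # tanh bounds each channel to (-half, half); rounding snaps it to the grid.
    bounded = np.tanh(z) * half
    return np.round(bounded).astype(np.int64)

def fsq_index(codes, levels):
    """Map per-channel levels to one integer index (mixed-radix encoding)."""
    levels = np.asarray(levels)
    half = (levels - 1) // 2
    digits = codes + half  # shift each channel to the range 0 .. levels[i] - 1
    basis = np.concatenate(([1], np.cumprod(levels[:-1])))
    return (digits * basis).sum(axis=-1)

# Hypothetical configuration: 3 latent channels with 5, 5 and 3 levels.
levels = [5, 5, 3]
z = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [-10.0, -10.0, -10.0]])
codes = fsq_quantize(z, levels)
indices = fsq_index(codes, levels)
```

With these assumed levels the implicit vocabulary has 5 × 5 × 3 = 75 entries, and no codebook has to be learned or stored because the grid itself defines the codes.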

📝 Abstract
Accurate RNA structure modeling remains difficult because RNA backbones are highly flexible, non-canonical interactions are prevalent, and experimentally determined 3D structures are comparatively scarce. We introduce RiboSphere, a framework that learns discrete geometric representations of RNA by combining vector quantization with flow matching. Our design is motivated by the modular organization of RNA architecture: complex folds are composed from recurring structural motifs. RiboSphere uses a geometric transformer encoder to produce SE(3)-invariant (rotation/translation-invariant) features, which are discretized with finite scalar quantization (FSQ) into a finite vocabulary of latent codes. Conditioned on these discrete codes, a flow-matching decoder reconstructs atomic coordinates, enabling high-fidelity structure generation. We find that the learned code indices are enriched for specific RNA motifs, suggesting that the model captures motif-level compositional structure rather than acting as a purely compressive bottleneck. Across benchmarks, RiboSphere achieves strong performance in structure reconstruction (RMSD 1.25 Å, TM-score 0.84), and its pretrained discrete representations transfer effectively to inverse folding and RNA–ligand binding prediction, with robust generalization in data-scarce regimes.
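As a rough illustration of the flow-matching decoder idea, the sketch below builds a training pair on a straight-line path between a noise sample and target coordinates, and integrates a velocity field with Euler steps. The linear path, the function names, and the oracle velocity used in the usage lines are assumptions made for illustration; the abstract does not specify the probability path, and in the actual model the velocity would come from a network conditioned on the discrete codes.

```python
import numpy as np

def cfm_training_pair(x0, x1, t):
    """Point on the straight path from noise x0 to data x1, plus its velocity.

    Assumption: a linear interpolation path, one common flow-matching choice.
    """
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0  # target velocity is constant along a straight path
    return x_t, u_t

def cfm_loss(v_pred, u_t):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - u_t) ** 2))

def euler_sample(velocity_fn, x0, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

# Toy usage with 4 "atoms" in 3D; real inputs would be RNA atomic coordinates.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 3))  # noise sample
x1 = rng.normal(size=(4, 3))  # hypothetical target coordinates
x_t, u_t = cfm_training_pair(x0, x1, t=0.25)

# Oracle velocity for this single pair, standing in for a trained network.
def oracle(x, t):
    return (x1 - x) / (1.0 - t)

x_gen = euler_sample(oracle, x0, n_steps=8)
```

Because the oracle field points straight at the target, the Euler integration lands on `x1`; a learned, code-conditioned field would only approximate this behavior.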
Problem

Research questions and friction points this paper is trying to address.

RNA structure modeling
non-canonical interactions
3D structure scarcity
backbone flexibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete representation
geometric deep learning
RNA structure modeling
vector quantization
flow matching