Multi-Scale Reversible Chaos Game Representation: A Unified Framework for Sequence Classification

📅 2026-04-20

AI Summary
Existing methods for biological sequence classification struggle to achieve high performance and interpretability at the same time. This work proposes a novel Multi-Scale Reversible Chaos Game Representation (MS-RCGR) framework that encodes sequences in a fully invertible, multi-resolution form. Using rational-number arithmetic and hierarchical k-mer decomposition, the method maps sequences losslessly into multi-resolution geometric images. It then combines Chaos Game Representation (CGR) features and geometric descriptors with embeddings from pre-trained protein language models such as ESM2 and ProtT5. The framework unifies traditional machine learning, computer vision, and hybrid modeling paradigms, and it consistently improves classification performance on synthetic DNA and protein datasets spanning seven sequence classes. The hybrid approach yields the best results, supporting the framework's efficacy, flexibility, and interpretability.
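Rational-number arithmetic is what makes CGR invertible. The following minimal Python sketch illustrates this with the classic DNA chaos game using exact `fractions.Fraction` coordinates. It is not the authors' MS-RCGR implementation. Several details are assumptions made for illustration: the corner assignment (A, C, G, T at (0,0), (0,1), (1,1), (1,0)), the hypothetical names `cgr_encode` and `cgr_decode`, and the restriction to the four-letter DNA alphabet. Each nucleotide moves the current point halfway toward its corner. As a result, the most recent nucleotide can always be read off from the quadrant containing the point, and the step can be undone exactly.

```python
from fractions import Fraction

# Assumed corner convention (a common CGR choice; the paper's may differ).
CORNERS = {
    "A": (Fraction(0), Fraction(0)),
    "C": (Fraction(0), Fraction(1)),
    "G": (Fraction(1), Fraction(1)),
    "T": (Fraction(1), Fraction(0)),
}
CENTER = (Fraction(1, 2), Fraction(1, 2))
HALF = Fraction(1, 2)


def cgr_encode(seq: str) -> tuple[Fraction, Fraction]:
    """Return the exact final CGR point of a DNA sequence."""
    x, y = CENTER
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward this base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y


def cgr_decode(point: tuple[Fraction, Fraction]) -> str:
    """Recover the sequence by undoing the halving steps from last to first."""
    x, y = point
    bases = []
    # A valid point after n steps has denominator 2**(n + 1), which bounds the loop.
    for _ in range(max(x.denominator, y.denominator).bit_length()):
        if (x, y) == CENTER:
            break
        # The quadrant holding the point identifies the most recent base.
        cx = Fraction(1) if x > HALF else Fraction(0)
        cy = Fraction(1) if y > HALF else Fraction(0)
        bases.append(next(b for b, c in CORNERS.items() if c == (cx, cy)))
        # Invert p_i = (p_{i-1} + corner) / 2.
        x, y = 2 * x - cx, 2 * y - cy
    if (x, y) != CENTER:
        raise ValueError("point is not a valid CGR encoding")
    return "".join(reversed(bases))
```

For example, `cgr_encode("AC")` gives `(Fraction(1, 8), Fraction(5, 8))`, and `cgr_decode(cgr_encode("GATTACA"))` returns `"GATTACA"`. Each halving step adds one binary digit to the coordinates. With 64-bit floats, the earliest nucleotides can therefore be lost once a sequence runs past about 50 bases. Exact fractions avoid that limit at the cost of denominators that grow as 2^(n+1).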

πŸ“ Abstract
Interpretable biological sequence classification remains challenging. To address this, we introduce a novel encoding framework, Multi-Scale Reversible Chaos Game Representation (MS-RCGR), that transforms biological sequences into multi-resolution geometric representations with guaranteed reversibility. Unlike traditional sequence encoding methods, MS-RCGR uses rational arithmetic and hierarchical k-mer decomposition to generate scale-invariant features that preserve complete sequence information while supporting diverse analytical approaches. Our framework bridges three distinct paradigms for sequence analysis:

(1) traditional machine learning on extracted geometric features;
(2) computer vision models operating on CGR-generated images; and
(3) hybrid approaches that combine protein language model embeddings with CGR features.

In experiments on synthetic DNA and protein datasets spanning seven distinct sequence classes, MS-RCGR features consistently improve classification performance across all three paradigms. In particular, the hybrid approach, which combines pre-trained language model embeddings (ESM2, ProtT5) with MS-RCGR features, outperforms either feature source alone. Because the encoding is reversible, no information is lost during transformation, and the multi-scale analysis captures patterns ranging from individual nucleotides to complex motif structures. Our results indicate that MS-RCGR provides a flexible, interpretable, and high-performing foundation for biological sequence analysis.
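One way to picture hierarchical k-mer decomposition is as a stack of frequency CGR (FCGR) images, one per k. In each image, every k-mer occupies one cell of a 2^k × 2^k grid. The following sketch assumes this common FCGR construction with the corner convention A=(0,0), C=(0,1), G=(1,1), T=(1,0). The names `kmer_cell`, `fcgr`, and `multiscale_fcgr` are hypothetical, and the paper's exact decomposition, normalisation, and image orientation may differ. Only k-mers made entirely of A/C/G/T are counted.

```python
from collections import Counter

# Assumed corner convention: base -> (x bit, y bit) of its CGR corner.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_cell(kmer: str) -> tuple[int, int]:
    """Return the (row, col) cell of a k-mer in a 2^k x 2^k FCGR grid."""
    row = col = 0
    for i, base in enumerate(kmer):
        bx, by = CORNER_BITS[base]
        # Later bases set more significant bits, i.e. pick coarser quadrants,
        # mirroring how the CGR point of the k-mer is built up step by step.
        col |= bx << i
        row |= by << i
    return row, col


def fcgr(seq: str, k: int) -> list[list[float]]:
    """Normalised k-mer frequency image of side 2^k (row index grows with y)."""
    seq = seq.upper()
    size = 1 << k
    image = [[0.0] * size for _ in range(size)]
    windows = (seq[i : i + k] for i in range(len(seq) - k + 1))
    # Skip k-mers containing non-ACGT symbols such as N.
    counts = Counter(w for w in windows if all(b in CORNER_BITS for b in w))
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    for kmer, n in counts.items():
        row, col = kmer_cell(kmer)
        image[row][col] = n / total
    return image


def multiscale_fcgr(seq: str, max_k: int) -> dict[int, list[list[float]]]:
    """Stack of FCGR images at resolutions k = 1..max_k."""
    return {k: fcgr(seq, k) for k in range(1, max_k + 1)}
```

For instance, `fcgr("ACGT", 1)` returns `[[0.25, 0.25], [0.25, 0.25]]`, since each nucleotide occupies its own quadrant. A k-mer frequency image on its own does not, in general, determine the order of the sequence. This sketch therefore illustrates only the multi-resolution aspect of the encoding.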
Problem

Research questions and friction points this paper is trying to address.

biological sequence classification
interpretability
sequence encoding
multi-scale analysis
reversible representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Scale Reversible Chaos Game Representation
sequence encoding
reversibility
scale-invariant features
hybrid sequence analysis