Binary Latent Protein Fitness Landscapes for Quantum Annealing Optimization

📅 2026-03-17
🤖 AI Summary
This work proposes a framework for exploring high-dimensional protein sequence spaces to identify high-fitness variants, in a form compatible with quantum optimization hardware. Sequences are embedded with a pretrained protein language model and binarized into a compact binary latent space. In this space, the protein fitness landscape is formulated as a Quadratic Unconstrained Binary Optimization (QUBO) problem, which establishes a direct interface between representation learning and quantum annealing-based optimization. The same QUBO can also be solved with classical heuristics such as simulated annealing and genetic algorithms. On the ProteinGym benchmark, the method identifies high-fitness variants whose nearest neighbors consistently lie within the top fraction of the training fitness distribution, supporting the practicality of the proposed QUBO modeling strategy.
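The embed, binarize, and fit pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Q-BIOLAT implementation: the median-threshold binarization, the ridge-regularized least-squares fit, and the function names `binarize`, `quadratic_features`, `fit_qubo`, and `qubo_energy` are all hypothetical, and the language-model embeddings are assumed to be given as an `(n, d)` array. Because x_i^2 = x_i for binary variables, the linear terms can sit on the diagonal of an upper-triangular matrix Q, so the predicted fitness is x^T Q x + c.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Map continuous embeddings of shape (n, d) to binary codes in {0,1}^d.

    Assumed scheme (hypothetical): threshold each dimension at its median
    over the training set.
    """
    thresholds = np.median(embeddings, axis=0)
    return (embeddings > thresholds).astype(np.int8)


def quadratic_features(X: np.ndarray) -> np.ndarray:
    """Stack linear terms x_i and pairwise terms x_i * x_j (i < j).

    Squared terms are omitted because x_i^2 == x_i for binary x.
    """
    d = X.shape[1]
    iu, ju = np.triu_indices(d, k=1)
    return np.hstack([X, X[:, iu] * X[:, ju]]).astype(float)


def fit_qubo(X: np.ndarray, y: np.ndarray, ridge: float = 1e-3):
    """Fit y ~ c + sum_i h_i x_i + sum_{i<j} J_ij x_i x_j by ridge least squares.

    Returns an upper-triangular Q (h on the diagonal, J above it) and intercept c.
    """
    n, d = X.shape
    F = np.hstack([np.ones((n, 1)), quadratic_features(X)])
    penalty = ridge * np.eye(F.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    w = np.linalg.solve(F.T @ F + penalty, F.T @ y)
    Q = np.zeros((d, d))
    Q[np.diag_indices(d)] = w[1 : d + 1]
    iu, ju = np.triu_indices(d, k=1)
    Q[iu, ju] = w[d + 1 :]
    return Q, float(w[0])


def qubo_energy(x: np.ndarray, Q: np.ndarray, c: float = 0.0) -> float:
    """Predicted fitness x^T Q x + c for a binary code x."""
    return float(x @ Q @ x + c)
```

The number of QUBO coefficients grows as d(d+1)/2, which is one reason a compact latent dimension d matters when the model must fit on an annealer or be learned from limited fitness labels.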

📝 Abstract
We propose Q-BIOLAT, a framework for modeling and optimizing protein fitness landscapes in binary latent spaces. Starting from protein sequences, we leverage pretrained protein language models to obtain continuous embeddings, which are then transformed into compact binary latent representations. In this space, protein fitness is approximated using a quadratic unconstrained binary optimization (QUBO) model, enabling efficient combinatorial search via classical heuristics such as simulated annealing and genetic algorithms. On the ProteinGym benchmark, we demonstrate that Q-BIOLAT captures meaningful structure in protein fitness landscapes and enables the identification of high-fitness variants. Despite using a simple binarization scheme, our method consistently retrieves sequences whose nearest neighbors lie within the top fraction of the training fitness distribution, particularly under the strongest configurations. We further show that different optimization strategies exhibit distinct behaviors, with evolutionary search performing better in higher-dimensional latent spaces and local search remaining competitive in preserving realistic sequences. Beyond its empirical performance, Q-BIOLAT provides a natural bridge between protein representation learning and combinatorial optimization. By formulating protein fitness as a QUBO problem, our framework is directly compatible with emerging quantum annealing hardware, opening new directions for quantum-assisted protein engineering. Our implementation is publicly available at: https://github.com/HySonLab/Q-BIOLAT
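The search and evaluation steps can be illustrated with a single-bit-flip simulated annealing loop that maximizes a fitted QUBO, followed by a nearest-neighbor check against the training set. This is a hedged sketch, not the authors' code: the geometric cooling schedule, the Hamming-distance nearest-neighbor rule, the percentile definition, and the names `qubo_value`, `flip_delta`, `simulated_annealing`, and `nearest_neighbor_percentile` are all assumptions. Q is taken to be an upper-triangular coefficient matrix (linear terms on the diagonal) given as a list of lists.

```python
import math
import random


def qubo_value(x, Q):
    """Predicted fitness f(x) = sum_{i<=j} Q[i][j] * x_i * x_j (Q upper-triangular)."""
    d = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(d) for j in range(i, d))


def flip_delta(x, Q, k):
    """Change in f(x) when bit k is flipped, computed in O(d) instead of O(d^2)."""
    d = len(x)
    # Local field on x_k: diagonal term plus couplings read from the upper triangle.
    local = Q[k][k] + sum(Q[min(i, k)][max(i, k)] * x[i] for i in range(d) if i != k)
    return local if x[k] == 0 else -local


def simulated_annealing(Q, steps=5000, t0=2.0, t1=0.01, seed=0):
    """Maximize the QUBO with single-bit flips and an assumed geometric cooling schedule."""
    rng = random.Random(seed)
    d = len(Q)
    x = [rng.randint(0, 1) for _ in range(d)]
    f = qubo_value(x, Q)
    best_x, best_f = x[:], f
    for s in range(steps):
        t = t0 * (t1 / t0) ** (s / max(1, steps - 1))
        k = rng.randrange(d)
        delta = flip_delta(x, Q, k)
        # Metropolis rule for maximization: accept worse moves with prob exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            x[k] ^= 1
            f += delta
            if f > best_f:
                best_x, best_f = x[:], f
    return best_x, best_f


def nearest_neighbor_percentile(x, train_codes, train_fitness):
    """Return (index, Hamming distance, fitness percentile) of the nearest training code."""
    dists = [sum(a != b for a, b in zip(x, c)) for c in train_codes]
    idx = min(range(len(dists)), key=dists.__getitem__)
    nn_fit = train_fitness[idx]
    # Percentile = share of training fitness values at or below the neighbor's fitness.
    pct = 100.0 * sum(v <= nn_fit for v in train_fitness) / len(train_fitness)
    return idx, dists[idx], pct
```

A genetic algorithm could reuse `qubo_value` as its fitness function; the incremental `flip_delta` update mainly benefits local-search methods like the annealing loop above, where each step changes a single bit.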
Problem

protein fitness landscapes
binary latent space
combinatorial optimization
QUBO
protein engineering
Innovation

QUBO
binary latent representation
protein fitness landscape
quantum annealing
protein language models