Monte Carlo Tree Diffusion with Multiple Experts for Protein Design

📅 2025-09-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Protein design faces two challenges: modeling long-range dependencies and navigating an exponentially large combinatorial search space. This paper introduces MCTD-ME, the first framework to integrate masked diffusion models with Monte Carlo Tree Search (MCTS) so that multiple residues are optimized jointly. It proposes a pLDDT-guided dynamic masking mechanism that selectively targets low-confidence structural regions, coupled with PH-UCT-ME (a multi-expert ensemble selection strategy) and an entropy-augmented UCT expansion rule, to improve biophysical plausibility and planning efficiency. On the CAMEO and PDB benchmarks, MCTD-ME outperforms single-expert and unguided baselines, achieving higher amino acid recovery (AAR) and better structural similarity (scTM), particularly for long-chain proteins.
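As a point of reference for the AAR metric mentioned above: amino acid recovery is commonly computed as the fraction of positions at which the designed residue matches the native one. The sketch below assumes two aligned, equal-length sequences; the function name is hypothetical and this is not the paper's evaluation code.

```python
def amino_acid_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native residue."""
    if len(designed) != len(native):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Count position-wise identities between the two sequences.
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)
```

For example, a four-residue design that differs from the native sequence at one position has a recovery of 0.75.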

📝 Abstract
The goal of protein design is to generate amino acid sequences that fold into functional structures with desired properties. Prior methods combining autoregressive language models with Monte Carlo Tree Search (MCTS) struggle with long-range dependencies and suffer from an impractically large search space. We propose MCTD-ME, Monte Carlo Tree Diffusion with Multiple Experts, which integrates masked diffusion models with tree search to enable multi-token planning and efficient exploration. Unlike autoregressive planners, MCTD-ME uses biophysical-fidelity-enhanced diffusion denoising as the rollout engine, jointly revising multiple positions and scaling to large sequence spaces. It further leverages experts of varying capacities to enrich exploration, guided by a pLDDT-based masking schedule that targets low-confidence regions while preserving reliable residues. We propose a novel multi-expert selection rule (PH-UCT-ME) that extends predictive-entropy UCT to expert ensembles. On the inverse folding task (CAMEO and PDB benchmarks), MCTD-ME outperforms single-expert and unguided baselines in both sequence recovery (AAR) and structural similarity (scTM), with gains increasing for longer proteins and benefiting from multi-expert guidance. More generally, the framework is model-agnostic and applicable beyond inverse folding, including de novo protein engineering and multi-objective molecular generation.
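To make the idea of a pLDDT-based masking schedule concrete, here is a minimal sketch of one way to mask low-confidence residues while leaving reliable ones untouched. Everything specific in it is an assumption for illustration: the function name, the `#` mask symbol, the threshold of 70, and the cap on the number of masked positions are not taken from the paper.

```python
from typing import Sequence

MASK = "#"  # hypothetical mask symbol, chosen only for this sketch


def plddt_mask(sequence: str, plddt: Sequence[float],
               threshold: float = 70.0, max_masked: int = 8) -> str:
    """Mask up to `max_masked` residues whose pLDDT is below `threshold`.

    The lowest-confidence residues are masked first; all others are preserved.
    """
    if len(sequence) != len(plddt):
        raise ValueError("sequence and plddt must have equal length")
    # Candidate positions below the threshold, lowest confidence first.
    low = sorted((i for i, p in enumerate(plddt) if p < threshold),
                 key=lambda i: plddt[i])
    chosen = set(low[:max_masked])
    return "".join(MASK if i in chosen else aa
                   for i, aa in enumerate(sequence))
```

The masked string would then be handed to a denoiser that fills in the `#` positions jointly; that step depends on the diffusion model in use and is left out here.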
Problem

Research questions and friction points this paper is trying to address.

Overcoming long-range dependencies in protein sequence design
Reducing impractically large search space in protein engineering
Enhancing multi-token planning for efficient protein structure exploration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Masked diffusion models integrated with tree search
Biophysical-fidelity-enhanced diffusion denoising rollout engine
pLDDT-based masking schedule with multi-expert guidance
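
The abstract mentions a selection rule that extends predictive-entropy UCT, but this page does not give its formula. As a rough illustration only, the sketch below adds an entropy bonus to the standard UCT score. The additive form, the constants `c` and `beta`, and all function names are assumptions, and the sketch does not show how the paper combines multiple experts.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uct_with_entropy(mean_value: float, visits: int, parent_visits: int,
                     predictive_probs: Sequence[float],
                     c: float = 1.4, beta: float = 0.1) -> float:
    """Standard UCT score plus an assumed additive entropy bonus."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    # Classic UCT exploration term: grows with parent visits, shrinks with child visits.
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    # Assumed bonus: favor nodes where the model's prediction is more uncertain.
    return mean_value + explore + beta * entropy(predictive_probs)
```

In a tree search, the child with the highest score would be selected at each step; a larger `beta` shifts the search toward regions where the model is less certain.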
Authors
Xuefeng Liu
Department of Computer Science, University of Chicago
Mingxuan Cao
Data Science Institute, University of Chicago
Songhao Jiang
Department of Computer Science, University of Chicago
Xiao Luo
Toyota Technological Institute at Chicago
Xiaotian Duan
Department of Computer Science, University of Chicago
Mengdi Wang
AI Lab, Princeton University
Tobin R. Sosnick
Department of Biochemistry and Molecular Biology, University of Chicago
Jinbo Xu
Professor, Toyota Technological Institute at Chicago
Machine Learning, Algorithm and Optimization, Computational Biology
Rick Stevens
Professor of Computer Science, University of Chicago
HPC, Bioinformatics, Distributed Computing, Visualization, Collaboration