Physicochemically Informed Dual-Conditioned Generative Model of T-Cell Receptor Variable Regions for Cellular Therapy

📅 2025-10-07
🤖 AI Summary
Current TCR variable-region generative models struggle to achieve novelty, diversity, and biophysical plausibility at the same time, which hinders computation-driven development of cellular therapies. We propose an end-to-end, physicochemically informed, dual-conditioned generative framework: a Transformer generates TCR sequences autoregressively, conditioned on both the peptide–HLA complex context and residue-level physicochemical features. The model is trained by maximum likelihood estimation and benchmarked against ANN, GPTCR, LSTM, and VAE baselines, with further validation by blind in-silico docking and structural modelling. On multiple neoantigen benchmarks, the method substantially improves edit distance (+32.7%), sequence similarity (+18.4%), and longest common subsequence (LCS) score (+26.1%), indicating broader coverage of sequence space. Docking also identifies a higher proportion of binding-competent clones than the strongest baseline, and the authors argue that the approach shortens the TCR discovery timeline from months to minutes.
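The summary reports gains in edit distance, sequence similarity, and LCS score without defining them. A minimal sketch of the standard quantities follows, assuming plain Levenshtein distance, a similarity of 1 minus the distance divided by the longer length, and an LCS score equal to the LCS length divided by the longer length. These normalisations and the function names are assumptions, not the paper's evaluation code, and which direction counts as "improvement" depends on the reference set used.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-residue insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if residues match)
            ))
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (gaps allowed)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def lcs_score(a: str, b: str) -> float:
    """Assumed normalisation: LCS length / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest


# Two CDR3-like strings that differ at two positions.
a, b = "CASSLGQAYEQYF", "CASSPGQGYEQYF"
print(levenshtein(a, b), lcs_length(a, b))  # 2 11
```

Both metrics use the usual two-row dynamic programme, so memory stays linear in the length of the second sequence.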

📝 Abstract
Physicochemically informed biological sequence generation has the potential to accelerate computer-aided cellular therapy, yet current models fail to jointly ensure novelty, diversity, and biophysical plausibility when designing variable regions of T-cell receptors (TCRs). We present PhysicoGPTCR, a large generative protein Transformer that is dual-conditioned on peptide and HLA context and trained to autoregressively synthesise TCR sequences while embedding residue-level physicochemical descriptors. The model is optimised on curated TCR–peptide–HLA triples with a maximum-likelihood objective and compared against ANN, GPTCR, LSTM, and VAE baselines. Across multiple neoantigen benchmarks, PhysicoGPTCR substantially improves edit-distance, similarity, and longest-common-subsequence scores, while populating a broader region of sequence space. Blind in-silico docking and structural modelling further reveal a higher proportion of binding-competent clones than the strongest baseline, validating the benefit of explicit context conditioning and physicochemical awareness. Experimental results demonstrate that dual-conditioned, physics-grounded generative modelling enables end-to-end design of functional TCR candidates, reducing the discovery timeline from months to minutes without sacrificing wet-lab verifiability.
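The abstract describes dual conditioning on peptide and HLA plus a maximum-likelihood objective over TCR–peptide–HLA triples. One common way to realise this with a decoder-only Transformer is to lay each triple out as a single token stream (peptide, separator, HLA, separator, TCR) and score only the TCR tokens under teacher forcing. The sketch below assumes exactly that layout; the special tokens, the HLA-as-pseudo-sequence encoding, the choice to leave conditioning tokens unscored, and the placeholder model are all hypothetical stand-ins, not PhysicoGPTCR's actual design.

```python
import math
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical vocabulary: three special tokens plus the 20 standard residues.
VOCAB = ["<bos>", "<sep>", "<eos>"] + list(AMINO_ACIDS)

# A model maps a token prefix to a probability for every vocabulary token.
NextTokenModel = Callable[[List[str]], Dict[str, float]]


def build_example(peptide: str, hla: str, tcr: str) -> Tuple[List[str], int]:
    """Lay out one TCR-peptide-HLA triple as a single token stream.

    Peptide and HLA are the two conditions; the TCR that follows is the
    generation target. Returns the tokens and the index of the first target token.
    """
    tokens = ["<bos>", *peptide, "<sep>", *hla, "<sep>"]
    start = len(tokens)
    tokens += [*tcr, "<eos>"]
    return tokens, start


def conditional_nll(tokens: List[str], start: int, model: NextTokenModel) -> float:
    """-sum_t log p(x_t | x_<t), summed over the TCR tokens and <eos> only.

    Conditioning tokens are visible as context but not scored (an assumption).
    """
    nll = 0.0
    for t in range(start, len(tokens)):
        probs = model(tokens[:t])  # teacher forcing: condition on the true prefix
        nll -= math.log(probs[tokens[t]])
    return nll


def uniform_model(prefix: List[str]) -> Dict[str, float]:
    """Placeholder that ignores its context; stands in for the Transformer."""
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}


# Placeholder HLA pseudo-sequence (arbitrary residues, for illustration only).
example_tokens, example_start = build_example("GILGFVFTL", "YFAMYGEKVAHT", "CASSIRSSYEQYF")
# With the uniform placeholder this is approximately 14 * math.log(23).
print(conditional_nll(example_tokens, example_start, uniform_model))
```

Maximum-likelihood training would minimise the average of this quantity over the curated triples; swapping `uniform_model` for a real conditional model is the only change the scoring loop needs.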
Problem

Research questions and friction points this paper is trying to address.

Generating novel and diverse T-cell receptor variable regions
Ensuring biophysical plausibility in TCR sequence design
Accelerating functional TCR discovery through computational methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative model dual-conditioned on peptide and HLA context
Embedding residue-level physicochemical descriptors in sequences
Autoregressive synthesis of TCRs with structural validation
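The page does not list which residue-level descriptors are embedded. As a hypothetical illustration, each residue can be encoded as a one-hot identity vector extended with two well-known descriptors: Kyte–Doolittle hydropathy (divided by 4.5 so it lies in [-1, 1]) and formal side-chain charge near neutral pH (K and R as +1, D and E as -1, histidine treated as neutral). The descriptor set, the scaling, and the function names are assumptions, not the paper's feature set.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified formal side-chain charge near pH 7 (histidine treated as neutral).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

AMINO_ACIDS = sorted(KYTE_DOOLITTLE)  # fixed order for the one-hot block


def residue_features(aa: str) -> list:
    """One-hot identity (20 dims) + scaled hydropathy + charge = 22 dims."""
    one_hot = [1.0 if aa == x else 0.0 for x in AMINO_ACIDS]
    return one_hot + [KYTE_DOOLITTLE[aa] / 4.5, CHARGE.get(aa, 0.0)]


def sequence_features(seq: str) -> list:
    """Per-residue feature matrix for a TCR (or peptide) sequence."""
    return [residue_features(aa) for aa in seq.upper()]


features = sequence_features("CASSIRSSYEQYF")
print(len(features), len(features[0]))  # 13 22
```

In a Transformer, vectors like these would typically be projected linearly and added to, or concatenated with, the learned token embeddings; how PhysicoGPTCR combines them is not specified on this page.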
Jiahao Ma
Australian National University
Computer vision, Multiview detection, Novel view synthesis
Hongzong Li
The Hong Kong University of Science and Technology, Hong Kong
Ye-Fan Hu
BayVax Biotech Limited, Hong Kong
Jian-Dong Huang
The University of Hong Kong
molecular biology