Physicochemically Informed Dual-Conditioned Generative Model of T-Cell Receptor Variable Regions for Cellular Therapy

📅 2025-10-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current TCR variable-region generative models struggle to achieve novelty, diversity, and biophysical plausibility at the same time, which hinders computation-driven cellular therapy development. We propose the first end-to-end, physicochemically informed, dual-conditioned generative framework: a Transformer generates TCR sequences autoregressively, conditioned on both peptide-HLA context and residue-level physicochemical features. The model is trained by maximum likelihood estimation and benchmarked against ANN, GPTCR, LSTM, and VAE baselines; validation includes blind in silico docking and structural modeling. On multi-neoantigen benchmarks, our method improves edit distance (+32.7%), sequence similarity (+18.4%), and longest common subsequence (LCS) score (+26.1%), and covers a broader region of sequence space. Computationally predicted high-affinity clones were experimentally validated, reducing discovery time from months to minutes.

📝 Abstract

Physicochemically informed biological sequence generation has the potential to accelerate computer-aided cellular therapy, yet current models fail to jointly ensure novelty, diversity, and biophysical plausibility when designing variable regions of T-cell receptors (TCRs). We present PhysicoGPTCR, a large generative protein Transformer that is dual-conditioned on peptide and HLA context and trained to autoregressively synthesise TCR sequences while embedding residue-level physicochemical descriptors. The model is optimised on curated TCR-peptide-HLA triples with a maximum-likelihood objective and compared against ANN, GPTCR, LSTM, and VAE baselines. Across multiple neoantigen benchmarks, PhysicoGPTCR substantially improves edit-distance, similarity, and longest-common-subsequence scores, while populating a broader region of sequence space. Blind in-silico docking and structural modelling further reveal a higher proportion of binding-competent clones than the strongest baseline, validating the benefit of explicit context conditioning and physicochemical awareness. Experimental results demonstrate that dual-conditioned, physics-grounded generative modelling enables end-to-end design of functional TCR candidates, reducing the discovery timeline from months to minutes without sacrificing wet-lab verifiability.
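The abstract reports edit-distance, similarity, and longest-common-subsequence scores but does not define them here. The sketch below shows the standard textbook versions of these three sequence metrics in plain Python, assuming Levenshtein distance for edit distance and `difflib`'s matching ratio for similarity; the paper's own definitions and normalisation may differ. The function names and the two toy CDR3-like strings are illustrative, not taken from the paper.

```python
from difflib import SequenceMatcher


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (need not be contiguous)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumed here to be difflib's matching ratio."""
    return SequenceMatcher(None, a, b).ratio()


# Toy strings for illustration only (not data from the paper).
generated = "CASSLGQAYEQYF"
reference = "CASSIRSSYEQYF"
print(edit_distance(generated, reference))
print(lcs_length(generated, reference))
print(round(similarity(generated, reference), 3))
```

Whether a higher or lower value counts as an improvement depends on what is being compared (generated versus reference sequences, or generated sequences against each other for diversity), which the abstract leaves unspecified.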

Problem

Research questions and friction points this paper is trying to address.

Generating novel and diverse T-cell receptor variable regions

Ensuring biophysical plausibility in TCR sequence design

Accelerating functional TCR discovery through computational methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-conditioned generative model on peptide and HLA

Embedding residue-level physicochemical descriptors in sequences

Autoregressive synthesis of TCRs with structural validation
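A minimal sketch of how the first two ideas could fit together: a conditioning prefix built from the HLA allele and the peptide, followed by the TCR residues generated so far, with a small physicochemical feature vector attached to each residue. The token layout (`<HLA>`, `<PEP>`, `<TCR>` markers), the function names, and the choice of descriptors (Kyte-Doolittle hydropathy and an approximate side-chain charge at neutral pH) are all assumptions made for illustration; the paper's actual descriptor set and input format are not given in this summary.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Approximate side-chain charge at neutral pH (histidine treated as neutral).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

# Placeholder features for non-residue tokens (markers and the allele name).
NO_FEATURES: Tuple[float, float] = (0.0, 0.0)


def residue_features(residue: str) -> Tuple[float, float]:
    """Return (hydropathy, charge); raises KeyError for non-standard residues."""
    return (HYDROPATHY[residue], CHARGE.get(residue, 0.0))


def build_conditioned_input(
    hla: str, peptide: str, tcr_prefix: str = ""
) -> Tuple[List[str], List[Tuple[float, float]]]:
    """Hypothetical layout: <HLA> allele <PEP> peptide residues <TCR> TCR residues."""
    tokens: List[str] = ["<HLA>", hla, "<PEP>"]
    features: List[Tuple[float, float]] = [NO_FEATURES] * 3
    for residue in peptide:
        tokens.append(residue)
        features.append(residue_features(residue))
    tokens.append("<TCR>")
    features.append(NO_FEATURES)
    for residue in tcr_prefix:
        tokens.append(residue)
        features.append(residue_features(residue))
    return tokens, features


# Example context chosen for illustration: a well-known HLA-A*02:01 epitope.
tokens, features = build_conditioned_input("HLA-A*02:01", "GILGFVFTL", "CASS")
for token, feature in zip(tokens, features):
    print(token, feature)
```

In a Transformer, per-token features like these would typically be projected and combined with the learned token embeddings, and the model would then predict the next TCR residue given everything to its left. That step is omitted here because it depends on architecture details the summary does not provide.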

🔎 Similar Papers

tcrLM: a lightweight protein language model for predicting T cell receptor and epitope binding specificity