🤖 AI Summary
Current generative models for TCR variable regions struggle to achieve novelty, diversity, and biophysical plausibility simultaneously, hindering computation-driven cellular therapy development. We propose the first end-to-end, physics-informed, dual-conditional generative framework: a Transformer that generates TCR sequences autoregressively, conditioned on both peptide–HLA complex context and residue-level physicochemical features. The model is trained via maximum likelihood estimation and benchmarked against ANN, GPTCR, LSTM, and VAE baselines; validation includes blind in silico docking and structural modeling. On multi-neoantigen benchmarks, our method significantly improves edit distance (+32.7%), sequence similarity (+18.4%), and longest common subsequence (LCS) score (+26.1%), demonstrating broader sequence-space coverage. Computationally predicted high-affinity clones were experimentally validated, reducing affinity discovery time from months to minutes.
📝 Abstract
Physicochemically informed biological sequence generation has the potential to accelerate computer-aided cellular therapy, yet current models fail to jointly ensure novelty, diversity, and biophysical plausibility when designing variable regions of T-cell receptors (TCRs). We present PhysicoGPTCR, a large generative protein Transformer that is dual-conditioned on peptide and HLA context and trained to autoregressively synthesise TCR sequences while embedding residue-level physicochemical descriptors. The model is optimised on curated TCR–peptide–HLA triples with a maximum-likelihood objective and compared against ANN, GPTCR, LSTM, and VAE baselines. Across multiple neoantigen benchmarks, PhysicoGPTCR substantially improves edit-distance, similarity, and longest-common-subsequence scores, while populating a broader region of sequence space. Blind in-silico docking and structural modelling further reveal a higher proportion of binding-competent clones than the strongest baseline, validating the benefit of explicit context conditioning and physicochemical awareness. Experimental results demonstrate that dual-conditioned, physics-grounded generative modelling enables end-to-end design of functional TCR candidates, reducing the discovery timeline from months to minutes without sacrificing wet-lab verifiability.
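The abstract reports edit-distance and longest-common-subsequence (LCS) scores but does not say how they are computed. As a minimal sketch, assuming the standard Levenshtein distance and the standard LCS length over amino-acid strings, the two measures could be implemented as below. The function names and the two example sequences are hypothetical and are not taken from the paper or its data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (rolling one-row DP)."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)  # extend the common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical CDR3-like strings differing by a single substitution.
    generated = "CASSLGQAYEQYF"
    reference = "CASSLGTAYEQYF"
    print(edit_distance(generated, reference))  # 1
    print(lcs_length(generated, reference))     # 12
```

Whether a higher or lower value counts as an improvement depends on what each generated sequence is compared against (for example, known binders versus the training set), which the abstract leaves unspecified.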