🤖 AI Summary
This work addresses catastrophic forgetting in existing protein design methods: when aligned to multiple objectives, they often sacrifice either functional performance or fundamental designability. To overcome this, the authors propose ProteinOPD, a framework that introduces On-Policy Distillation (OPD) to multi-teacher, multi-objective settings for the first time. By constructing a normalized geometric consensus from weighted teacher models and performing token-level knowledge distillation on trajectories generated by the student model itself, ProteinOPD mitigates optimization conflicts between objectives. The approach preserves the inherent designability of pretrained protein language models while substantially improving alignment with target preferences, and it trains 8 times faster than reinforcement learning–based alignment methods.
📝 Abstract
Designing proteins with desired functions or properties is a core goal in synthetic biology and drug discovery. Recent advances in protein language models (PLMs) have enabled the generation of highly designable protein sequences, while preference alignment provides a promising way to steer designs toward desired functions and properties. Nevertheless, existing alignment methods often trigger catastrophic forgetting of pretrained knowledge, degrading basic designability, and fail to balance multiple competing objectives. To address these issues, we draw inspiration from On-Policy Distillation (OPD), an advanced post-training method renowned for mitigating catastrophic forgetting through its mode-seeking nature. In this work, we propose ProteinOPD, a multi-objective preference alignment framework that effectively balances multiple preference objectives while maintaining the inherent designability of PLMs. ProteinOPD adapts a pretrained PLM into preference-specific teachers and distills their knowledge into a shared student via token-level OPD on the student's own trajectories. During this process, the student is aligned to a unique normalized geometric consensus of the weighted teachers while keeping optimization bounded when objectives conflict. This extends OPD to multi-objective, multi-teacher alignment. Extensive experiments show that ProteinOPD achieves substantial gains on target preference objectives without compromising designability, with an 8x training speedup over RL-based alignment competitors.
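The abstract does not give the loss itself, so the following is only a minimal sketch of one common way to read "normalized geometric consensus of weighted teachers". It assumes the consensus at each token position is the renormalized weighted geometric mean of the teachers' next-token distributions, and that the student is scored with a per-token reverse KL, the mode-seeking divergence typically associated with on-policy distillation. All names (`geometric_consensus`, `reverse_kl_per_token`, the array shapes, the weights) are hypothetical and not taken from the paper, and random arrays stand in for real model outputs.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def geometric_consensus(teacher_logprobs, weights):
    """Renormalized weighted geometric mean of teacher distributions.

    teacher_logprobs: array of shape (K, T, V) with log-probabilities from
        K teachers over T token positions and a vocabulary of size V.
    weights: K non-negative preference weights (rescaled here to sum to 1).
    Returns log-probabilities of shape (T, V).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # A weighted geometric mean of probabilities is a weighted
    # arithmetic mean of log-probabilities.
    mixed = np.tensordot(w, teacher_logprobs, axes=1)  # shape (T, V)
    # Renormalize so each position is a valid distribution again.
    return log_softmax(mixed, axis=-1)


def reverse_kl_per_token(student_logprobs, target_logprobs):
    """KL(student || target) at each token position, shape (T,)."""
    p_student = np.exp(student_logprobs)
    return (p_student * (student_logprobs - target_logprobs)).sum(axis=-1)


# Toy usage: 2 assumed teachers, 5 positions, 20 amino-acid tokens.
rng = np.random.default_rng(0)
teachers = log_softmax(rng.normal(size=(2, 5, 20)))
student = log_softmax(rng.normal(size=(5, 20)))

consensus = geometric_consensus(teachers, weights=[0.7, 0.3])
token_loss = reverse_kl_per_token(student, consensus)
print(token_loss.mean())
```

Under these assumptions, the weighted sum of reverse KLs to the individual teachers differs from the reverse KL to the consensus only by a per-position term that does not depend on the student, so both objectives share the same minimizer. That is one way a single consensus target can stand in for several competing teachers. In a real on-policy setup the positions would come from sequences sampled by the student itself, which this toy example does not model.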