Jingyu (Jack) Zhang
Google Scholar ID: 9EC0sDMAAAAJ
Johns Hopkins University
Natural Language Processing
Citations & Impact
All-time
Citations: 386
H-index: 11
i10-index: 11
Publications: 20
Co-authors: 46 (list available)
Resume (English only)
Academic Achievements
  • “The Alignment Waltz: Jointly Training Agents to Collaborate for Safety” (arXiv preprint): Introduced WaltzRL, a multi-agent RL framework that improves LLM safety and reduces overrefusals through collaborative agent training
  • “Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements” (ICLR 2025): Proposed a framework for adapting LLMs to diverse safety requirements at inference without retraining
  • “Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data” (NAACL 2025 oral): Developed models that quote verbatim from trusted pre-training sources to enable easy verification
  • “SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation” (NAACL 2024): Proposed SemStamp, a sentence-level semantic watermarking method robust to paraphrasing via locality-sensitive hashing (LSH)
  • Recipient of the Amazon AI PhD Fellowship