Synthesizing Privacy-Preserving Text Data via Finetuning without Finetuning Billion-Scale LLMs, ICML 2025 (Spotlight Presentation @ ICLR 2025 DATA-FM Workshop)
LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch, Preprint
Crystal: Illuminating LLM Abilities on Language and Code, COLM 2024
LLM360: Towards Fully Transparent Open-Source LLMs, COLM 2024
RedCoast: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs, NAACL 2024 Demo (Best Demo Paper Runner-Up)
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer, NeurIPS 2023
Neural-Symbolic Interaction and Co-Evolving, Compendium of Neurosymbolic Artificial Intelligence, IOS Press
BertNet: Harvesting Knowledge Graphs from Pretrained Language Models, ACL 2023, Findings
Text Generation with Efficient (Soft) Q-Learning, EMNLP 2022, Findings