SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation (CHI 2025)
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs (ICLR 2025)
Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation (Interspeech 2024)
GR0: Self-Supervised Global Representation Learning for Zero-Shot Voice Conversion (ICASSP 2024)
MDX-GAN: Enhancing Perceptual Quality in Multi-Class Source Separation Via Adversarial Training (ICASSP 2024)
Efficient Spoken Language Recognition Via Multilabel Classification (Interspeech 2023)
Audio Similarity is Unreliable as a Proxy for Audio Quality (Interspeech 2022)
HEAR: Holistic Evaluation of Audio Representations (NeurIPS 2021)
Controllable Speech Representation Learning via Voice Conversion and AIC Loss (ICASSP 2022)
SQAPP: No-Reference Speech Quality Assessment Via Pairwise Preference (ICASSP 2022)
Music Enhancement via Image Translation and Vocoding (ICASSP 2022)
Controllable Deep Melody Generation via Hierarchical Music Representation (ISMIR 2021)
HiFi-GAN-2: Studio-Quality Speech Enhancement via Generative Adversarial Networks Conditioned on Acoustic Features (IEEE WASPAA 2021)
Research Experience
Interned at Adobe three times between 2015 and 2017 and presented his primary research project, VoCo, at Adobe MAX Sneaks in 2016.
Education
Ph.D. in Computer Science from Princeton University, advised by Adam Finkelstein; M.S. in Music Technology from Carnegie Mellon University.
Background
Research interests: deep generative models for speech, including studio-quality speech enhancement, speech quality assessment, and personalized voice generation, as well as HCI for audio applications and music generation.