CVPR 2024: Introduced MoSAR, a method that generates 4K-resolution 3D avatars with pore-level detail from a single portrait image, and released the FFHQ-UV-Intrinsics dataset with high-resolution intrinsic face attributes for 10K subjects.
NeurIPS Workshop on ML for Audio 2023: Published 'EDMSound', a spectrogram-based diffusion model for high-quality audio synthesis, and identified data duplication risks in diffusion-based audio generation.
IEEE Signal Processing Letters 2023: Published 'Rhythm Modeling for Voice Conversion', modeling natural speech rhythm (sonorants, obstruents, silences) for rhythm-preserving voice conversion.
Published 'ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech' in 2023, enabling speech-driven gesture synthesis in a style set by a short example motion clip, including styles unseen during training; featured on Two Minute Papers.
INTERSPEECH 2024: Co-authored 'Spoken-Term Discovery using Discrete Speech Units' (DUSTED), achieving state-of-the-art results on the ZeroSpeech Challenge spoken-term discovery track.