Rithesh Kumar

Google Scholar ID: hJjeVsQAAAAJ

Adobe Research

Audio, Artificial Intelligence, Deep Learning

Citations & Impact

All-time citations: 2,865

Contact

Email: ritheshkumar.95@gmail.com

Publications

8 items

Taming Audio VAEs via Target-KL Regularization

2026

AudioChat: Unified Audio Storytelling, Editing, and Understanding with Transfusion Forcing

2026

TAC: Timestamped Audio Captioning

2026

PromptSep: Generative Audio Separation via Multimodal Prompting

2025

SpeechOp: Inference-Time Task Composition for Generative Speech Processing

2025

DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers

2025

SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation

2025

DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis

2024

Resume (English only)

Academic Achievements

- DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis (ICML 2025)
- High-Fidelity Audio Compression with Improved RVQGAN (NeurIPS 2023)
- VampNet: Music Generation via Masked Acoustic Token Modeling (ISMIR 2023)
- Chunked Autoregressive GAN for Conditional Waveform Synthesis (ICLR 2022)
- MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis (NeurIPS 2019)

Research Experience

- Led speech generation research at Adobe Research, including zero-shot voice generation and voice translation.
- Served as Technical Lead for Audio Research at Descript Inc., where he developed and shipped multiple text-to-speech models powering the flagship Overdub and Regenerate features.

Education

Completed an MSc in Computer Science (specializing in Artificial Intelligence) at Université de Montréal, working in the Mila lab under the supervision of Yoshua Bengio. Graduated from SSN College of Engineering (affiliated with Anna University) with a Bachelor's degree in Computer Science and Engineering. During the final two years of his undergraduate studies, he studied deep learning, spent a summer at the Serre Lab at Brown University, and collaborated with Prof. Yoshua Bengio at the Mila lab.

Background

Rithesh Kumar is a Senior Research Scientist on the Speech AI team at Adobe Research, focusing on controllable text-to-speech synthesis, automatic dubbing, and speech editing. His work centers on scaling diffusion models and developing efficient distillation algorithms for multilingual audio generation.

Miscellany

Currently living in Toronto, Ontario, Canada.

Co-authors

0 total