Satvik Dixit

Google Scholar ID: fO8a44AAAAAJ

Carnegie Mellon University

Speech and Audio, Large Language Models

Citations & Impact

All-time

Citations

23

H-index

2

i10-index

0

Publications

7

Co-authors

4

list available

Contact

Email: satvikdixit@cmu.edu

Publications

5 items

FoleyBench: A Benchmark For Video-to-Audio Models

2025

Cited

0

AURA Score: A Metric For Holistic Audio Question Answering Evaluation

2025

Cited

0

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence

2025

Cited

0

Learning Perceptually Relevant Temporal Envelope Morphing

2025

Cited

0

Mellow: a small audio language model for reasoning

2025

Cited

0

Resume

Academic Achievements

- Mellow: a small audio language model for reasoning (NeurIPS 2025)
- MACE: Leveraging Audio for Evaluating Audio Captioning Systems (ICASSP 2025 SALMA Workshop)
- Vision Language Models Are Few-Shot Audio Spectrogram Classifiers (NeurIPS 2024 Audio Imagination Workshop)

Research Experience

- Worked with Professor Chris Donahue on Generative Audio
- Worked with Professor Bhiksha Raj on Audio Language Models
- Interned with Dr. Satrajit Ghosh at MIT
- Interned with Dr. Martin Vetterli at EPFL

Education

Undergraduate degree in Electrical Engineering from IIT Delhi, with a concentration in signal processing and ML.

Background

Master's student at Carnegie Mellon University, interested in audio understanding and generation.

Miscellany

Email: satvikdixit@cmu.edu

Co-authors

4 total

Carnegie Mellon University

Microsoft, Carnegie Mellon University

Assistant Professor, CMU CSD; Research Scientist, Google DeepMind (part time)

Senior Research Scientist, MIT; Assistant Professor, HMS