Rafael Valle

Google Scholar ID: SktxU8IAAAAJ

NVIDIA, UC Berkeley, CNMAT

Machine Listening and Improvisation

Citations & Impact

Citations (all-time): 2,640

Contact

Twitter, GitHub, LinkedIn

Publications

12 items

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception

2026

UALM: Unified Audio Language Model for Understanding, Generation and Reasoning

2025

Audio Flamingo Sound-CoT Technical Report: Improving Chain-of-Thought Reasoning in Sound Understanding

2025

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

2025

Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge

2025

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

2025

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation

2025

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance

2025

Resume

Academic Achievements

Co-invented Fugatto, Audio Flamingo, OMCAT, ETTA, Koel-TTS, P-Flow, and the RAD* family of models, among others. Publications include 'Fugatto: Foundational Generative Audio Transformer Opus 1'.

Research Experience

Worked as a polymath research scientist and manager at NVIDIA, representing the audio team of ADLR (Applied Deep Learning Research). The team focuses on generative models capable of audio understanding and synthesis, occasionally exploring vision. Was a Research Intern at Gracenote in Emeryville in Fall 2016, working on audio classification with deep learning. Before that, was a Scientist Intern at Pandora in Oakland, investigating segments and scores that describe novelty-seeking behavior in listeners.

Education

Holds a PhD from UC Berkeley, advised by Prof. Sanjit Seshia and Prof. Edmund Campion, with a focus on machine listening and improvisation; a Master's in Computer Music from HMDK Stuttgart, Germany; and a Bachelor's in Orchestral Conducting from UFRJ, Brazil.

Background

Pursuing Superintelligence in Multimodal Generation and Understanding at Meta. Passionate about generative modeling, machine perception, and machine improvisation.

Co-authors

46 total