Breaking the Barriers of Text-Hungry and Audio-Deficient AI

📅 2025-06-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current AI systems rely heavily on text-based representations, excluding approximately 700 million people in rural and remote regions who primarily use spoken (often unwritten or low-resource) languages. To address this, we propose the first end-to-end audio-to-audio machine intelligence framework that bypasses textual intermediaries entirely, directly modeling semantic and expressive content from raw speech. Our method introduces (1) the Multiscale Audio-Semantic Transform (MAST), enabling deep cross-lingual semantic disentanglement; and (2) a mean-field-type fractional diffusion generative paradigm grounded in fractional Brownian motion, supporting high-fidelity, semantically consistent speech synthesis and translation without text supervision. The framework is agnostic to the choice of audio representation, including spectrograms, wavelets, scalograms, and discrete units, thereby significantly enhancing generalizability, robustness, and scalability for under-digitized languages.
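The summary's first ingredient is a multiscale audio representation quantized into discrete units. The paper's actual MAST transform is not specified here, so the following is only a toy illustration of the general pattern (multi-resolution spectral features stacked coarse-to-fine, then clustered into discrete audio units); all function names and parameter values are hypothetical:

```python
import numpy as np

def stft_mag(x, win, hop):
    """Magnitude STFT with a Hann window (numpy only)."""
    w = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop : i * hop + win] * w for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def multiscale_features(x, scales=(256, 512, 1024), hop=128, n_bins=32):
    """Stack log-magnitude spectra at several window sizes (coarse to fine),
    pooled to a common number of frequency bins and a common frame count."""
    feats = []
    n_frames = 1 + (len(x) - max(scales)) // hop
    for win in scales:
        m = np.log1p(stft_mag(x, win, hop))[:n_frames]
        # pool the frequency axis down to n_bins bands
        edges = np.linspace(0, m.shape[1], n_bins + 1, dtype=int)
        pooled = np.stack([m[:, a:b].mean(axis=1)
                           for a, b in zip(edges[:-1], edges[1:])], axis=1)
        feats.append(pooled)
    return np.concatenate(feats, axis=1)   # shape: (frames, len(scales) * n_bins)

def kmeans_units(feats, k=8, iters=20, rng=0):
    """Quantize frames into k discrete audio units with plain k-means."""
    rng = np.random.default_rng(rng)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        d = ((feats[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(0)
    return labels
```

A textless pipeline in this spirit would train sequence models directly on such unit streams, never materializing text.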

📝 Abstract
While global linguistic diversity spans more than 7,164 recognized languages, the current dominant architecture of machine intelligence remains fundamentally biased toward written text. This bias excludes over 700 million people, particularly in rural and remote regions, who are audio-literate. In this work, we introduce a fully textless, audio-to-audio machine intelligence framework designed to serve this underserved population, as well as anyone who prefers the efficiency of audio interaction. Our contributions include novel audio-to-audio translation architectures that bypass text entirely, including spectrogram-, scalogram-, wavelet-, and unit-based models. Central to our approach is the Multiscale Audio-Semantic Transform (MAST), a representation that encodes tonal, prosodic, speaker, and expressive features. We further integrate MAST into a mean-field-type fractional diffusion framework driven by fractional Brownian motion, which enables the generation of high-fidelity, semantically consistent speech without reliance on textual supervision. The result is a robust and scalable system capable of learning directly from raw audio, even in languages that are unwritten or rarely digitized. This work represents a fundamental shift toward audio-native machine intelligence systems, expanding access to language technologies for communities historically left out of the current machine intelligence ecosystem.
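The generative component above couples diffusion with fractional Brownian motion (fBm), whose increments are correlated in time (controlled by the Hurst index H, with H = 0.5 recovering ordinary Brownian motion). The paper's mean-field-type model is not reproduced here; as a minimal sketch of the driver alone, an fBm path can be sampled exactly by Cholesky-factorizing its covariance, and its increments can replace white Gaussian noise in a simple noising recursion. All names and parameters below are illustrative assumptions:

```python
import numpy as np

def fbm_sample(n_steps, hurst, T=1.0, rng=None):
    """Sample one fractional Brownian motion path B_H on [0, T] via
    Cholesky factorization of the exact covariance
    Cov(B_H(t), B_H(s)) = 0.5 * (t^{2H} + s^{2H} - |t - s|^{2H})."""
    rng = np.random.default_rng(rng)
    t = np.linspace(T / n_steps, T, n_steps)          # grid, excluding t = 0
    s, u = np.meshgrid(t, t)
    h2 = 2 * hurst
    cov = 0.5 * (s**h2 + u**h2 - np.abs(s - u)**h2)
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_steps))  # jitter for stability
    path = L @ rng.standard_normal(n_steps)
    return np.concatenate([[0.0], path])               # B_H(0) = 0

def fbm_noising(x0, hurst=0.7, n_steps=100, beta=1.0, sigma=1.0, rng=None):
    """Toy forward-noising recursion driven by fBm increments:
    x_{k+1} = x_k - beta * x_k * dt + sigma * (B_H(t_{k+1}) - B_H(t_k))."""
    bh = fbm_sample(n_steps, hurst, rng=rng)
    dt = 1.0 / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        x[k + 1] = x[k] - beta * x[k] * dt + sigma * (bh[k + 1] - bh[k])
    return x
```

With hurst > 0.5 the noise increments are positively correlated, giving smoother, more persistent perturbations than standard diffusion noise; a learned reverse process would then denoise along the same time grid.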
Problem

Research questions and friction points this paper is trying to address.

Addressing text bias in AI for audio-literate populations
Developing textless audio-to-audio translation for unwritten languages
Enhancing audio-native machine intelligence without text reliance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Textless audio-to-audio translation framework
Multiscale Audio-Semantic Transform (MAST) representation
Fractional diffusion with fractional Brownian motion
👥 Authors

H. Tembine
Issa Bamia (African Institute for Mathematical Sciences; AI safety and security, voice AI, adversarial machine learning)
Massa Ndong
Bakary Coulibaly
Oumar Issiaka Traore
Moussa Traore
Moussa Sanogo
Mamadou Eric Sangare
Salif Kante
Daryl Noupa Yongueng
Hafiz Tiomoko Ali (Expedia Group; machine learning, random matrix theory, learning with little supervision)
Malik Tiomoko
F. Laleye
Boualem Djehiche (Professor of Mathematical Statistics at KTH Royal Institute of Technology; stochastic analysis, insurance mathematics, financial mathematics, mathematical statistics, optimal control)
Wesmanegda Elisee Dipama
Idris Baba Saje
Hammid Mohammed Ibrahim
Moumini Sanogo
Marie Coursel Nininahazwe
Abdul-Latif Siita
Haine Mhlongo
Teddy Nelvy Dieu Merci Kouka
Mariam Serine Jeridi
M. P. Mupenge
Lekoueiry Dehah
Abdoul-Aziz Bio Sidi D. Bouko
Wilfried Franceslas Zokoue
Odette Richette Sambila
Alina RS Mbango
Mady Diagouraga
Oumarou Moussa Sanoussi
Gizachew Dessalegn
Mohamed Lamine Samoura
B. Coulibaly