Face-voice Association in Multilingual Environments (FAME) 2026 Challenge Evaluation Plan

📅 2025-08-06
🤖 AI Summary
This work addresses the underexplored problem of cross-modal face–voice association modeling in multilingual settings. Methodologically, we propose the first framework tailored to bilingual/multilingual real-world communication scenarios, built upon our newly constructed multilingual audio-visual dataset, MAV-Celeb. Our approach introduces an audio–visual joint representation learning architecture that integrates deep cross-modal matching with language-aware feature alignment. Key contributions include: (1) the first multilingual benchmark for face–voice association—MultiLingual-FaceVoice Benchmark; (2) the release of MAV-Celeb, a high-quality, multilingual dataset annotated with speaker identity and language labels; and (3) reproducible strong baseline models that significantly improve robustness of cross-modal matching under cross-lingual conditions. This work establishes a new paradigm and provides empirical foundations for generalizable cross-modal biometric recognition.

📝 Abstract
Advances in technology have led to the use of multimodal systems in various real-world applications, and audio-visual systems are among the most widely used. In recent years, associating the face and voice of a person has gained attention because of the unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) 2026 Challenge focuses on exploring face-voice association under the unique condition of a multilingual scenario. This condition is inspired by the fact that half of the world's population is bilingual and people often communicate in multilingual scenarios. The challenge uses the Multilingual Audio-Visual (MAV-Celeb) dataset to explore face-voice association in multilingual environments. This report provides details of the challenge, dataset, baseline models, and tasks for the FAME Challenge.

Problem

Research questions and friction points this paper is trying to address.

Exploring face-voice association in multilingual environments
Using multimodal systems for audio-visual correlation studies
Addressing challenges in bilingual and multilingual communication scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal face-voice association technology
Multilingual scenario dataset MAV-Celeb
Baseline models for audio-visual systems