DFKI-MLT at SemEval-2026 Task 7: Steering Multilingual Models Towards Cultural Knowledge

📅 2026-05-21

🤖 AI Summary
This work addresses the uneven coverage of cultural knowledge across languages and regions in multilingual large language models. The authors apply activation steering at inference time, without any parameter updates: language vectors extracted from the FLORES parallel corpus are added to the model's residual stream at a selected Transformer layer, and the effect is studied alongside generic and culturally conditioned prompts. On the SemEval-2026 Task 7 multiple-choice track, the approach achieves an accuracy of 86.96%, ranking 7th out of 17 participating teams. Post-hoc analyses show that the gains from steering are modest and depend strongly on the Transformer layer, the language-region pair, and the prompt formulation, with some configurations degrading performance.
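The summary does not say how the language vectors are computed from FLORES. One common recipe, assumed here purely for illustration, is a difference of means: average the hidden states of target-language sentences at a given layer and subtract the average for their parallel reference-language counterparts. The function names and the toy two-dimensional activations below are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty collection of equal-length activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    dim = len(rows[0])
    if any(len(row) != dim for row in rows):
        raise ValueError("all activation vectors must have the same length")
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def language_vector(
    target_acts: Sequence[Sequence[float]],
    reference_acts: Sequence[Sequence[float]],
) -> Vector:
    """Difference of means: target-language mean minus reference-language mean.

    ASSUMPTION: this is one standard way to derive a steering vector from
    parallel data; the paper's exact extraction procedure may differ.
    """
    target_mean = mean_vector(target_acts)
    reference_mean = mean_vector(reference_acts)
    if len(target_mean) != len(reference_mean):
        raise ValueError("target and reference activations differ in size")
    return [t - r for t, r in zip(target_mean, reference_mean)]


# Toy activations standing in for per-sentence hidden states at one layer.
target = [[1.0, 2.0], [3.0, 4.0]]      # mean is [2.0, 3.0]
reference = [[0.0, 1.0], [2.0, 1.0]]   # mean is [1.0, 1.0]
vec = language_vector(target, reference)  # [1.0, 2.0]
```

In a real setup the rows would come from a model's hidden states for each FLORES sentence, typically pooled over tokens; that pooling choice is also left open by the summary.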
📝 Abstract
Large language models (LLMs) are increasingly used across diverse linguistic and cultural contexts, yet their cultural knowledge remains uneven across regions and languages. We present the DFKI-MLT system for SemEval-2026 Task 7 on cultural awareness, where we apply activation steering to multilingual LLMs using language vectors extracted from parallel FLORES data. Our method performs inference-time adaptation by adding language-specific steering vectors to the residual stream at a selected transformer layer, without any parameter updates. We participated in both the short-answer (SAQ) and multiple-choice (MCQ) tracks; however, only our MCQ submission received an official score. In the official MCQ track, we achieved 86.96% accuracy, ranking 7th out of 17 teams. To better understand system behavior, we conduct post-hoc analyses on the shared-task MCQ and SAQ settings. These analyses show that activation steering yields modest and heterogeneous improvements on cultural reasoning: gains are strongly layer-sensitive, vary substantially across language-region pairs (with some configurations even degrading performance), and interact with prompt formulation (generic versus culturally conditioned prompts). Our findings suggest that prompt design and activation steering should be jointly optimized for culturally aware multilingual inference.
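The injection step described in the abstract, adding a steering vector to the residual stream at one selected layer while leaving all weights untouched, can be sketched with a toy model. Everything here is an illustrative assumption: the layers are plain functions, and the names `forward`, `steer_layer`, and the scaling factor `alpha` are hypothetical rather than taken from the paper.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add_scaled(h: Sequence[float], v: Sequence[float], alpha: float) -> Vector:
    """Return h + alpha * v, element-wise."""
    if len(h) != len(v):
        raise ValueError("vectors must have the same length")
    return [x + alpha * y for x, y in zip(h, v)]


def forward(
    layers: Sequence[Layer],
    h: Vector,
    steering_vector: Optional[Vector] = None,
    steer_layer: Optional[int] = None,
    alpha: float = 1.0,
) -> Vector:
    """Run a toy residual stack, optionally steering after one layer.

    Each layer's output is added back to the stream (residual connection).
    After the layer at index `steer_layer`, alpha * steering_vector is also
    added. The layer functions themselves are never modified.
    """
    for idx, layer in enumerate(layers):
        h = add_scaled(h, layer(h), 1.0)  # residual update
        if steering_vector is not None and idx == steer_layer:
            h = add_scaled(h, steering_vector, alpha)  # inference-time steering
    return h


def half(h: Vector) -> Vector:
    """Toy layer: returns half of its input, so the stream grows by 1.5x."""
    return [0.5 * x for x in h]


layers = [half, half]
plain = forward(layers, [2.0, 4.0])  # [4.5, 9.0]
steered = forward(layers, [2.0, 4.0], steering_vector=[1.0, -1.0],
                  steer_layer=0, alpha=2.0)  # [7.5, 6.0]
```

Because the vector is added mid-stack, every later layer sees the shifted stream, which is one plausible reason the choice of layer matters as much as the abstract reports. With a real model the same addition would usually be done through a forward hook on the chosen transformer block.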
Problem

Research questions and friction points this paper is trying to address.

cultural knowledge
multilingual models
language models
cultural awareness
cross-lingual generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

activation steering
multilingual LLMs
cultural awareness
inference-time adaptation
language vectors