DFKI-MLT at SemEval-2026 Task 7: Steering Multilingual Models Towards Cultural Knowledge

📅 2026-05-21

🤖 AI Summary

This work addresses the uneven representation of cultural knowledge across languages and regions in multilingual large language models. The authors apply activation steering at inference time, without any parameter updates: language vectors extracted from the parallel FLORES corpus are added to the model's residual stream at a selected Transformer layer, and the effect is studied under both generic and culturally conditioned prompts. Post-hoc analyses show that the effectiveness of this steering depends strongly on the Transformer layer, the language-region pair, and the prompt formulation, with some configurations degrading performance. On the SemEval-2026 Task 7 multiple-choice track, the system achieves an accuracy of 86.96%, ranking 7th out of 17 participating teams.

📝 Abstract

Large language models (LLMs) are increasingly used across diverse linguistic and cultural contexts, yet their cultural knowledge remains uneven across regions and languages. We present the DFKI-MLT system for SemEval-2026 Task 7 on cultural awareness, where we apply activation steering to multilingual LLMs using language vectors extracted from parallel FLORES data. Our method performs inference-time adaptation by adding language-specific steering vectors to the residual stream at a selected transformer layer, without any parameter updates. We participated in both the short-answer (SAQ) and multiple-choice (MCQ) tracks; however, only our MCQ submission received an official score. In the official MCQ track, we achieved 86.96% accuracy, ranking 7th out of 17 teams. To better understand system behavior, we conduct post-hoc analyses on the shared-task MCQ and SAQ settings. These analyses show that activation steering yields modest and heterogeneous improvements on cultural reasoning. Gains are strongly layer-sensitive and vary substantially across language-region pairs, with some configurations even degrading performance. They also interact with prompt formulation, as shown by comparing generic and culturally conditioned prompts. Our findings suggest that prompt design and activation steering should be jointly optimized for culturally aware multilingual inference.
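
To make the inference-time mechanism concrete, the following is a minimal toy sketch of adding a steering vector to a residual stream at one selected layer. It is not the authors' implementation. The abstract does not say how the language vectors are computed from the parallel data, so the difference-of-means construction in `language_vector` is an assumption, as are all function names, the scaling factor `alpha`, and the toy layers standing in for transformer blocks.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def mean_vector(rows: Sequence[Vector]) -> Vector:
    """Average a non-empty batch of equal-length activation vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def language_vector(target_acts: Sequence[Vector],
                    source_acts: Sequence[Vector]) -> Vector:
    """Hypothetical language vector.

    ASSUMPTION: difference between the mean activations of target-language
    sentences and of their parallel source-language sentences. The source
    text only states that vectors are extracted from parallel FLORES data.
    """
    target_mean = mean_vector(target_acts)
    source_mean = mean_vector(source_acts)
    return [t - s for t, s in zip(target_mean, source_mean)]


def forward(hidden: Vector,
            layers: Sequence[Layer],
            steer_layer: Optional[int] = None,
            steer_vec: Optional[Vector] = None,
            alpha: float = 1.0) -> Vector:
    """Toy residual-stream forward pass with optional steering.

    Each layer contributes an additive update (h = h + layer(h)). After the
    layer with index `steer_layer`, alpha * steer_vec is added to the stream.
    No layer parameters are modified.
    """
    h = list(hidden)
    for index, layer in enumerate(layers):
        update = layer(h)
        h = [x + u for x, u in zip(h, update)]
        if steer_vec is not None and index == steer_layer:
            h = [x + alpha * v for x, v in zip(h, steer_vec)]
    return h


if __name__ == "__main__":
    # Two toy "blocks" that leave the stream unchanged, to isolate steering.
    blocks = [lambda h: [0.0] * len(h), lambda h: [0.0] * len(h)]
    vec = language_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]])
    print(forward([1.0, 1.0], blocks))
    print(forward([1.0, 1.0], blocks, steer_layer=0, steer_vec=vec, alpha=0.5))
```

Because the reported gains are layer-sensitive, `steer_layer` (and, in this sketch, `alpha`) would in practice be chosen per language-region pair on held-out data rather than fixed in advance.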

Problem

Research questions and friction points this paper is trying to address.

cultural knowledge

multilingual models

language models

cultural awareness

cross-lingual generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

activation steering

multilingual LLMs

cultural awareness