🤖 AI Summary
Current ADAS systems lack joint modeling of driver psychophysiological states and traffic environments, and multi-task learning across these tasks suffers from negative transfer. This paper proposes a unified multi-task framework for assisted-driving perception that simultaneously recognizes driver behavior (e.g., head scanning), affective state (e.g., anxiety), vehicle behavior (e.g., turning), and traffic context (e.g., congestion). The framework introduces a multi-axis region attention network and a dual-branch multimodal embedding mechanism to adaptively decouple task-relevant from task-irrelevant features within shared representations. Combined with cross-modal alignment and joint optimization, this design effectively mitigates negative transfer. On the AIDE benchmark, the method achieves state-of-the-art performance across all four tasks. Ablation studies validate the efficacy of each component, and the source code is publicly released.
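The summary's "multi-axis region attention" can be pictured as attention applied along two axes of a feature map: first locally, among tokens inside each region, then globally, among corresponding tokens across regions. The sketch below is a generic illustration of that idea (in the spirit of MaxViT-style multi-axis attention), not the paper's actual network; all function names and the region partitioning scheme are assumptions for illustration.

```python
import numpy as np

def attention(q, k, v):
    # Plain scaled dot-product attention over the last two axes.
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def multi_axis_region_attention(x, region):
    """Illustrative sketch: x is an (H, W, C) feature map, region = (rh, rw).

    Axis 1 attends within each non-overlapping region (local context);
    axis 2 attends across regions (global context)."""
    H, W, C = x.shape
    rh, rw = region
    # Partition into regions: (H//rh, W//rw, rh*rw, C).
    t = x.reshape(H // rh, rh, W // rw, rw, C).transpose(0, 2, 1, 3, 4)
    t = t.reshape(H // rh, W // rw, rh * rw, C)
    # Axis 1: local attention among the rh*rw tokens inside each region.
    local = attention(t, t, t)
    # Axis 2: global attention among corresponding tokens across regions.
    g = local.reshape(-1, rh * rw, C).transpose(1, 0, 2)   # (rh*rw, nRegions, C)
    glob = attention(g, g, g).transpose(1, 0, 2)
    glob = glob.reshape(H // rh, W // rw, rh * rw, C)
    # Restore the original (H, W, C) spatial layout.
    out = glob.reshape(H // rh, W // rw, rh, rw, C).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, C)
```

Decomposing attention this way keeps the cost of each step proportional to the region size or the region count, rather than to the full H×W token grid, while the two axes together still propagate global context.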
📝 Abstract
Advanced driver assistance systems require a comprehensive understanding of the driver's mental/physical state and the traffic context, but existing work often neglects the potential benefits of jointly learning these tasks. This paper proposes MMTL-UniAD, a unified multi-modal multi-task learning framework that simultaneously recognizes driver behavior (e.g., looking around, talking), driver emotion (e.g., anxiety, happiness), vehicle behavior (e.g., parking, turning), and traffic context (e.g., traffic jam, smooth traffic). A key challenge is avoiding negative transfer between tasks, which can impair learning performance. To address this, we introduce two key components into the framework: a multi-axis region attention network to extract global context-sensitive features, and a dual-branch multimodal embedding to learn multimodal embeddings from both task-shared and task-specific features. The former uses a multi-attention mechanism to extract task-relevant features, mitigating the negative transfer caused by task-unrelated features. The latter employs a dual-branch structure to adaptively adjust task-shared and task-specific parameters, enhancing cross-task knowledge transfer while reducing task conflicts. We evaluate MMTL-UniAD on the AIDE dataset with a series of ablation studies, and show that it outperforms state-of-the-art methods across all four tasks. The code is available at https://github.com/Wenzhuo-Liu/MMTL-UniAD.
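The dual-branch idea described above, a shared branch reused by all tasks plus per-task branches, with an adaptive mix of the two, can be sketched as follows. This is a minimal numpy illustration under stated assumptions (a sigmoid gate blends the branches; class and parameter names are hypothetical), not the paper's implementation.

```python
import numpy as np

class DualBranchEmbedding:
    """Illustrative dual-branch embedding: one task-shared branch plus one
    task-specific branch per task, blended by a learned per-task gate."""

    def __init__(self, in_dim, emb_dim, num_tasks, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(in_dim)
        # Parameters shared across all tasks.
        self.W_shared = rng.normal(0.0, scale, (in_dim, emb_dim))
        # Separate parameters for each task, plus a scalar gate projection.
        self.W_task = [rng.normal(0.0, scale, (in_dim, emb_dim))
                       for _ in range(num_tasks)]
        self.W_gate = [rng.normal(0.0, scale, (in_dim, 1))
                       for _ in range(num_tasks)]

    def forward(self, x):
        """x: (batch, in_dim) fused multimodal features.
        Returns one (batch, emb_dim) embedding per task."""
        shared = np.tanh(x @ self.W_shared)              # task-shared features
        outputs = []
        for W_t, W_g in zip(self.W_task, self.W_gate):
            specific = np.tanh(x @ W_t)                  # task-specific features
            gate = 1.0 / (1.0 + np.exp(-(x @ W_g)))      # blend weight in (0, 1)
            outputs.append(gate * shared + (1.0 - gate) * specific)
        return outputs
```

Because the gate is input-dependent, each task head can lean on shared features where tasks agree and fall back on its own branch where they conflict, which is the intuition behind reducing task conflicts while preserving cross-task transfer.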