Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering

📅 2025-09-14

🤖 AI Summary

This work addresses the weak reasoning and poor generalization of models on Audio Question Answering (Audio QA). We propose an error-aware curriculum learning framework that combines difficulty-aware sample ranking, guided selective chain-of-thought (CoT), and reinforcement training with GRPO (Group Relative Policy Optimization). The key contributions are: (1) a curriculum that organizes training samples by difficulty, estimated from model prediction errors; (2) a guided thought dropout mechanism that drops redundant reasoning so that chain-of-thought is concentrated on challenging cases and critical acoustic-semantic alignments; and (3) GRPO-based optimization of CoT generation so the model learns more effectively from informative samples. Evaluated on MMAU-mini and MMAR, the method reaches 73.80% and 64.30% accuracy, respectively, setting a new state of the art on MMAR. The results indicate improved robustness and generalization in multimodal audio-language reasoning.
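To make the error-aware curriculum concrete, here is a minimal Python sketch of difficulty-based ordering. It assumes, hypothetically, that difficulty is the fraction of k sampled predictions a reference model got wrong and that the curriculum runs easy to hard in a fixed number of stages. The names Sample, error_rate, and build_curriculum are illustrative; the exact difficulty measure and ordering used by Omni-CLST are not given in this summary.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    # Hypothetical record: one audio QA item plus the outcomes of k sampled
    # predictions from a reference model (True = answered correctly).
    qid: str
    outcomes: list[bool] = field(default_factory=list)


def error_rate(sample: Sample) -> float:
    """Fraction of sampled predictions that were wrong (0.0 = always right)."""
    if not sample.outcomes:
        return 1.0  # assumption: treat unscored samples as hardest
    return sum(not ok for ok in sample.outcomes) / len(sample.outcomes)


def build_curriculum(samples: list[Sample], num_stages: int = 3) -> list[list[Sample]]:
    """Order samples easy -> hard by error rate and split into at most num_stages stages."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if not samples:
        return []
    ordered = sorted(samples, key=error_rate)  # stable: ties keep input order
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]


if __name__ == "__main__":
    data = [
        Sample("q1", [True, True, True, True]),
        Sample("q2", [False, False, False, True]),
        Sample("q3", [True, False, True, True]),
        Sample("q4", [False, False, False, False]),
    ]
    for stage, items in enumerate(build_curriculum(data, num_stages=2)):
        print(stage, [s.qid for s in items])
```

With the four example items, q1 (0% wrong) and q3 (25%) land in the first stage, and q2 (75%) and q4 (100%) in the second. The stable sort keeps tied items in their input order, so stage membership is deterministic.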

📝 Abstract

We propose Omni-CLST, an error-aware Curriculum Learning framework with guided Selective Chain-of-Thought for audio question answering. The framework efficiently leverages existing high-quality datasets through two key strategies: an error-aware curriculum that organizes samples by difficulty, and a guided thought dropout mechanism that focuses reasoning on challenging cases. Integrated with GRPO training, these strategies enable the model to learn more effectively from informative samples. Experiments on MMAU-mini and MMAR demonstrate that Omni-CLST achieves competitive accuracy (73.80% on MMAU-mini) and establishes a new state of the art (64.30% on MMAR), highlighting its robustness and generalization capability in multimodal audio-language understanding.
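The guided thought dropout idea can be pictured as a rule for building supervision targets. This is a hypothetical sketch, not the authors' implementation: it assumes each sample has a reference chain of thought, an answer, and a per-sample error rate (fraction of wrong model predictions), and that easy samples (error rate below a chosen threshold) lose their reasoning with probability drop_prob while hard samples always keep it. The <think>/<answer> tag format and every name and parameter below are assumptions.

```python
import random
from typing import Optional


def make_target(cot: str, answer: str, err_rate: float,
                easy_threshold: float = 0.25, drop_prob: float = 0.5,
                rng: Optional[random.Random] = None) -> str:
    """Build one training target under a hypothetical thought-dropout rule.

    Hard samples (err_rate >= easy_threshold) always keep their chain of
    thought. Easy samples drop it with probability drop_prob, so the model
    is also trained to answer them directly.
    """
    if rng is None:
        rng = random.Random()
    if err_rate < easy_threshold and rng.random() < drop_prob:
        return f"<answer>{answer}</answer>"
    return f"<think>{cot}</think><answer>{answer}</answer>"


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible demo
    print(make_target("reasoning for an easy clip", "B", err_rate=0.0, rng=rng))
    print(make_target("reasoning for a hard clip", "C", err_rate=0.75, rng=rng))
```

Passing an explicit random.Random keeps dropout decisions reproducible across runs; the threshold and drop probability are free knobs in this sketch, not values reported by the paper.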

Problem

Improving audio question answering accuracy through curriculum learning

Enhancing reasoning on challenging cases with selective chain-of-thought

Advancing multimodal audio-language understanding with error-aware training

Innovation

Error-aware curriculum learning that organizes samples by difficulty

Guided selective chain-of-thought reasoning

Integration with GRPO training so the model learns more effectively from informative samples
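GRPO (Group Relative Policy Optimization) scores each sampled response against the other responses drawn for the same question, instead of using a learned value function. The sketch below shows that group-normalized advantage, A_i = (r_i - mean(r)) / (std(r) + eps), paired with a hypothetical exact-match accuracy reward. The reward rule, eps, the use of population standard deviation, and the function names are illustrative assumptions; the reward actually used by Omni-CLST is not specified in this summary.

```python
import statistics


def accuracy_reward(predicted: str, gold: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the chosen option matches gold."""
    return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward by its group's mean and std.

    Uses the population standard deviation; eps avoids division by zero when
    every response in the group earns the same reward.
    """
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one multiple-choice audio question (gold = "B").
    group = ["B", "A", "C", "b"]
    rewards = [accuracy_reward(p, "B") for p in group]
    print(rewards)
    print([round(a, 3) for a in group_relative_advantages(rewards)])
```

When two of four answers are correct, the advantages come out at roughly +1 for the correct answers and -1 for the wrong ones. If every answer in a group is equally right or wrong, all advantages are 0 and that group contributes no learning signal, which is consistent with the emphasis on learning from informative samples.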

🔎 Similar Papers

Using Large Multimodal Models to Extract Knowledge Components for Knowledge Tracing from Multimedia Question Information