Deciphering the complaint aspects: Towards an aspect-based complaint identification model with video complaint dataset in finance

📅 2025-02-26
📈 Citations: 0 (influential: 0)
🤖 AI Summary
To address fine-grained aspect identification in financial-domain video complaints, this paper introduces a multimodal dataset of 433 financial video complaints, annotated at the utterance level with five categories of financial aspects and their associated complaint labels. The proposed model, Solution 3.0, is a multi-task architecture built on a CLIP-based dual frozen encoder with an integrated image segment encoder and contextual attention (ISEC). It handles multimodal features (audio and video) and performs multi-label aspect classification and complaint identification in parallel. Evaluated on the curated dataset, the framework outperforms current multimodal baselines on nearly all metrics. The authors position the work as a basis for stronger customer care initiatives and for helping individuals resolve their problems.

📝 Abstract
In today's competitive marketing landscape, effective complaint management is crucial for customer service and business success. Video complaints, which integrate text and image content, offer invaluable insights by voicing customer grievances and delineating product benefits and drawbacks. However, comprehending nuanced complaint aspects within the vast volume of daily multimodal financial data remains a formidable challenge. To address this gap, we have curated a proprietary multimodal video complaint dataset comprising 433 publicly accessible instances. Each instance is annotated at the utterance level with five distinct categories of financial aspects and their associated complaint labels. To support this endeavour, we introduce Solution 3.0, a model designed for the multimodal aspect-based complaint identification task. Solution 3.0 is tailored to perform three key tasks: 1) handling multimodal features (audio and video), 2) facilitating multilabel aspect classification, and 3) multitasking, so that aspect classification and complaint identification run in parallel. Solution 3.0 uses a CLIP-based dual frozen encoder with an integrated image segment encoder for global feature fusion, enhanced by contextual attention (ISEC) to improve accuracy and efficiency. Our proposed framework surpasses current multimodal baselines on nearly all metrics, opening new ways to strengthen customer care initiatives and to help individuals resolve their problems.
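The abstract mentions contextual attention over fused multimodal features but does not spell out the mechanism. As a rough illustration only, the sketch below uses generic scaled dot-product attention to weight per-modality feature vectors against a query vector. The function names, the toy 4-dimensional vectors, and the choice of a text-derived query are all assumptions made for this example, not details taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, modality_feats):
    """Fuse per-modality vectors with scaled dot-product attention.

    query: list of floats (hypothetical context vector).
    modality_feats: list of vectors with the same length as query.
    Returns (fused_vector, attention_weights).
    """
    d = len(query)
    # Similarity of the query to each modality vector, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, feat)) / math.sqrt(d)
        for feat in modality_feats
    ]
    weights = softmax(scores)
    # Weighted sum of the modality vectors, one dimension at a time.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, modality_feats))
        for i in range(d)
    ]
    return fused, weights


# Toy inputs (assumed values, standing in for encoder outputs).
text_query = [1.0, 0.0, 0.0, 0.0]
audio_feat = [0.9, 0.1, 0.0, 0.0]
video_feat = [0.0, 1.0, 0.0, 0.0]

fused, weights = attention_fuse(text_query, [audio_feat, video_feat])
```

In this toy case the audio vector is closer to the query, so it receives the larger attention weight. A real system would apply learned projections to the encoder outputs before this step.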
Problem

Research questions and friction points this paper is trying to address.

Identifies nuanced complaint aspects in financial video data.
Develops a model for multimodal aspect-based complaint identification.
Enhances customer service by analyzing text and image complaints.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal feature handling with audio and video
Multilabel aspect classification for financial data
CLIP-based dual frozen encoder with ISEC enhancement
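To make the multilabel and multitask ideas above concrete, here is a minimal sketch, assuming a shared representation has already produced five aspect logits and one complaint logit. The aspect names are placeholders (the listing does not name the five categories). The 0.5 threshold, the loss weights, and the use of binary cross-entropy are assumptions for illustration, not the paper's reported setup.

```python
import math

# Placeholder names: the source states there are five aspect categories
# but does not list them.
ASPECTS = ["aspect_1", "aspect_2", "aspect_3", "aspect_4", "aspect_5"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one probability/target pair."""
    return -(target * math.log(prob + eps)
             + (1 - target) * math.log(1 - prob + eps))


def predict(aspect_logits, complaint_logit, threshold=0.5):
    """Multilabel decoding: each aspect is decided independently."""
    aspect_probs = [sigmoid(z) for z in aspect_logits]
    aspects = [name for name, p in zip(ASPECTS, aspect_probs)
               if p >= threshold]
    is_complaint = sigmoid(complaint_logit) >= threshold
    return aspects, is_complaint


def joint_loss(aspect_logits, aspect_targets, complaint_logit,
               complaint_target, aspect_weight=1.0, complaint_weight=1.0):
    """Multitask objective: weighted sum of the two task losses."""
    aspect_loss = sum(
        bce(sigmoid(z), t) for z, t in zip(aspect_logits, aspect_targets)
    ) / len(aspect_logits)
    complaint_loss = bce(sigmoid(complaint_logit), complaint_target)
    return aspect_weight * aspect_loss + complaint_weight * complaint_loss


# Toy logits (assumed values).
logits = [2.0, -1.0, 0.5, -3.0, 1.2]
aspects, is_complaint = predict(logits, complaint_logit=1.5)
good = joint_loss(logits, [1, 0, 1, 0, 1], 1.5, 1)
bad = joint_loss(logits, [0, 1, 0, 1, 0], 1.5, 0)
```

Independent sigmoids let one utterance carry several aspect labels at once, which a single softmax over the five categories would not allow.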
Authors

Sarmistha Das
Indian Institute of Technology Patna
ML, DL, NLP, FinTech
Basha Mujavarsheik
Indian Institute of Technology Patna
R. E. Z. Lyngkhoi
Indian Institute of Technology Patna
Sriparna Saha
Indian Institute of Technology Patna
Alka Maurya
Crisil Limited