ProtoSiTex: Learning Semi-Interpretable Prototypes for Multi-label Text Classification

📅 2025-10-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing prototype-based methods are largely confined to coarse-grained (sentence- or document-level) classification and do not address the multi-label setting, where the semantics of different labels can overlap or conflict within a single text. ProtoSiTex is a semi-interpretable prototype framework for multi-label text classification that learns prototypes at the clause (sub-sentence) level and produces human-aligned explanations. It uses adaptive prototypes and a multi-head attention mechanism to capture overlapping and conflicting semantics, and it trains with a dual-phase alternating strategy (unsupervised prototype discovery, then supervised classification) together with a hierarchical loss that enforces consistency across sub-sentence, sentence, and document levels. The authors also introduce a benchmark dataset of hotel reviews annotated with multiple labels at the sub-sentence level. Experiments on this dataset and two public benchmarks (binary and multi-class) are reported to show state-of-the-art performance alongside faithful, human-aligned explanations.

📝 Abstract

The surge in user-generated reviews has amplified the need for interpretable models that can provide fine-grained insights. Existing prototype-based models offer intuitive explanations but typically operate at coarse granularity (sentence or document level) and fail to address the multi-label nature of real-world text classification. We propose ProtoSiTex, a semi-interpretable framework designed for fine-grained multi-label text classification. ProtoSiTex employs a dual-phase alternating training strategy: an unsupervised prototype discovery phase that learns semantically coherent and diverse prototypes, and a supervised classification phase that maps these prototypes to class labels. A hierarchical loss function enforces consistency across sub-sentence, sentence, and document levels, enhancing interpretability and alignment. Unlike prior approaches, ProtoSiTex captures overlapping and conflicting semantics using adaptive prototypes and multi-head attention. We also introduce a benchmark dataset of hotel reviews annotated at the sub-sentence level with multiple labels. Experiments on this dataset and two public benchmarks (binary and multi-class) show that ProtoSiTex achieves state-of-the-art performance while delivering faithful, human-aligned explanations, establishing it as a robust solution for semi-interpretable multi-label text classification.
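The dual-phase alternating strategy described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the abstract does not specify the update rules, so the unsupervised phase here is a k-means-style prototype update and the supervised phase is a per-label logistic-regression step with the prototypes held fixed. All function names (`discovery_step`, `classification_step`, `train`, and so on) and the toy 2-D "clause embeddings" are assumptions made for illustration.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def discovery_step(clauses, prototypes, rate=0.5):
    """Unsupervised phase (assumed k-means-style): move each prototype
    toward the mean of the clause embeddings that are nearest to it."""
    groups = [[] for _ in prototypes]
    for c in clauses:
        nearest = min(range(len(prototypes)), key=lambda j: sq_dist(c, prototypes[j]))
        groups[nearest].append(c)
    updated = []
    for p, group in zip(prototypes, groups):
        if not group:  # no clause assigned: leave the prototype unchanged
            updated.append(list(p))
            continue
        mean = [sum(col) / len(group) for col in zip(*group)]
        updated.append([(1 - rate) * a + rate * m for a, m in zip(p, mean)])
    return updated

def features(doc, prototypes):
    """One feature per prototype: similarity of the closest clause in the document."""
    return [max(math.exp(-sq_dist(c, p)) for c in doc) for p in prototypes]

def label_probs(doc, prototypes, weights):
    """Independent sigmoid per label, so several labels can be active at once."""
    f = features(doc, prototypes)
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, f)))) for row in weights]

def classification_step(docs, labels, prototypes, weights, rate=1.0):
    """Supervised phase: one pass of logistic-regression updates that maps
    prototype features to labels. Prototypes are not modified here."""
    for doc, y in zip(docs, labels):
        f = features(doc, prototypes)
        probs = label_probs(doc, prototypes, weights)
        for l, row in enumerate(weights):
            for j in range(len(row)):
                row[j] -= rate * (probs[l] - y[l]) * f[j]
    return weights

def train(docs, labels, prototypes, n_labels, rounds=50):
    """Alternate the two phases for a fixed number of rounds."""
    weights = [[0.0] * len(prototypes) for _ in range(n_labels)]
    clauses = [c for doc in docs for c in doc]
    for _ in range(rounds):
        prototypes = discovery_step(clauses, prototypes)
        weights = classification_step(docs, labels, prototypes, weights)
    return prototypes, weights
```

A document is represented as a list of clause vectors, and `labels` holds one 0/1 entry per class for each document. Alternating the phases lets the classifier always see the current prototypes, while the prototype update itself never looks at the labels.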

Problem

Research questions and friction points this paper is trying to address.

Addresses multi-label text classification with interpretable fine-grained insights

Learns semantically coherent prototypes for overlapping semantic relationships

Provides hierarchical consistency across sub-sentence, sentence and document levels
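The idea of consistency across sub-sentence, sentence, and document levels can be made concrete with a toy penalty. The exact form of the paper's hierarchical loss is not given here, so the sketch below makes two assumptions: child-level label probabilities are aggregated to the parent level with a per-label maximum, and disagreement is measured with mean squared error. The function names are hypothetical.

```python
def aggregate_max(child_probs):
    """Per-label maximum over child units (assumed aggregation rule):
    a label is as likely for the parent as for its most confident child."""
    return [max(col) for col in zip(*child_probs)]

def mse(p, q):
    """Mean squared difference between two per-label probability vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

def hierarchical_consistency(sub_probs, sent_probs, doc_probs):
    """Penalty that is zero when each level agrees with the level below it.

    sub_probs:  one entry per sentence, each a list of per-label
                probability vectors (one vector per sub-sentence)
    sent_probs: one per-label probability vector per sentence
    doc_probs:  a single per-label probability vector for the document
    """
    sent_terms = [mse(aggregate_max(subs), sent)
                  for subs, sent in zip(sub_probs, sent_probs)]
    sent_loss = sum(sent_terms) / len(sent_terms)
    doc_loss = mse(aggregate_max(sent_probs), doc_probs)
    return sent_loss + doc_loss
```

In this formulation a document-level prediction that no sentence supports, or a sentence-level prediction that no sub-sentence supports, increases the penalty, which is one way to keep explanations at different granularities from contradicting each other.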

Innovation

Methods, ideas, or system contributions that make the work stand out.

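Dual-phase alternating training: unsupervised prototype discovery followed by supervised mapping of prototypes to class labels

Hierarchical loss enforces consistency across sub-sentence, sentence, and document levels

Adaptive prototypes with multi-head attention capture overlapping and conflicting semantics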
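To show how prototypes can yield both multi-label predictions and a clause-level explanation, here is a minimal sketch in plain Python. It is a simplification under stated assumptions: it uses cosine similarity and a fixed linear layer, and it omits the multi-head attention and adaptive prototype updates that the paper describes. The names `prototype_activations` and `predict_labels`, and the weights passed to them, are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def prototype_activations(clauses, prototypes):
    """For each prototype, return (best similarity, index of the best clause).
    The clause index is what makes the prediction traceable to text."""
    activations = []
    for p in prototypes:
        sims = [cosine(c, p) for c in clauses]
        best = max(range(len(sims)), key=sims.__getitem__)
        activations.append((sims[best], best))
    return activations

def predict_labels(clauses, prototypes, weights, bias, threshold=0.5):
    """Score each label independently with a sigmoid, so that labels
    with overlapping evidence can be predicted together."""
    activations = prototype_activations(clauses, prototypes)
    probs = []
    for row, b in zip(weights, bias):
        logit = b + sum(w * act for w, (act, _) in zip(row, activations))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    predicted = [p >= threshold for p in probs]
    return predicted, probs, activations
```

Because each label has its own sigmoid rather than sharing a softmax, one review can activate several labels at once, and the returned clause indices indicate which part of the text matched each prototype.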

🔎 Similar Papers

Self-supervised Interpretable Concept-based Models for Text Classification