🤖 AI Summary
This work addresses the challenge that existing hierarchical models struggle to capture corpus-level global semantics in rhetorical role labeling, particularly suffering from poor recognition of infrequent roles in specialized domains such as legal and medical texts. To overcome this limitation, the authors propose two semantic prototype-based approaches: one employing prototype regularization to construct a structured latent space, and another introducing a prototype-conditioned modulation mechanism that integrates local contextual cues with global semantic information during both training and inference. This study is the first to incorporate corpus-level semantic prototypes into a hierarchical architecture, jointly modeling local dependencies and global characteristics. Additionally, the authors introduce SCOTUS-Law, the first dataset of U.S. Supreme Court cases annotated at three levels of granularity. The proposed methods consistently outperform strong baselines across legal, medical, and scientific benchmarks, achieving up to a 4-point Macro-F1 improvement on infrequent roles, with effectiveness validated through expert evaluation.
📝 Abstract
Rhetorical Role Labeling (RRL) identifies the functional role of each sentence in a document, a key task for discourse understanding in domains such as law and medicine. While hierarchical models capture local dependencies effectively, they are limited in modeling global, corpus-level features. To address this limitation, we propose two prototype-based methods that integrate local context with global representations. Prototype-Based Regularization (PBR) learns soft prototypes through a distance-based auxiliary loss to structure the latent space, while Prototype-Conditioned Modulation (PCM) constructs corpus-level prototypes and injects them during training and inference. Given the scarcity of RRL resources, we introduce SCOTUS-Law, the first dataset of U.S. Supreme Court opinions annotated with rhetorical roles at three levels of granularity: category, rhetorical function, and step. Experiments on legal, medical, and scientific benchmarks show consistent improvements over strong baselines, with gains of 4 Macro-F1 points on low-frequency roles. We further analyze the implications in the era of Large Language Models and complement our findings with expert evaluation.
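The abstract does not give formulas for PBR or PCM, so the following is only a rough NumPy sketch of the two general ideas it names: a distance-based auxiliary loss that pulls sentence embeddings toward a per-role prototype, and a step that injects prototype information back into each embedding. Everything here is an assumption made for illustration: the function names, the use of class means as prototypes, squared Euclidean distance, and the softmax-weighted additive mixing are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def class_prototypes(embeddings, labels, num_classes):
    """Assumed prototype definition: the mean embedding of each role.

    Roles with no examples keep a zero vector.
    """
    prototypes = np.zeros((num_classes, embeddings.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            prototypes[c] = embeddings[mask].mean(axis=0)
    return prototypes


def prototype_distance_loss(embeddings, labels, prototypes):
    """Mean squared Euclidean distance from each sentence to its role's prototype."""
    diffs = embeddings - prototypes[labels]
    return float((diffs ** 2).sum(axis=1).mean())


def modulate(embeddings, prototypes):
    """Add a softmax-weighted mixture of prototypes to each embedding (hypothetical)."""
    # Pairwise squared distances, shape (num_sentences, num_classes).
    d2 = ((embeddings[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    scores = -d2                                  # closer prototype, higher score
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return embeddings + weights @ prototypes


# Toy data: three sentence embeddings, two rhetorical roles.
embeddings = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]])
labels = np.array([0, 0, 1])

prototypes = class_prototypes(embeddings, labels, num_classes=2)
aux_loss = prototype_distance_loss(embeddings, labels, prototypes)
conditioned = modulate(embeddings, prototypes)
```

In a trained model the auxiliary term would presumably be added to the classification loss with some weight, and the prototypes would be learned or refreshed during training; the abstract does not specify either detail, so both are left out of the sketch.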