Coupling Local Context and Global Semantic Prototypes via a Hierarchical Architecture for Rhetorical Roles Labeling

📅 2026-03-04
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses a limitation of existing hierarchical models for rhetorical role labeling: they capture local dependencies between sentences but struggle to capture corpus-level global semantics, which leads to poor recognition of infrequent roles in specialized domains such as legal and medical texts. To overcome this, the authors propose two semantic prototype-based approaches. The first uses prototype regularization to build a structured latent space. The second introduces a prototype-conditioned modulation mechanism that combines local contextual cues with global semantic information during both training and inference. According to the authors, this is the first study to incorporate corpus-level semantic prototypes into a hierarchical architecture so that local dependencies and global characteristics are modeled jointly. The authors also introduce SCOTUS-Law, the first dataset of U.S. Supreme Court cases annotated at three levels of granularity. The proposed methods consistently outperform strong baselines on legal, medical, and scientific benchmarks, achieving up to a 4-point Macro-F1 improvement on infrequent roles, and their effectiveness is further validated through expert evaluation.
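
The summary does not say how the corpus-level prototypes are built or how they condition the local representation. As a rough illustration only, the sketch below assumes that each prototype is the mean embedding of the training sentences carrying a given role, and that a sentence's local contextual vector `h` is modulated by adding a similarity-weighted mixture of those prototypes. The function names (`build_prototypes`, `modulate`), the parameters `alpha` and `temperature`, and the role names are hypothetical; this is not the authors' PCM implementation.

```python
import math
from collections import defaultdict


def build_prototypes(embeddings, labels):
    """Hypothetical corpus-level prototypes: mean training embedding per role."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modulate(h, prototypes, alpha=0.5, temperature=0.1):
    """Assumed conditioning scheme: add a similarity-weighted prototype mixture to h."""
    roles = sorted(prototypes)
    weights = softmax([cosine(h, prototypes[r]) for r in roles], temperature)
    # Global summary vector: convex combination of the prototypes.
    mixture = [sum(w * prototypes[r][i] for w, r in zip(weights, roles))
               for i in range(len(h))]
    conditioned = [hi + alpha * gi for hi, gi in zip(h, mixture)]
    return conditioned, dict(zip(roles, weights))


# Toy usage with hypothetical role names and 2-d "sentence embeddings".
train_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_roles = ["Facts", "Facts", "Ruling", "Ruling"]
prototypes = build_prototypes(train_vecs, train_roles)
conditioned, role_weights = modulate([0.8, 0.2], prototypes)
```

Because the function is applied the same way to any input vector, the global signal is available at both training and inference time. In a real hierarchical model, `h` would come from the sentence-level encoder. The additive mixture is only one plausible conditioning scheme; gating or FiLM-style scaling and shifting are common alternatives.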

📝 Abstract
Rhetorical Role Labeling (RRL) identifies the functional role of each sentence in a document, a key task for discourse understanding in domains such as law and medicine. While hierarchical models capture local dependencies effectively, they are limited in modeling global, corpus-level features. To address this limitation, we propose two prototype-based methods that integrate local context with global representations. Prototype-Based Regularization (PBR) learns soft prototypes through a distance-based auxiliary loss to structure the latent space, while Prototype-Conditioned Modulation (PCM) constructs corpus-level prototypes and injects them during both training and inference. Given the scarcity of RRL resources, we introduce SCOTUS-Law, the first dataset of U.S. Supreme Court opinions annotated with rhetorical roles at three levels of granularity: category, rhetorical function, and step. Experiments on legal, medical, and scientific benchmarks show consistent improvements over strong baselines, with gains of 4 Macro-F1 points on low-frequency roles. We further discuss the implications of these results in the era of Large Language Models and complement our findings with an expert evaluation.
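
The abstract describes PBR only as soft prototypes learned through a distance-based auxiliary loss. One common form of such a loss, borrowed here from prototypical networks as an assumption rather than taken from the paper, treats negative squared Euclidean distances to the prototypes as logits, applies cross-entropy against the gold role, and adds the result to the main classification loss with a weight `lam`. The names `prototype_loss`, `total_loss`, and `lam` are hypothetical. The sketch uses plain Python for readability, so it only evaluates the loss; in a real model the prototypes would be trainable parameters updated by backpropagation.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_loss(embeddings, labels, prototypes):
    """Assumed distance-based auxiliary loss (prototypical-network style).

    Logits are negative squared distances to each prototype; the loss is the
    mean cross-entropy of the gold role index under a softmax over those logits.
    """
    total = 0.0
    for z, y in zip(embeddings, labels):
        logits = [-squared_distance(z, p) for p in prototypes]
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(v - top) for v in logits))
        total += log_norm - logits[y]  # equals -log p(y | z)
    return total / len(embeddings)


def total_loss(ce_loss, embeddings, labels, prototypes, lam=0.1):
    """Main classification loss plus the weighted prototype regularizer."""
    return ce_loss + lam * prototype_loss(embeddings, labels, prototypes)


# Toy usage: two hypothetical role prototypes in 2-d.
prototypes = [[0.0, 0.0], [2.0, 2.0]]
aux = prototype_loss([[0.1, 0.0], [1.8, 2.1]], [0, 1], prototypes)
```

Minimizing this term pulls each sentence embedding toward the prototype of its gold role and away from the others, which is what gives the latent space a class-wise structure. The weight `lam` trades that structure off against the main objective.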

Problem

Research questions and friction points this paper is trying to address.

Rhetorical Role Labeling
local context
global semantics
hierarchical modeling
discourse understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Prototype-Based Regularization
Prototype-Conditioned Modulation
Rhetorical Role Labeling
Hierarchical Architecture
SCOTUS-Law
Anas Belfathi
Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
Nicolas Hernandez
Université de Nantes - LS2N (UMR 6004)
Natural Language Processing, Discourse Structure Analysis
Laura Monceaux
Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
Warren Bonnard
University of Lorraine, France
Mary Catherine Lavissiere
Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
Christine Jacquin
Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
Richard Dufour
LS2N - TALN/NLP research group - Nantes University
Natural language processing, Biomedical domain, Language modeling, Spontaneous speech