Coupling Local Context and Global Semantic Prototypes via a Hierarchical Architecture for Rhetorical Roles Labeling

📅 2026-03-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing hierarchical models for rhetorical role labeling struggle to capture corpus-level global semantics, and they recognize infrequent roles poorly in specialized domains such as legal and medical text. To address this, the authors propose two approaches based on semantic prototypes: one uses prototype regularization to build a structured latent space, and the other introduces a prototype-conditioned modulation mechanism that combines local contextual cues with global semantic information during both training and inference. This study is the first to incorporate corpus-level semantic prototypes into a hierarchical architecture, jointly modeling local dependencies and global characteristics. The authors also introduce SCOTUS-Law, the first dataset of U.S. Supreme Court opinions annotated with rhetorical roles at three levels of granularity. The proposed methods consistently outperform strong baselines across legal, medical, and scientific benchmarks, with up to a 4-point Macro-F1 improvement on infrequent roles, and the results are complemented by expert evaluation.
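Macro-F1 is the unweighted mean of per-label F1 scores, which is why it is sensitive to infrequent roles in a way that accuracy is not. The sketch below is a minimal illustration of that metric; the role names `Fact` and `Ruling` and the toy predictions are hypothetical and are not taken from the paper or its datasets.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy document: 9 frequent "Fact" sentences and 1 rare "Ruling".
gold = ["Fact"] * 9 + ["Ruling"]
always_fact = ["Fact"] * 10  # a classifier that ignores the rare role

accuracy = sum(g == p for g, p in zip(gold, always_fact)) / len(gold)
score = macro_f1(gold, always_fact)
```

In this toy case accuracy is 0.9, while Macro-F1 is 9/19 (about 0.47), because the missed rare role contributes an F1 of zero with the same weight as the frequent one.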

📝 Abstract

Rhetorical Role Labeling (RRL) identifies the functional role of each sentence in a document, a key task for discourse understanding in domains such as law and medicine. While hierarchical models capture local dependencies effectively, they are limited in modeling global, corpus-level features. To address this limitation, we propose two prototype-based methods that integrate local context with global representations. Prototype-Based Regularization (PBR) learns soft prototypes through a distance-based auxiliary loss to structure the latent space, while Prototype-Conditioned Modulation (PCM) constructs corpus-level prototypes and injects them during training and inference. Given the scarcity of RRL resources, we introduce SCOTUS-Law, the first dataset of U.S. Supreme Court opinions annotated with rhetorical roles at three levels of granularity: category, rhetorical function, and step. Experiments on legal, medical, and scientific benchmarks show consistent improvements over strong baselines, with gains of 4 Macro-F1 points on low-frequency roles. We further analyze the implications in the era of Large Language Models and complement our findings with expert evaluation.
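The two prototype ideas in the abstract can be pictured with a small, dependency-free sketch. It is not the authors' implementation, and every name in it (`class_prototypes`, `prototype_loss`, `modulate`, `temperature`) is hypothetical. It makes three simplifying assumptions: prototypes are fixed per-label means of sentence embeddings rather than learned soft prototypes, the auxiliary loss is a plain mean squared distance, and "injection" is an additive softmax-weighted mix of prototypes.

```python
import math


def class_prototypes(embeddings, labels):
    """Assumed corpus-level prototype per label: mean of its sentence embeddings."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_loss(embeddings, labels, prototypes):
    """Distance-based auxiliary loss (assumed form): mean squared distance
    between each sentence embedding and the prototype of its own label."""
    total = sum(sq_dist(v, prototypes[l]) for v, l in zip(embeddings, labels))
    return total / len(embeddings)


def modulate(vec, prototypes, temperature=1.0):
    """Assumed injection step: add a softmax-weighted mix of prototypes to a
    sentence vector. No gold label is needed, so it also works at inference."""
    names = list(prototypes)
    logits = [-sq_dist(vec, prototypes[n]) / temperature for n in names]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - top) for z in logits]
    norm = sum(weights)
    mix = [
        sum(w / norm * prototypes[n][i] for w, n in zip(weights, names))
        for i in range(len(vec))
    ]
    return [x + m for x, m in zip(vec, mix)]


# Hypothetical 2-d sentence embeddings and role labels.
emb = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
roles = ["Fact", "Fact", "Ruling"]

protos = class_prototypes(emb, roles)
aux = prototype_loss(emb, roles, protos)
out = modulate([1.0, 0.0], protos)
```

In a trained model the auxiliary term would be added to the classification loss with some weighting, and the modulated vector would feed the label classifier; both of those choices are left out here because the abstract does not specify them.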

Problem

Research questions and friction points this paper is trying to address.

Rhetorical Role Labeling

local context

global semantics

hierarchical modeling

discourse understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Prototype-Based Regularization

Prototype-Conditioned Modulation

Rhetorical Role Labeling