Team "better_call_claude": Style Change Detection using a Sequential Sentence Pair Classifier

📅 2025-08-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses fine-grained, sentence-level style change detection: identifying stylistic shifts between adjacent sentences in a document, including the hard cases where sentences are short or carry little stylistic signal. We propose a lightweight Sequential Sentence Pair Classifier: each sentence is encoded with a pre-trained language model; a bidirectional LSTM contextualizes the sentence representations within the document; and the contextualized vectors of adjacent sentences are concatenated and passed to an MLP that predicts whether the style changes at that boundary. The design targets "stylistically shallow" short sentences in particular. On the official PAN-2025 test datasets, the model achieves macro-F1 scores of 0.923, 0.828, and 0.724 on the EASY, MEDIUM, and HARD data, respectively, outperforming both the official random baselines and zero-shot claude-3.7-sonnet. The results support context-aware sentence-pair modeling as an effective approach to style change detection.

📝 Abstract

Style change detection, that is, identifying the points in a document where writing style shifts, remains one of the most important and challenging problems in computational authorship analysis. At PAN 2025, the shared task challenges participants to detect style switches at the most fine-grained level: individual sentences. The task spans three datasets, each designed with controlled and increasing thematic variety within documents. We propose to address this problem by modeling the content of each problem instance (a series of sentences) as a whole, using a Sequential Sentence Pair Classifier (SSPC). The architecture leverages a pre-trained language model (PLM) to obtain representations of individual sentences, which are then fed into a bidirectional LSTM (BiLSTM) to contextualize them within the document. The BiLSTM-produced vectors of adjacent sentences are concatenated and passed to a multi-layer perceptron, which makes one prediction per adjacency. Building on the work of previous PAN participants and on classical text segmentation, the approach is relatively conservative and lightweight. Nevertheless, it proves effective in leveraging contextual information and addressing what is arguably the most challenging aspect of this year's shared task: the notorious problem of "stylistically shallow", short sentences that are prevalent in the proposed benchmark data. Evaluated on the official PAN-2025 test datasets, the model achieves strong macro-F1 scores of 0.923, 0.828, and 0.724 on the EASY, MEDIUM, and HARD data, respectively, outperforming not only the official random baselines but also a much more challenging one: claude-3.7-sonnet's zero-shot performance.
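The macro-F1 figures quoted above average the per-class F1 scores without weighting by class frequency. The following is a minimal sketch of that metric in plain Python. It assumes binary labels where 1 marks a style change between two adjacent sentences and 0 marks no change; the label encoding and the function name `macro_f1` are illustrative assumptions, not taken from the paper or from the PAN evaluator.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores.

    Assumption (not from the paper): 1 = style change, 0 = no change.
    """
    scores = []
    for label in labels:
        # Count true positives, false positives, false negatives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: four sentence boundaries, one of them misclassified.
gold = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
score = macro_f1(gold, pred)

# A trivial predictor that never reports a change.
always_no_change = macro_f1(gold, [0, 0, 0, 0])
```

On this toy input, the predictor that never reports a change gets an F1 of 0 on the "change" class and therefore a macro-F1 of only 1/3, which is why macro averaging is a natural choice when style changes are the rarer label.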

Problem

Research questions and friction points this paper is trying to address.

Detect style changes between adjacent sentences in documents

Address challenges with short, stylistically shallow sentences

Improve macro-F1 over random and zero-shot LLM baselines in fine-grained style analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sequential Sentence Pair Classifier for style detection

BiLSTM contextualizes pre-trained language model outputs

Multi-layer perceptron predicts style changes per adjacency
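The per-adjacency step listed above can be illustrated with a small sketch: a document of n sentences yields n-1 adjacent pairs, and each pair is represented by concatenating the two contextualized sentence vectors. In the sketch below, plain Python lists stand in for the BiLSTM outputs, and the function name `adjacent_pair_features` is hypothetical; the encoder, the BiLSTM, and the MLP themselves are not shown.

```python
def adjacent_pair_features(sentence_vectors):
    """Concatenate each sentence vector with its successor.

    `sentence_vectors` is a list of equal-length lists of floats, standing in
    for contextualized (e.g. BiLSTM-produced) sentence representations.
    Returns one concatenated vector per adjacency: n sentences -> n-1 pairs.
    """
    return [
        left + right  # list concatenation, not element-wise addition
        for left, right in zip(sentence_vectors, sentence_vectors[1:])
    ]


# Toy document with three sentences and 2-dimensional vectors (made-up values).
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pairs = adjacent_pair_features(vectors)
```

Each element of `pairs` would then be the input for one classification decision, so a three-sentence document produces two predictions, one per sentence boundary.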
