Toward Interpretable Evaluation Measures for Time Series Segmentation

📅 2025-10-27
🤖 AI Summary
Evaluation of time-series segmentation has long been limited by conventional measures, which either focus on change-point accuracy or, like the Adjusted Rand Index (ARI), score individual points; both fail to capture the quality of the detected segments, ignore the nature of errors, and offer little interpretability. To address these limitations, we propose two new measures: the Weighted Adjusted Rand Index (WARI) and the State Matching Score (SMS). WARI weights segmentation errors by their position, while SMS decomposes segmentation errors into four fundamental types (insertion, deletion, merging, and splitting) and supports error-specific weighting for domain-specific prioritization. Experiments on synthetic and real-world benchmarks show that the two measures assess segmentation quality more accurately than traditional measures and expose information those measures cannot, such as where errors originate and which type they belong to.

📝 Abstract
Time series segmentation is a fundamental task in analyzing temporal data across various domains, from human activity recognition to energy monitoring. While numerous state-of-the-art methods have been developed to tackle this problem, the evaluation of their performance remains critically limited. Existing measures predominantly focus on change point accuracy or rely on point-based measures such as the Adjusted Rand Index (ARI), which fail to capture the quality of the detected segments, ignore the nature of errors, and offer limited interpretability. In this paper, we address these shortcomings by introducing two novel evaluation measures: WARI (Weighted Adjusted Rand Index), which accounts for the position of segmentation errors, and SMS (State Matching Score), a fine-grained measure that identifies and scores four fundamental types of segmentation errors while allowing error-specific weighting. We empirically validate WARI and SMS on synthetic and real-world benchmarks, showing that they not only provide a more accurate assessment of segmentation quality but also uncover insights, such as error provenance and type, that are inaccessible with traditional measures.
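
The limitation of point-based measures described above can be illustrated with a small sketch. The code below is not from the paper: it is a plain-Python implementation of the standard ARI, applied to a made-up 30-point series with states 0, 1, 0. The names `adjusted_rand_index`, `truth`, `shifted`, and `split` are illustrative assumptions. One prediction detects the first change point two samples late; the other mislabels two samples in the middle of a segment, which creates a spurious split. Because ARI depends only on how many points fall into each (true state, predicted state) pair, it assigns both predictions the same score.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Standard ARI computed from point-wise state labels (illustrative helper)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: nothing to disagree on

    # Contingency counts: points per (true state, predicted state) pair.
    cell_counts = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)


# Made-up ground truth: state 0, then state 1, then state 0 again.
truth = [0] * 10 + [1] * 10 + [0] * 10

# Prediction A: the first change point is detected two samples late.
shifted = [0] * 12 + [1] * 8 + [0] * 10

# Prediction B: boundaries are exact, but two samples inside the middle
# segment are mislabeled, splitting it into two pieces.
split = [0] * 10 + [1] * 4 + [0] * 2 + [1] * 4 + [0] * 10

print(adjusted_rand_index(truth, shifted))
print(adjusted_rand_index(truth, split))
```

Both calls print the same value, although a delayed boundary and a spurious split are different kinds of error. This is the gap the abstract attributes to point-based measures; how WARI and SMS close it is defined in the paper itself and is not reproduced here.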
Problem

Research questions and friction points this paper is trying to address.

Existing measures fail to capture the quality of detected segments
Current evaluation ignores error nature and interpretability
New measures are needed that identify error types and allow error-specific weighting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces WARI for weighted segmentation error evaluation
Proposes SMS for fine-grained error type identification
Validates measures on synthetic and real-world benchmarks