MGTEVAL: An Interactive Platform for Systematic Evaluation of Machine-Generated Text Detectors

📅 2026-04-27

🤖 AI Summary
This work addresses the lack of standardized evaluation protocols for machine-generated text detectors, a gap that makes results hard to compare and reproduce. To address it, the authors propose an extensible, standardized benchmarking platform with four core modules: dataset construction, textual adversarial attacks, detector training, and performance evaluation covering effectiveness, robustness, and efficiency. The platform supports twelve attack methods, multiple state-of-the-art detection algorithms, and text generated by configurable large language models, and it offers both command-line and web-based interfaces. Users can build custom evaluation benchmarks without modifying code, which improves comparability, reproducibility, and usability. To the authors' knowledge, this is the first effort to systematize and standardize the entire evaluation pipeline for generated-text detection.
📝 Abstract
We present MGTEVAL, an extensible platform for systematic evaluation of Machine-Generated Text (MGT) detectors. Despite rapid progress in MGT detection, existing evaluations are often fragmented across datasets, preprocessing, attacks, and metrics, making results hard to compare and reproduce. MGTEVAL organizes the workflow into four components: Dataset Building, Dataset Attack, Detector Training, and Performance Evaluation. It supports constructing custom benchmarks by generating MGT with configurable LLMs, applying 12 text attacks to test sets, training detectors via a unified interface, and reporting effectiveness, robustness, and efficiency. The platform provides both command-line and Web-based interfaces for user-friendly experimentation without rewriting code.
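To make the four-stage workflow concrete, here is a toy, self-contained sketch of how such a pipeline could fit together. It is not MGTEVAL's code or API. Every name in it (build_dataset, case_attack, upper_ratio, train_detector, run_pipeline, Report) is hypothetical. The "generator" (str.upper) stands in for an LLM call, the case-lowering attack stands in for one of the 12 text attacks, and the threshold detector stands in for a real MGT detector. Accuracy serves as the effectiveness measure, the drop from clean to attacked accuracy as the robustness measure, and per-sample inference time as the efficiency measure. The platform's actual metrics may differ.

```python
# Hypothetical sketch of a four-stage MGT-detector evaluation pipeline.
# None of these names come from MGTEVAL; all components are toy stand-ins.
import random
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (text, label): label 1 = machine-generated, 0 = human-written
Sample = Tuple[str, int]


@dataclass
class Report:
    clean_accuracy: float       # effectiveness on the unmodified test set
    attacked_accuracy: float    # effectiveness after the text attack
    robustness_drop: float      # clean minus attacked accuracy
    seconds_per_sample: float   # crude efficiency measure (inference time)


def build_dataset(human: List[str], generate: Callable[[str], str]) -> List[Sample]:
    """Stage 1 (Dataset Building): pair each human text with a 'generated' one."""
    data: List[Sample] = []
    for text in human:
        data.append((text, 0))
        data.append((generate(text), 1))
    return data


def case_attack(text: str, rate: float, seed: int = 0) -> str:
    """Stage 2 (Dataset Attack): toy attack that lowercases each character with probability `rate`."""
    rng = random.Random(seed)
    return "".join(c.lower() if rng.random() < rate else c for c in text)


def upper_ratio(text: str) -> float:
    """Toy feature: fraction of alphabetic characters that are uppercase."""
    letters = [c for c in text if c.isalpha()]
    return sum(c.isupper() for c in letters) / len(letters) if letters else 0.0


def train_detector(train: List[Sample]) -> Callable[[str], int]:
    """Stage 3 (Detector Training): threshold halfway between the class means."""
    means = {}
    for label in (0, 1):
        scores = [upper_ratio(t) for t, y in train if y == label]
        means[label] = sum(scores) / len(scores)
    threshold = (means[0] + means[1]) / 2
    return lambda text: int(upper_ratio(text) > threshold)


def accuracy(detect: Callable[[str], int], data: List[Sample]) -> float:
    """Fraction of samples whose predicted label matches the gold label."""
    return sum(detect(t) == y for t, y in data) / len(data)


def run_pipeline(human: List[str], rate: float) -> Report:
    """Stage 4 (Performance Evaluation): effectiveness, robustness, efficiency.

    Expects at least two human texts so both splits contain each class.
    """
    data = build_dataset(human, generate=str.upper)  # stand-in for an LLM call
    split = (len(data) // 4) * 2  # keep (human, machine) pairs together
    train, test = data[:split], data[split:]
    detect = train_detector(train)

    start = time.perf_counter()
    clean = accuracy(detect, test)
    elapsed = time.perf_counter() - start

    # The attack is applied to the test split only; training data stays clean.
    attacked_test = [(case_attack(t, rate, seed=i), y) for i, (t, y) in enumerate(test)]
    attacked = accuracy(detect, attacked_test)
    return Report(clean, attacked, clean - attacked, elapsed / len(test))


if __name__ == "__main__":
    human_texts = [
        "the cat sat on the mat",
        "rain fell all night",
        "we met at noon",
        "birds sing at dawn",
    ]
    print(run_pipeline(human_texts, rate=1.0))
```

With rate=1.0 every letter in the test set is lowercased. The toy detector therefore misses every machine-generated test sample, and attacked accuracy falls to 0.5 while clean accuracy stays at 1.0. Keeping the attack stage separate from training mirrors the abstract's point that attacks are applied to test sets, so the same trained detector can be scored on clean and perturbed data.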
Problem

Research questions and friction points this paper is trying to address.

Machine-Generated Text Detection
Evaluation Benchmark
Reproducibility
Robustness Evaluation
Systematic Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

machine-generated text detection
systematic evaluation
text attack robustness
extensible evaluation platform
benchmark construction