🤖 AI Summary
Problem: “Slop” (low-quality, vacuous, or mechanically generated AI text) lacks a precise definition and quantifiable metrics, hampering rigorous evaluation and mitigation.
Method: We conducted iterative interviews with experts in NLP, writing, and philosophy to develop a systematic taxonomy of slop, then designed an interpretable evaluation framework grounded in dimensions such as coherence, relevance, and information density. Using fine-grained, span-level human annotations, we analyzed correlations between subjective slop judgments and linguistic features.
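As a concrete illustration of that correlation analysis, the sketch below computes point-biserial correlations between binary span-level slop labels and continuous dimension scores. The record layout, field names, and toy values are assumptions for illustration, not the paper's actual schema or data.

```python
# Hypothetical sketch: correlating binary span-level "slop" labels
# with continuous dimension scores. Field names and values are
# illustrative, not the paper's schema.
from scipy.stats import pointbiserialr

# Each record: one annotated span with a binary slop label (0/1)
# and per-dimension scores in [0, 1].
annotations = [
    {"is_slop": 1, "coherence": 0.31, "relevance": 0.22, "info_density": 0.18},
    {"is_slop": 0, "coherence": 0.87, "relevance": 0.91, "info_density": 0.74},
    {"is_slop": 1, "coherence": 0.45, "relevance": 0.38, "info_density": 0.29},
    {"is_slop": 0, "coherence": 0.78, "relevance": 0.69, "info_density": 0.81},
]

labels = [a["is_slop"] for a in annotations]
for dim in ("coherence", "relevance", "info_density"):
    scores = [a[dim] for a in annotations]
    # Point-biserial correlation: binary label vs. continuous score.
    r, p = pointbiserialr(labels, scores)
    print(f"{dim}: r={r:.2f}, p={p:.3f}")
```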
Contribution/Results: Binary “slop” judgments are partly subjective, yet they correlate with latent semantic and structural dimensions such as coherence and relevance rather than with surface form alone, a mismatch that conventional surface-level NLP metrics do not capture. The framework is interpretable and practical to deploy, supporting AI-generated content detection, binary preference evaluation for LLM alignment, and human-AI collaborative editing, and it connects computational linguistic features to human-centered quality criteria.
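As a sketch of how such a framework could drive the binary preference task, the snippet below scores two candidates along the proposed dimensions and prefers the one with the higher aggregate. The `score_dimensions` helper and the unweighted-mean aggregation are hypothetical stand-ins, not the paper's method.

```python
# Hedged sketch of the binary preference setting: prefer the candidate
# with the higher aggregate dimension score. score_dimensions() is a
# hypothetical placeholder, not the paper's model.
from statistics import mean

def score_dimensions(text: str) -> dict[str, float]:
    # Placeholder: in practice these scores would come from trained
    # scorers or annotator judgments along the framework's dimensions.
    return {"coherence": 0.5, "relevance": 0.5, "info_density": 0.5}

def prefer(text_a: str, text_b: str) -> str:
    # Aggregate by unweighted mean; the weighting is a design choice.
    agg_a = mean(score_dimensions(text_a).values())
    agg_b = mean(score_dimensions(text_b).values())
    return "A" if agg_a >= agg_b else "B"
```

Because each preference decision decomposes into per-dimension scores, the framework's interpretability carries through: one can report which dimension drove a given judgment rather than only the final choice.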
📝 Abstract
AI"slop"is an increasingly popular term used to describe low-quality AI-generated text, but there is currently no agreed upon definition of this term nor a means to measure its occurrence. In this work, we develop a taxonomy of"slop"through interviews with experts in NLP, writing, and philosophy, and propose a set of interpretable dimensions for its assessment in text. Through span-level annotation, we find that binary"slop"judgments are (somewhat) subjective, but such determinations nonetheless correlate with latent dimensions such as coherence and relevance. Our framework can be used to evaluate AI-generated text in both detection and binary preference tasks, potentially offering new insights into the linguistic and stylistic factors that contribute to quality judgments.