🤖 AI Summary
Problem: “Slop” (low-quality, vacuous, or mechanically generated AI text) lacks a precise definition and quantifiable metrics, hampering rigorous evaluation and mitigation.
Method: We conducted iterative interviews with experts in NLP, writing, and philosophy to develop a systematic taxonomy of slop, then designed an interpretable evaluation framework grounded in dimensions such as coherence, relevance, and information density. Using fine-grained, span-level human annotations, we analyzed correlations between subjective slop judgments and linguistic features.
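As a concrete illustration of that correlation analysis, the sketch below computes point-biserial correlations between binary span-level slop labels and continuous dimension scores. The record layout, field names, and toy values are assumptions for illustration, not the paper's actual schema or data.

```python
# Hypothetical sketch: correlating binary span-level "slop" labels
# with continuous dimension scores. Field names and values are
# illustrative, not the paper's schema.
from scipy.stats import pointbiserialr

# Each record: one annotated span with a binary slop label (0/1)
# and per-dimension scores in [0, 1].
annotations = [
    {"is_slop": 1, "coherence": 0.31, "relevance": 0.22, "info_density": 0.18},
    {"is_slop": 0, "coherence": 0.87, "relevance": 0.91, "info_density": 0.74},
    {"is_slop": 1, "coherence": 0.45, "relevance": 0.38, "info_density": 0.29},
    {"is_slop": 0, "coherence": 0.78, "relevance": 0.69, "info_density": 0.81},
]

labels = [a["is_slop"] for a in annotations]
for dim in ("coherence", "relevance", "info_density"):
    scores = [a[dim] for a in annotations]
    # Point-biserial correlation: binary label vs. continuous score.
    r, p = pointbiserialr(labels, scores)
    print(f"{dim}: r={r:.2f}, p={p:.3f}")
```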
Contribution/Results: Binary “slop” judgments are partly subjective, yet they correlate with latent semantic and structural dimensions such as coherence and relevance rather than with surface form alone, a mismatch that conventional surface-level NLP metrics do not capture. The framework is interpretable and practical to deploy, supporting AI-generated content detection, binary preference evaluation for LLM alignment, and human-AI collaborative editing, and it connects computational linguistic features to human-centered quality criteria.
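As a sketch of how such a framework could drive the binary preference task, the snippet below scores two candidates along the proposed dimensions and prefers the one with the higher aggregate. The `score_dimensions` helper and the unweighted-mean aggregation are hypothetical stand-ins, not the paper's method.

```python
# Hedged sketch of the binary preference setting: prefer the candidate
# with the higher aggregate dimension score. score_dimensions() is a
# hypothetical placeholder, not the paper's model.
from statistics import mean

def score_dimensions(text: str) -> dict[str, float]:
    # Placeholder: in practice these scores would come from trained
    # scorers or annotator judgments along the framework's dimensions.
    return {"coherence": 0.5, "relevance": 0.5, "info_density": 0.5}

def prefer(text_a: str, text_b: str) -> str:
    # Aggregate by unweighted mean; the weighting is a design choice.
    agg_a = mean(score_dimensions(text_a).values())
    agg_b = mean(score_dimensions(text_b).values())
    return "A" if agg_a >= agg_b else "B"
```

Because each preference decision decomposes into per-dimension scores, the framework's interpretability carries through: one can report which dimension drove a given judgment rather than only the final choice.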
📝 Abstract
AI"slop"is an increasingly popular term used to describe low-quality AI-generated text, but there is currently no agreed upon definition of this term nor a means to measure its occurrence. In this work, we develop a taxonomy of"slop"through interviews with experts in NLP, writing, and philosophy, and propose a set of interpretable dimensions for its assessment in text. Through span-level annotation, we find that binary"slop"judgments are (somewhat) subjective, but such determinations nonetheless correlate with latent dimensions such as coherence and relevance. Our framework can be used to evaluate AI-generated text in both detection and binary preference tasks, potentially offering new insights into the linguistic and stylistic factors that contribute to quality judgments.