🤖 AI Summary
This study investigates human perception of anthropomorphism in AI-generated music (AIM) and listeners' accuracy in distinguishing AIM from human-composed music. Using a double-blind Turing test paradigm, it employs ecologically valid, non-author-controlled audio stimuli generated by commercial AI models (e.g., Suno), combined with a randomized crossover controlled trial and a mixed-methods approach that integrates behavioral identification performance with qualitative acoustic and technical coding. Its key contributions are: (1) leveraging real-world AI music outputs; (2) implementing a matched-pair design to ensure comparability between AI and human stimuli; and (3) jointly analyzing quantitative identification accuracy and qualitative judgment rationales. Results reveal a counterintuitive pattern: identification accuracy is higher when the AI and human songs in a pair are more acoustically similar, rather than lower. Listeners' judgments rely predominantly on vocal expressivity and production flaws, particularly vocal naturalness and technical imperfections, identifying these as critical acoustic cues underlying anthropomorphic perception.
📝 Abstract
Recent advances in AI music (AIM) generation services are transforming the music industry. Given these advances, understanding how humans perceive AIM is crucial both to educate users in identifying AIM songs and, conversely, to improve current models. We present results from a listener-focused experiment aimed at understanding how humans perceive AIM. In a blind, Turing-like test, participants were asked to distinguish the AIM song from the human-made song within a pair. Unlike other studies, we use a randomized controlled crossover trial that controls for pairwise similarity and allows for a causal interpretation. Ours is also the first study to employ a novel, author-uncontrolled dataset of AIM songs drawn from real-world usage of commercial models (i.e., Suno). We establish that listeners' reliability in distinguishing AIM causally increases when pairs are more similar. Lastly, we conduct a mixed-methods content analysis of listeners' free-form feedback, revealing that their judgments focus on vocal and technical cues.