🤖 AI Summary
This work addresses the challenge of detecting content produced by generative AI. We propose an unsupervised, interpretable method based on embedding-space analysis. Semantic embeddings of text or images are first extracted with pre-trained large language or multimodal models. Dimensionality reduction (e.g., PCA) then uncovers an intrinsic, low-dimensional distributional shift between AI-generated and human-created samples, which makes the two groups highly separable without supervision. This phenomenon is systematically validated for the first time and given human-interpretable semantic meaning (e.g., topic coherence, syntactic redundancy). Experiments across diverse generative models, including ChatGPT, Gemini, and Stable Diffusion, show that high-accuracy separation is achieved from raw embeddings and unsupervised projection alone, without fine-tuning, labeled data, or model-specific detectors. Our approach thus improves both the generalizability and the interpretability of AI-content detection.
📝 Abstract
Constructing high-quality features is critical to any quantitative data analysis. While feature engineering was historically addressed by carefully hand-crafting data representations based on domain expertise, deep neural networks (DNNs) now offer a radically different approach. DNNs implicitly engineer features by transforming their input data into hidden feature vectors called embeddings. For embedding vectors produced by foundation models, which are trained to be useful across many contexts, we demonstrate that simple and well-studied dimensionality-reduction techniques such as Principal Component Analysis (PCA) uncover inherent heterogeneity in input data concordant with human-understandable explanations. Of the many applications for this framework, we find empirical evidence that there is intrinsic separability between real samples and those generated by artificial intelligence (AI).
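
The pipeline described above (embed, reduce with PCA, inspect the separation) can be illustrated with a minimal sketch. This is not the authors' implementation. The "embeddings" here are synthetic Gaussian vectors with an assumed mean shift between the two groups, standing in for real foundation-model embeddings. The sign-based split on the first principal component is also an assumption made for illustration, as is every function name. Labels are used only to score the result, never to fit it.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores in PC space


def unsupervised_split(scores):
    """Assumed rule: split samples by the sign of their first-PC score."""
    return (scores[:, 0] > 0).astype(int)


def separation_accuracy(pred, labels):
    """Agreement with the true groups, up to swapping the two cluster ids."""
    acc = float(np.mean(pred == labels))
    return max(acc, 1.0 - acc)


# Synthetic stand-in for embeddings (assumption, not real model output):
# two groups that differ by a shift along one random unit direction.
rng = np.random.default_rng(0)
n, d = 200, 64
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)

human = rng.normal(size=(n, d))
generated = rng.normal(size=(n, d)) + 6.0 * direction

X = np.vstack([human, generated])
labels = np.array([0] * n + [1] * n)             # used for evaluation only

scores = pca_project(X, n_components=2)
pred = unsupervised_split(scores)
accuracy = separation_accuracy(pred, labels)
print(f"unsupervised separation accuracy: {accuracy:.3f}")
```

With real data, `X` would be replaced by embeddings extracted from a pre-trained model. Whether a single principal component separates the groups as cleanly as in this toy setup depends on the data and the model.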