🤖 AI Summary
This study addresses the limited interactivity and domain adaptability of existing clustering methods for digital humanities scholars working with large, unstructured document collections. To bridge this gap, the authors present Perspectives, an interactive, aspect-focused document clustering framework. Users define an initial analytical lens through document-rewriting prompts and instruction-based embeddings; interactive visualization, cluster refinement tools, and fine-tuning of the embedding model then form a human-in-the-loop feedback cycle. The approach makes clustering interpretable, adjustable, and iterative, helping researchers uncover latent semantic structures (such as topics or sentiments) and prepare structured data for subsequent in-depth analysis.
📝 Abstract
This paper introduces Perspectives, an interactive extension of the Discourse Analysis Tool Suite designed to empower Digital Humanities (DH) scholars to explore and organize large, unstructured document collections. Perspectives implements a flexible, aspect-focused document clustering pipeline with human-in-the-loop refinement capabilities. We showcase how this process can be initially steered by defining analytical lenses through document rewriting prompts and instruction-based embeddings, and further aligned with user intent through tools for refining clusters and mechanisms for fine-tuning the embedding model. The demonstration highlights a typical workflow, illustrating how DH researchers can leverage the interactive document map in Perspectives to uncover topics, sentiments, or other relevant categories, thereby gaining insights and preparing their data for subsequent in-depth analysis.
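To make the workflow concrete, the following is a minimal, illustrative sketch of an aspect-focused clustering loop with one round of user refinement. It is not the Perspectives implementation, and every name in it is hypothetical: a keyword filter (`rewrite`) stands in for an LLM document-rewriting prompt, a normalised bag-of-words vector (`embed`) stands in for an instruction-based embedding, and user feedback is modelled as pinned cluster assignments rather than as fine-tuning of the embedding model.

```python
from collections import Counter
import math

# Assumption: a tiny keyword set stands in for a "sentiment" analytical lens.
SENTIMENT_LENS = {"love", "great", "happy", "hate", "awful", "sad"}


def rewrite(doc, lens):
    """Toy stand-in for a rewriting prompt: keep only lens-relevant words."""
    return [w for w in doc.lower().split() if w in lens]


def embed(tokens, vocab):
    """Toy stand-in for an embedding: L2-normalised word counts over vocab."""
    counts = Counter(tokens)
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def nearest(vec, centroids):
    """Index of the centroid with the highest dot product with vec."""
    scores = [sum(a * b for a, b in zip(vec, c)) for c in centroids]
    return scores.index(max(scores))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster(vectors, seeds, pinned=None, rounds=5):
    """k-means-style loop; `pinned` maps document index -> user-chosen cluster."""
    pinned = pinned or {}
    centroids = [vectors[i] for i in seeds]
    labels = []
    for _ in range(rounds):
        # User-pinned documents keep their cluster; the rest follow the centroids.
        labels = [pinned.get(i, nearest(v, centroids)) for i, v in enumerate(vectors)]
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = mean(members)
    return labels


docs = [
    "I love this great museum archive",
    "The great archive made me happy",
    "I hate the awful catalogue",
    "A sad and awful museum visit",
]
vocab = sorted(SENTIMENT_LENS)
vectors = [embed(rewrite(d, SENTIMENT_LENS), vocab) for d in docs]

# Initial clustering, seeded with documents 0 and 2.
labels = cluster(vectors, seeds=[0, 2])

# Human-in-the-loop step: the user moves document 1 into cluster 1.
refined = cluster(vectors, seeds=[0, 2], pinned={1: 1})

print(labels)
print(refined)
```

In this toy run, the sentiment lens groups the two positive and the two negative documents regardless of their shared museum and archive vocabulary, and the pinned assignment then shifts the centroids on the next pass. A real system would swap in an LLM for `rewrite`, a learned embedding model for `embed`, and could use the user's corrections as training signal for that model.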