🤖 AI Summary
Virtual brainstorming produces large, sparse, and noisy idea corpora, which makes manual coding inefficient and highly subjective and motivates automated assessment of creative ideas. To address this, we propose a semantic-driven topic modeling framework: it encodes sentences into embeddings with Sentence-BERT, reduces their dimensionality with UMAP, clusters them by density with HDBSCAN, and applies a topic extraction and refinement module, yielding end-to-end topic discovery at the sentence level. The framework filters noise and identifies outlier sentences as part of clustering. Quantitatively, it achieves higher topic coherence (mean C_V score = 0.687) and better interpretability than LDA, ETM, and BERTopic. Evaluated on Zoom-based group discussion data, it captures both the divergent and convergent dynamics of creativity in virtual collaboration, offering a way to assess the depth and diversity of collective creativity.
📝 Abstract
Virtual brainstorming sessions have become a central component of collaborative problem solving, yet the large volume and uneven distribution of ideas often make it difficult to extract valuable insights efficiently. Manual coding of ideas is time-consuming and subjective, underscoring the need for automated approaches to support the evaluation of group creativity. In this study, we propose a semantic-driven topic modeling framework that integrates four modular components: transformer-based embeddings (Sentence-BERT), dimensionality reduction (UMAP), clustering (HDBSCAN), and topic extraction with refinement. The framework captures semantic similarity at the sentence level, enabling the discovery of coherent themes from brainstorming transcripts while filtering noise and identifying outliers. We evaluate our approach on structured Zoom brainstorming sessions in which student groups were tasked with improving their university. Results show that our model achieves substantially higher topic coherence than established methods such as LDA, ETM, and BERTopic, with an average C_V coherence score of 0.687. Beyond improved performance, the model provides interpretable insights into the depth and diversity of the topics explored, supporting both convergent and divergent dimensions of group creativity. This work highlights the potential of embedding-based topic modeling for analyzing collaborative ideation and contributes an efficient and scalable framework for studying creativity in synchronous virtual meetings.
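To make the last stage of the pipeline more concrete, the following is a minimal sketch of how keywords might be extracted once sentences have cluster labels. It is not the authors' method: the abstract does not specify how topic extraction or refinement is done, so the sketch substitutes class-based TF-IDF, the weighting used by BERTopic. It assumes labels follow the HDBSCAN convention of marking noise points with -1. The names `STOPWORDS`, `tokenize`, and `top_terms_per_cluster`, the stopword list, and the example sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "for", "on", "is", "more"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_terms_per_cluster(sentences, labels, k=3):
    """Rank terms per cluster with class-based TF-IDF.

    Sentences labeled -1 (the HDBSCAN noise label) are returned separately
    as outliers and do not contribute to any topic.
    """
    cluster_counts = defaultdict(Counter)
    outliers = []
    for sentence, label in zip(sentences, labels):
        if label == -1:
            outliers.append(sentence)
        else:
            cluster_counts[label].update(tokenize(sentence))

    if not cluster_counts:
        return {}, outliers

    # Frequency of each term across all clusters.
    total = Counter()
    for counts in cluster_counts.values():
        total.update(counts)
    # Average number of tokens per cluster.
    avg_words = sum(total.values()) / len(cluster_counts)

    topics = {}
    for label, counts in cluster_counts.items():
        n = sum(counts.values())
        # Score = normalized in-cluster frequency * log(1 + avg_words / global frequency).
        scores = {
            term: (count / n) * math.log(1 + avg_words / total[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        topics[label] = ranked[:k]
    return topics, outliers


# Hypothetical brainstorming sentences with labels as a clustering step might assign them.
sentences = [
    "Extend library opening hours during exams",
    "Add quiet study rooms in the library",
    "Cheaper meals in the campus cafeteria",
    "More vegetarian meals in the cafeteria",
    "Um, can everyone hear me?",
]
labels = [0, 0, 1, 1, -1]

topics, outliers = top_terms_per_cluster(sentences, labels, k=2)
print(topics)
print(outliers)
```

In a full pipeline, the labels would come from clustering the UMAP-reduced Sentence-BERT embeddings rather than being written by hand. Keeping the noise sentences separate mirrors the abstract's point that off-topic utterances are filtered out instead of being forced into a theme.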