🤖 AI Summary
Existing approaches to representing literary texts as graphs focus on character interactions and often neglect the textual context in which those interactions occur. This work proposes Dynamic Heterogeneous Character Networks (DHCNs), which partition a long novel into temporally localized heterogeneous graphs that align characters with their textual contexts, and introduces GraphLit, a self-supervised framework that learns literary representations through a masked graph autoencoder objective. Across 12 character-related tasks, GraphLit improves over text-only and graph-only baselines, particularly on tasks requiring contextual understanding. The authors also apply DHCNs and GraphLit to literary analysis by studying the link between narrative non-linearity and dynamic social features.
📝 Abstract
Methods to represent literary texts as graphs or sequences of graphs mainly focus on representing character interactions, and often overlook another crucial aspect: the textual context in which characters interact. We introduce Dynamic Heterogeneous Character Networks (DHCNs), which organize long novels into temporally localized heterogeneous graphs that align characters with their textual contexts. We extract around 20,000 DHCNs from Project Gutenberg, and propose GraphLit, a self-supervised learning framework that learns rich literary representations through a masked graph autoencoder objective. Across a wide range of 12 character-related tasks, GraphLit improves over text-only and graph-only baselines, particularly on tasks requiring contextual understanding. Finally, we demonstrate the applicability of DHCNs and GraphLit for literary analysis by studying the link between narrative non-linearity and dynamic social features.
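To make the idea of temporally localized heterogeneous graphs concrete, here is a minimal sketch of one way such a sequence could be built. It is not the authors' pipeline: the `LocalGraph` class, the `build_dhcn` function, the fixed `window_size`, and the naive substring matching used to detect characters are all assumptions made for illustration (a real system would presumably rely on named-entity recognition and coreference resolution).

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class LocalGraph:
    """One temporally local heterogeneous graph (hypothetical structure)."""
    window: int                                            # position in the sequence
    characters: set = field(default_factory=set)           # character nodes
    contexts: list = field(default_factory=list)           # text-context nodes
    char_context_edges: set = field(default_factory=set)   # (character, context index)
    char_char_edges: set = field(default_factory=set)      # (character, character)


def build_dhcn(passages, characters, window_size=2):
    """Split passages into consecutive windows and build one graph per window.

    Assumption: a character "appears" in a passage if its name is a substring.
    """
    graphs = []
    for start in range(0, len(passages), window_size):
        graph = LocalGraph(window=start // window_size)
        for passage in passages[start:start + window_size]:
            present = sorted(c for c in characters if c in passage)
            if not present:
                continue  # passages without characters add no nodes
            ctx = len(graph.contexts)
            graph.contexts.append(passage)
            graph.characters.update(present)
            # Link each character to the text in which it appears.
            graph.char_context_edges.update((c, ctx) for c in present)
            # Link characters that share the same passage.
            graph.char_char_edges.update(combinations(present, 2))
        graphs.append(graph)
    return graphs


passages = [
    "Elizabeth met Darcy at the ball.",
    "Darcy left early.",
    "Jane wrote to Elizabeth.",
    "Rain fell all day.",
]
graphs = build_dhcn(passages, ["Elizabeth", "Darcy", "Jane"], window_size=2)
for g in graphs:
    print(g.window, sorted(g.characters), sorted(g.char_char_edges))
```

In this sketch each window keeps both character-to-character edges and character-to-context edges, so a downstream encoder sees who interacts and in what text. A masked graph autoencoder objective like the one the abstract names would then hide some nodes or features and train a model to reconstruct them; that step is omitted here.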