Learning Hierarchical Knowledge in Text-Rich Networks with Taxonomy-Informed Representation Learning

📅 2026-03-09
🤖 AI Summary
This work addresses a limitation of existing representation learning methods for text-rich networks (TRNs): they overlook the hierarchical semantic structure inherent in text and therefore struggle to model how knowledge is organized from coarse- to fine-grained levels. To overcome this, the authors propose TIER, which first constructs an implicit hierarchical taxonomy and then integrates it into the learned node representations. TIER uses similarity-guided contrastive learning to build a clustering-friendly embedding space, then applies hierarchical K-Means followed by LLM-powered clustering refinement to construct the taxonomy. It also adds a regularization loss based on the cophenetic correlation coefficient to align the embedding space with that hierarchy. The authors report that TIER significantly outperforms existing methods on multiple datasets across diverse domains and yields more interpretable, structured representations.

📝 Abstract
Hierarchical knowledge structures are ubiquitous across real-world domains and play a vital role in organizing information from coarse to fine semantic levels. While such structures have been widely used in taxonomy systems, biomedical ontologies, and retrieval-augmented generation, their potential remains underexplored in the context of Text-Rich Networks (TRNs), where each node contains rich textual content and edges encode semantic relationships. Existing methods for learning on TRNs often focus on flat semantic modeling, overlooking the inherent hierarchical semantics embedded in textual documents. To this end, we propose TIER (Hierarchical Taxonomy-Informed Representation Learning on Text-Rich Networks), which first constructs an implicit hierarchical taxonomy and then integrates it into the learned node representations. Specifically, TIER employs similarity-guided contrastive learning to build a clustering-friendly embedding space, upon which it performs hierarchical K-Means followed by LLM-powered clustering refinement to enable semantically coherent taxonomy construction. Leveraging the resulting taxonomy, TIER introduces a cophenetic correlation coefficient-based regularization loss to align the learned embeddings with the hierarchical structure. By learning representations that respect both fine-grained and coarse-grained semantics, TIER enables more interpretable and structured modeling of real-world TRNs. We demonstrate that our approach significantly outperforms existing methods on multiple datasets across diverse domains, highlighting the importance of hierarchical knowledge learning for TRNs.
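
To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of a two-level hierarchical K-Means followed by a cophenetic correlation coefficient computed against the resulting taxonomy. It is not the authors' implementation: the contrastive learning and LLM refinement steps are omitted, and the function names, the farthest-first initialization, the 0/1/2 tree distance, and the `1 - ccc` penalty are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def kmeans(points, k, iters=10):
    """Lloyd's k-means with deterministic farthest-first initialization (an assumption)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def hierarchical_kmeans(points, k_coarse, k_fine):
    """Cluster coarsely, then re-cluster inside each coarse cluster.

    Returns one (coarse_id, fine_id) path per point, i.e. a two-level taxonomy.
    """
    coarse = kmeans(points, k_coarse)
    paths = [None] * len(points)
    for c in range(k_coarse):
        idx = [i for i, lab in enumerate(coarse) if lab == c]
        if not idx:
            continue
        sub = [points[i] for i in idx]
        fine = kmeans(sub, min(k_fine, len(sub)))
        for i, f in zip(idx, fine):
            paths[i] = (c, f)
    return paths


def tree_distance(a, b):
    """Hypothetical taxonomy distance: 0 same leaf, 1 same parent, 2 otherwise."""
    if a == b:
        return 0.0
    return 1.0 if a[0] == b[0] else 2.0


def cophenetic_correlation(points, paths):
    """Pearson correlation between pairwise embedding and taxonomy distances."""
    d, t = [], []
    for i, j in combinations(range(len(points)), 2):
        d.append(math.dist(points[i], points[j]))
        t.append(tree_distance(paths[i], paths[j]))
    md, mt = sum(d) / len(d), sum(t) / len(t)
    cov = sum((x - md) * (y - mt) for x, y in zip(d, t))
    denom = math.sqrt(sum((x - md) ** 2 for x in d) * sum((y - mt) ** 2 for y in t))
    return cov / denom if denom else 0.0


# Toy 2-D "embeddings": two coarse groups, each with two tight fine groups.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1),
    (11.0, 11.0), (11.1, 11.0), (11.0, 11.1),
]
paths = hierarchical_kmeans(points, k_coarse=2, k_fine=2)
ccc = cophenetic_correlation(points, paths)
loss = 1.0 - ccc  # assumed form of a penalty: small when embeddings match the taxonomy
print(paths)
print(round(ccc, 3), round(loss, 3))
```

On this toy data the taxonomy recovers the four tight groups, and the correlation is high because distances in the embedding space grow with distance in the tree. In a trainable setting the same quantity would need to be computed with differentiable tensor operations so that it could serve as a regularizer.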
Problem

Research questions and friction points this paper is trying to address.

Text-Rich Networks
Hierarchical Knowledge
Taxonomy
Semantic Representation
Node Embedding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Taxonomy-Informed Representation Learning
Hierarchical Knowledge
Text-Rich Networks
Contrastive Learning
Cophenetic Regularization
Yunhui Liu
Nanjing University
Graph Machine Learning
Yongchao Liu
Staff Engineer, Ant Group (China)
Parallel computing, Machine learning, Algorithms, Bioinformatics
Yinfeng Chen
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Chuntao Hong
Ant Group, Beijing, China
Tao Zheng
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Tieke He
Nanjing University
Knowledge Graph, Question Answering, Data Quality