THEME: Enhancing Thematic Investing with Semantic Stock Representations and Temporal Dynamics

📅 2025-08-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Thematic investing faces challenges in stock selection due to ambiguous industry boundaries and evolving market dynamics. To address this, we propose THEME, a hierarchical contrastive learning framework that jointly models thematic semantic relationships and stock price time-series dynamics. We introduce the first Thematic Representation Set (TRS) dataset, which integrates thematic ETFs, hierarchical industry taxonomies, and financial news to enable multi-source alignment. THEME leverages text embeddings, hierarchical contrastive learning across thematic, sectoral, and asset levels, and time-series feature augmentation to align heterogeneous modalities. Empirically, THEME significantly improves retrieval coverage of thematically relevant assets and enhances portfolio returns. It outperforms strong baselines across standard retrieval metrics (e.g., Recall@K, MRR) and demonstrates robustness, effectiveness, and scalability in real-world investment scenarios.
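For readers unfamiliar with the retrieval metrics named in the summary, the following is a minimal sketch of how Recall@K and MRR are commonly computed. The function names, tickers, and relevance sets are made up for illustration and are not taken from the paper, whose exact evaluation protocol is not described here.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k of a ranked list."""
    if not relevant:
        return 0.0
    top_k = set(ranked[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_reciprocal_rank(rankings, relevants):
    """Average of 1/rank of the first relevant item per query (0 if none found)."""
    total = 0.0
    for ranked, relevant in zip(rankings, relevants):
        relevant = set(relevant)
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings) if rankings else 0.0


# Hypothetical theme queries with made-up tickers, for illustration only.
rankings = [["AAA", "BBB", "CCC", "DDD"], ["EEE", "FFF", "GGG", "HHH"]]
relevants = [{"BBB", "DDD"}, {"EEE"}]

r_at_2 = recall_at_k(rankings[0], relevants[0], k=2)  # 1 of 2 relevant stocks in the top 2
mrr = mean_reciprocal_rank(rankings, relevants)       # (1/2 + 1/1) / 2
```

In this toy case the first query finds one of its two relevant stocks within the top 2, and the first relevant hits sit at ranks 2 and 1 respectively.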

📝 Abstract
Thematic investing aims to construct portfolios aligned with structural trends, yet selecting relevant stocks remains challenging due to overlapping sector boundaries and evolving market dynamics. To address this challenge, we construct the Thematic Representation Set (TRS), an extended dataset that begins with real-world thematic ETFs and expands upon them by incorporating industry classifications and financial news to overcome their coverage limitations. The final dataset contains both the explicit mapping of themes to their constituent stocks and the rich textual profiles for each. Building on this dataset, we introduce THEME, a hierarchical contrastive learning framework. By representing the textual profiles of themes and stocks as embeddings, THEME first leverages their hierarchical relationship to achieve semantic alignment. Subsequently, it refines these semantic embeddings through a temporal refinement stage that incorporates individual stock returns. The final stock representations are designed for effective retrieval of thematically aligned assets with strong return potential. Empirical results show that THEME outperforms strong baselines across multiple retrieval metrics and significantly improves performance in portfolio construction. By jointly modeling thematic relationships from text and market dynamics from returns, THEME provides a scalable and adaptive solution for navigating complex investment themes.
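The abstract does not state the loss used for semantic alignment. As a rough illustration of the contrastive idea, the sketch below uses a generic InfoNCE-style loss at a single level (theme versus stock), which is an assumption rather than the paper's formulation. The embeddings, the `temperature` value, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: negative log-softmax of the positive's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy 2-D embeddings (made up): a theme, a constituent stock, an unrelated stock.
theme = [1.0, 0.0]
member = [0.9, 0.1]
outsider = [0.0, 1.0]

aligned_loss = info_nce(theme, member, [outsider])     # positive is the true constituent
misaligned_loss = info_nce(theme, outsider, [member])  # positive is the unrelated stock
```

The loss is small when the theme sits close to its constituent and far from the unrelated stock, and large in the reverse case. A hierarchical variant would presumably apply a similar term at each level of the theme hierarchy, but that detail is not given in the abstract.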
Problem

Research questions and friction points this paper is trying to address.

Selecting relevant stocks for thematic investing is challenging
Overcoming coverage limitations of thematic ETFs with expanded data
Modeling thematic relationships and market dynamics for better portfolios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical contrastive learning for semantic alignment
Temporal refinement incorporating stock returns
Semantic embeddings from thematic and stock profiles
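To make the retrieval use of these embeddings concrete, here is a minimal sketch that ranks stocks by cosine similarity to a theme embedding. The vectors, tickers, and the `retrieve_top_k` helper are hypothetical. In the paper the embeddings come from textual profiles and are then refined with returns, a step this sketch does not reproduce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve_top_k(theme_vec, stock_vecs, k):
    """Return the k stock names most similar to the theme, best first."""
    ranked = sorted(
        stock_vecs,
        key=lambda name: cosine(theme_vec, stock_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-D embeddings for one theme and three stocks.
theme_vec = [0.8, 0.6, 0.0]
stock_vecs = {
    "AAA": [0.7, 0.7, 0.1],
    "BBB": [0.0, 0.1, 1.0],
    "CCC": [0.9, 0.4, 0.0],
}

top2 = retrieve_top_k(theme_vec, stock_vecs, k=2)
```

Here `top2` contains the two stocks whose vectors point most nearly in the theme's direction, and the stock with an almost orthogonal vector is left out.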
Authors
Hoyoung Lee
Ulsan National Institute of Science and Technology (UNIST)
AI in Finance, Financial NLP, Trustworthy AI, Large Language Models
Wonbin Ahn
LG AI Research, Seoul, Republic of Korea
Suhwan Park
Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea
Jaehoon Lee
LG AI Research, Seoul, Republic of Korea
Minjae Kim
LG AI Research, Seoul, Republic of Korea
Sungdong Yoo
LG AI Research, Seoul, Republic of Korea
Taeyoon Lim
LG AI Research, Seoul, Republic of Korea
Woohyung Lim
LG AI Research
Deep Learning, Representation Learning, Anomaly Detection, Time-series Forecasting
Yongjae Lee
Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea