FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

📅 2026-03-03
🤖 AI Summary
This study addresses a limitation of existing methods for aligning financial text with time-series data: they rely predominantly on keyword matching and so fail to capture the multi-level influences on stock prices, which span macroeconomic conditions, industry dynamics, peer companies, and the target firm itself. To overcome this, the authors propose a pairing framework that combines semantic matching with a four-level news classification scheme. Specifically, they extract contextual information about the target firm from SEC filings, retrieve semantically relevant news articles using embedding-based matching, and use a large language model to assign each article to one of four levels: macro, sector, related company, or target company. Applied to publicly available news, the framework yields the FinTexTS dataset, on which the authors report that the pairing strategy is effective for stock price forecasting; applying the same method to proprietary, carefully curated news sources further improves the quality of the paired data and forecasting performance.

📝 Abstract
The financial domain involves a variety of important time-series problems. Recently, time-series analysis methods that jointly leverage textual and numerical information have gained increasing attention. Accordingly, numerous efforts have been made to construct text-paired time-series datasets in the financial domain. However, financial markets are characterized by complex interdependencies, in which a company's stock price is influenced not only by company-specific events but also by events in other companies and broader macroeconomic factors. Existing approaches that pair text with financial time-series data based on simple keyword matching often fail to capture such complex relationships. To address this limitation, we propose a semantic-based and multi-level pairing framework. Specifically, we extract company-specific context for the target company from SEC filings and apply an embedding-based matching mechanism to retrieve semantically relevant news articles based on this context. Furthermore, we classify news articles into four levels (macro-level, sector-level, related company-level, and target-company level) using large language models (LLMs), enabling multi-level pairing of news articles with the target company. Applying this framework to publicly available news datasets, we construct FinTexTS, a new large-scale text-paired stock price dataset. Experimental results on FinTexTS demonstrate the effectiveness of our semantic-based and multi-level pairing strategy in stock price forecasting. In addition to the publicly available news underlying FinTexTS, we show that applying our method to proprietary yet carefully curated news sources leads to higher-quality paired data and improved stock price forecasting performance.
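The pairing flow described in the abstract (company context, similarity-based retrieval, four-level tagging) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a toy bag-of-words stand-in for a learned text embedding model, `classify` is a hypothetical callable standing in for the LLM classifier, and all function names are invented for this example.

```python
import math
import re
from collections import Counter
from enum import Enum


class Level(Enum):
    """The four pairing levels named in the abstract."""
    MACRO = "macro"
    SECTOR = "sector"
    RELATED_COMPANY = "related company"
    TARGET_COMPANY = "target company"


def embed(text):
    """Toy stand-in (assumption): a bag-of-words count vector.

    A real pipeline would use a learned embedding model here.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(context, articles, top_k=2):
    """Rank articles by similarity to the company context; keep the top_k
    that have a non-zero score."""
    query = embed(context)
    scored = [(cosine(query, embed(a)), a) for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for score, a in scored[:top_k] if score > 0]


def pair_articles(context, articles, classify, top_k=2):
    """Retrieve relevant articles, then tag each with a Level.

    `classify` is a hypothetical callable (article text -> Level) that
    stands in for the LLM-based classifier.
    """
    return [(a, classify(a)) for a in retrieve(context, articles, top_k)]
```

Because the toy `embed` only counts shared words, it behaves much like the keyword matching the abstract criticizes; the sketch shows the shape of the pipeline, and the semantic behavior would come from swapping in a real embedding model and an actual LLM call for `classify`.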
Problem

Research questions and friction points this paper is trying to address.

financial time-series
text-paired data
semantic matching
market interdependencies
stock price forecasting
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic-based pairing
multi-level pairing
financial time-series
large language models
text-paired dataset
Jaehoon Lee
LG AI Research & Ulsan National Institute of Science and Technology (UNIST)
Suhwan Park
LG AI Research & Ulsan National Institute of Science and Technology (UNIST)
Tae Yoon Lim
LG AI Research & Ulsan National Institute of Science and Technology (UNIST)
Seunghan Lee
Yonsei University
Deep Learning, Machine Learning
Jun Seo
LG AI Research & Ulsan National Institute of Science and Technology (UNIST)
Dongwan Kang
LG AI Research & Ulsan National Institute of Science and Technology (UNIST)
Hwanil Choi
LG AI Research & Ulsan National Institute of Science and Technology (UNIST)
Minjae Kim
LG AI Research & Ulsan National Institute of Science and Technology (UNIST)
Sungdong Yoo
LG AI Research & Ulsan National Institute of Science and Technology (UNIST)
SoonYoung Lee
LG AI Research & Ulsan National Institute of Science and Technology (UNIST)
Yongjae Lee
Associate Professor, UNIST IE & AIGS
Financial Engineering, Portfolio Optimization, AI for Finance
Wonbin Ahn
LG AI Research & Ulsan National Institute of Science and Technology (UNIST)