🤖 AI Summary
To address the longstanding absence of standardized evaluation resources for Japanese financial-domain NLP, this work introduces benchmark datasets featuring multi-task support, standardized curation, and automated updates. They comprise three tasks — sentence classification (3-way and 12-way) and fine-grained sentiment analysis (5-way) — all constructed from publicly available documents published by a Japanese central government agency. Data preprocessing and annotation employ a rule-enhanced pipeline, while a version-controlled, automated update framework ensures temporal relevance and experimental reproducibility. These datasets fill a critical gap in non-English financial NLP — namely, the lack of high-quality, dynamically maintained evaluation benchmarks — and substantially enhance the reliability of model evaluation and domain adaptation research. All resources, including data, code, and documentation, are open-sourced and actively maintained.
📝 Abstract
Natural language processing (NLP) tasks in English and in general domains are widely available and are often used to evaluate pre-trained language models. In contrast, fewer tasks are available for languages other than English and for the financial domain; tasks covering both Japanese and finance are particularly limited. We develop two large datasets using data published by a Japanese central government agency. The datasets provide three Japanese financial NLP tasks: 3- and 12-class sentence classification and 5-class sentiment analysis. Our datasets are designed to be comprehensive and kept current through an automatic update framework that ensures the latest task datasets are always publicly available.
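The abstract describes multi-class sentence classification and sentiment tasks. A standard way to score models on such tasks is macro-averaged F1, which weights each class equally regardless of frequency. The sketch below is purely illustrative — the paper's actual evaluation metric and label sets are not specified here, and the toy labels are invented for demonstration:

```python
def macro_f1(gold, pred, labels):
    """Macro-averaged F1: compute per-class F1, then take the unweighted mean.

    gold, pred: parallel lists of class labels (e.g. 3-, 5-, or 12-way labels).
    labels: the full label set, so classes absent from predictions still count.
    """
    f1_scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Toy 3-class example (labels 0/1/2), mirroring the 3-way classification task.
gold = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(gold, pred, labels=[0, 1, 2])
```

Equivalent functionality exists in scikit-learn as `f1_score(gold, pred, average="macro")`; the hand-rolled version above just makes the per-class computation explicit.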