Ebisu: Benchmarking Large Language Models in Japanese Finance

📅 2026-02-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty large language models have in interpreting implicit commitments and nested financial terminology in Japanese financial texts. The difficulty stems from the language's agglutinative morphology, mixed writing systems, and high-context expressions. To this end, we introduce Ebisu, a benchmark for native Japanese financial language understanding comprising two expert-annotated tasks: implicit commitment and refusal recognition in investor-facing Q&A (JF-ICR), and hierarchical extraction and ranking of nested financial terms in professional disclosures (JF-TE). Using human-annotated data, we evaluate a range of open-source and proprietary models, including general-purpose, Japanese-adapted, and finance-specialized variants. Even state-of-the-art systems struggle on both tasks: increased model scale yields only limited improvements, and language- or domain-specific adaptation does not reliably improve performance. These results highlight significant limitations in current approaches to understanding high-context Japanese financial discourse.

📝 Abstract

Japanese financial language combines an agglutinative, head-final linguistic structure, mixed writing systems, and high-context communication norms that rely on indirect expression and implicit commitment, posing a substantial challenge for LLMs. We introduce Ebisu, a benchmark for native Japanese financial language understanding, comprising two linguistically and culturally grounded, expert-annotated tasks: JF-ICR, which evaluates implicit commitment and refusal recognition in investor-facing Q&A, and JF-TE, which assesses hierarchical extraction and ranking of nested financial terminology from professional disclosures. We evaluate a diverse set of open-source and proprietary LLMs spanning general-purpose, Japanese-adapted, and financial models. Results show that even state-of-the-art systems struggle on both tasks. While increased model scale yields limited improvements, language- and domain-specific adaptation does not reliably improve performance, leaving substantial gaps unresolved. Ebisu provides a focused benchmark for advancing linguistically and culturally grounded financial NLP. All datasets and evaluation scripts are publicly released.
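To make "nested financial terminology" concrete, the sketch below arranges extracted terms into a hierarchy in which a shorter term sits under the closest longer term that contains it, e.g. 営業利益 (operating profit) inside 営業利益率 (operating margin). This is an illustrative assumption only: the `TermNode` structure, the `build_term_hierarchy` and `render` helpers, and the use of plain substring containment as the nesting rule are hypothetical and are not the JF-TE annotation format or scoring method.

```python
from dataclasses import dataclass, field


@dataclass
class TermNode:
    """A term plus the shorter terms nested inside it (hypothetical structure)."""
    term: str
    children: list["TermNode"] = field(default_factory=list)


def build_term_hierarchy(terms: list[str]) -> list[TermNode]:
    """Attach each term to the shortest longer term that contains it.

    Assumption: nesting is approximated by substring containment, and each
    term gets a single parent, so the result is a forest of trees.
    """
    # Deduplicate, then order longest first (ties broken alphabetically
    # so the output is deterministic).
    unique = sorted(set(terms), key=lambda s: (-len(s), s))
    nodes = {t: TermNode(t) for t in unique}
    roots: list[TermNode] = []
    for t in unique:
        # Candidate parents: strictly longer terms that contain t.
        parents = [p for p in unique if len(p) > len(t) and t in p]
        if parents:
            # The closest enclosing term is the shortest candidate.
            parent = min(parents, key=lambda s: (len(s), s))
            nodes[parent].children.append(nodes[t])
        else:
            roots.append(nodes[t])
    return roots


def render(nodes: list[TermNode], depth: int = 0) -> list[str]:
    """Flatten the hierarchy into indented lines for display."""
    lines: list[str] = []
    for node in nodes:
        lines.append("  " * depth + node.term)
        lines.extend(render(node.children, depth + 1))
    return lines


if __name__ == "__main__":
    # 営業利益 = operating profit, 営業利益率 = operating margin, 売上高 = net sales
    roots = build_term_hierarchy(["営業利益", "営業利益率", "売上高", "営業利益"])
    print("\n".join(render(roots)))
```

Running it prints 営業利益率 with 営業利益 indented beneath it, followed by 売上高 as a separate root. Substring containment is a deliberately crude stand-in; real nested terminology can also overlap or share heads in ways a single-parent tree does not capture.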

Problem

Research questions and friction points this paper is trying to address.

Japanese finance

large language models

implicit commitment

financial NLP

linguistic structure

Innovation

Methods, ideas, or system contributions that make the work stand out.

Japanese financial NLP

implicit commitment recognition

hierarchical terminology extraction