🤖 AI Summary
This paper addresses the dual challenges of interpretability and real-time processing in automated risk assessment from publicly filed 10-K reports. It proposes TinyXRA, a lightweight, hierarchically interpretable Transformer model. Methodologically, it uses TinyBERT to encode long documents within a hierarchical attention architecture, and it incorporates skewness, kurtosis, and the Sortino ratio to capture multiple dimensions of risk. A dynamic, attention-based word cloud mechanism improves decision transparency, while triplet loss is used for risk quartile classification. Evaluated on a dataset spanning 2013-2024, TinyXRA achieves state-of-the-art predictive accuracy across seven test years while keeping interpretability high and computational overhead low. The lightweight design targets real-time processing of thousands of financial documents in resource-constrained production environments. The framework is aimed at trustworthy, efficient financial risk analysis, with applications in regulatory oversight and investment research.
📝 Abstract
Every publicly traded U.S. company files an annual 10-K report containing critical insights into financial health and risk. We propose Tiny eXplainable Risk Assessor (TinyXRA), a lightweight and explainable transformer-based model that automatically assesses company risk from these reports. Unlike prior work that relies solely on the standard deviation of excess returns (adjusted for the Fama-French model), which indiscriminately penalizes both upside and downside risk, TinyXRA incorporates skewness, kurtosis, and the Sortino ratio for a more comprehensive risk assessment. We leverage TinyBERT as our encoder to efficiently process lengthy financial documents, coupled with a novel dynamic, attention-based word cloud mechanism that provides intuitive risk visualization while filtering irrelevant terms. This lightweight design ensures scalable deployment across diverse computing environments, with real-time processing capabilities for thousands of financial documents, which is essential for production systems with constrained computational resources. We employ triplet loss for risk quartile classification, improving over the pairwise loss approaches in the existing literature by capturing both the direction and magnitude of risk differences. TinyXRA achieves state-of-the-art predictive accuracy across seven test years on a dataset spanning 2013-2024, while providing transparent and interpretable risk assessments. We conduct comprehensive ablation studies to evaluate our contributions and assess model explanations both quantitatively, by systematically removing highly attended words and sentences, and qualitatively, by examining explanation coherence. The paper concludes with findings, practical implications, limitations, and future research directions.
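The abstract does not give the exact form of the triplet loss, so the following is only a generic sketch of a triplet margin loss applied to risk quartiles. The anchor and positive are assumed to be document embeddings from the same quartile and the negative from a different one. The `quartile_margin` helper, which widens the margin with the quartile gap to reflect the magnitude of a risk difference, is a hypothetical choice made here and is not taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def quartile_margin(anchor_quartile, negative_quartile, base_margin=0.5):
    """Hypothetical margin that grows with the gap between risk quartiles.

    Assumption for illustration only: quartiles are integers 0..3 and
    the margin is proportional to their absolute difference.
    """
    return base_margin * abs(anchor_quartile - negative_quartile)


# Toy 2-D embeddings: the anchor and positive share quartile 0.
anchor, positive = [0.0, 0.0], [0.0, 1.0]
far_negative, near_negative = [3.0, 4.0], [1.0, 0.0]

loss_far = triplet_loss(anchor, positive, far_negative)
loss_near = triplet_loss(anchor, positive, near_negative,
                         margin=quartile_margin(0, 3))
```

In this toy setup the far negative is already well separated and incurs no loss, while the near negative from a distant quartile is penalized by the larger margin. A pairwise loss would only encode which of two documents is riskier, not how far apart they should sit.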