🤖 AI Summary
This work addresses the coarse granularity, low interpretability, and poor robustness of existing approaches to identifying AI-related disclosures in financial reports. We propose a domain-specific, sentence-level classification framework for finance. Methodologically, we fine-tune a domain-adapted BERT model for binary sentence classification, augment it with SHAP-based token attribution to make decisions transparent, and benchmark it against traditional baselines including Logistic Regression, Naive Bayes, Random Forest, and XGBoost. Evaluated on a manually annotated, balanced dataset of 1,586 sentences, the model achieves 99.37% accuracy and an F1 score of 0.993, substantially outperforming the baselines, while remaining stable across temporal shifts and varying sentence lengths. Key contributions include: (i) the first fine-grained, human-annotated dataset of AI-related disclosure sentences in financial texts; (ii) an end-to-end detection paradigm that integrates domain adaptation with interpretability by design; and (iii) a high-precision, auditable technical foundation for monitoring AI adoption in financial institutions.
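The sentence-level setup against which FinAI-BERT is benchmarked can be sketched with one of the traditional baselines. This is a minimal illustration, not the paper's implementation: the sentences, labels, and pipeline settings below are hypothetical placeholders, assuming a standard TF-IDF + Logistic Regression configuration.

```python
# Hedged sketch of a sentence-level baseline classifier of the kind the
# paper benchmarks against (TF-IDF features + Logistic Regression).
# All sentences and labels are illustrative, not from the annotated dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy sentence-level data: 1 = AI-related disclosure, 0 = not AI-related.
sentences = [
    "We deployed machine learning models to improve credit scoring.",
    "Our chatbot uses artificial intelligence to answer customer queries.",
    "AI-driven analytics informed our fraud detection strategy this year.",
    "The branch network expanded into three new metropolitan markets.",
    "Net interest income rose due to higher prevailing rates.",
    "We repurchased common shares under the existing buyback program.",
]
labels = [1, 1, 1, 0, 0, 0]

# A pipeline mirrors the two-step setup: vectorize each sentence, then classify.
baseline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), lowercase=True)),
    ("clf", LogisticRegression(max_iter=1000)),
])
baseline.fit(sentences, labels)

pred = baseline.predict(["Artificial intelligence powers our risk models."])[0]
print("predicted label:", pred)
```

In practice, such bag-of-words baselines are what the fine-tuned transformer is shown to outperform, since they cannot capture context beyond n-gram overlap.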
📝 Abstract
The proliferation of artificial intelligence (AI) in financial services has prompted growing demand for tools that can systematically detect AI-related disclosures in corporate filings. While prior approaches often rely on keyword expansion or document-level classification, they fall short in granularity, interpretability, and robustness. This study introduces FinAI-BERT, a domain-adapted transformer-based language model designed to classify AI-related content at the sentence level within financial texts. The model was fine-tuned on a manually curated and balanced dataset of 1,586 sentences drawn from 669 annual reports of U.S. banks (2015 to 2023). FinAI-BERT achieved near-perfect classification performance (accuracy of 99.37 percent, F1 score of 0.993), outperforming traditional baselines such as Logistic Regression, Naive Bayes, Random Forest, and XGBoost. Interpretability was ensured through SHAP-based token attribution, while bias analysis and robustness checks confirmed the model's stability across sentence lengths, adversarial inputs, and temporal samples. Theoretically, the study advances financial NLP by operationalizing fine-grained, theme-specific classification using transformer architectures. Practically, it offers a scalable, transparent solution for analysts, regulators, and scholars seeking to monitor the diffusion and framing of AI across financial institutions.
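The SHAP-based token attribution mentioned above can be illustrated in simplified form. The paper applies SHAP to the fine-tuned transformer; to stay self-contained, this sketch instead uses the closed-form SHAP values of a linear model over a toy bag-of-words representation, where the attribution for feature j is w_j * (x_j - E[x_j]). The vocabulary, weights, and background corpus are all hypothetical.

```python
# Hedged illustration of SHAP-style token attribution. For a linear score
# f(x) = w @ x + b (with features treated as independent), the exact SHAP
# value of feature j is phi_j = w_j * (x_j - mean(x_j)). All data below is
# illustrative, not the paper's model or vocabulary.
import numpy as np

vocab = ["artificial", "intelligence", "loan", "branch", "model"]
# Hypothetical learned weights: positive weight -> pushes toward "AI-related".
w = np.array([2.1, 1.9, -0.4, -0.8, 1.2])
b = -0.5

# Toy background corpus (rows = sentences, cols = token counts).
background = np.array([
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
], dtype=float)

x = np.array([1, 1, 0, 0, 0], dtype=float)  # sentence to explain

# Closed-form SHAP values for the linear model.
phi = w * (x - background.mean(axis=0))

# Local-accuracy check: attributions sum to f(x) - E[f(X)].
f_x = w @ x + b
f_mean = w @ background.mean(axis=0) + b
assert np.isclose(phi.sum(), f_x - f_mean)

# Rank tokens by attribution magnitude, as a SHAP summary would.
for token, p in sorted(zip(vocab, phi), key=lambda t: -abs(t[1])):
    print(f"{token:>12}: {p:+.3f}")
```

The same local-accuracy property (attributions summing to the gap between the model's output and its expected output) is what makes SHAP explanations auditable when applied to the transformer's sentence-level predictions.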