Retrieval-Augmented Reasoning for Chartered Accountancy

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of large language models in regulation-intensive financial tasks such as the Indian Chartered Accountant examinations: weak numerical reasoning, high computational demands, and insufficient jurisdiction-specific knowledge. To overcome these challenges, the authors propose CA-ThinkFlow, a lightweight retrieval-augmented framework that combines a 4-bit quantized 14B-parameter DeepSeek-R1 model, the layout-aware document parser Docling, a basic RAG mechanism, and the model's built-in chain-of-thought reasoning. Evaluated on the multi-tier CA-Ben benchmark, the approach reaches 68.75% of the Scholastic Reliability Coefficient (SRC) obtained by GPT-4o and Claude 3.5 Sonnet, suggesting that efficient, open-source solutions can be viable and competitive in chartered accountancy.

📝 Abstract

The advent of Large Language Models (LLMs) has accelerated AI adoption in the finance sector, yet their reliability on complex, jurisdiction-specific tasks such as Indian Chartered Accountancy (CA) remains limited. These models struggle with multi-step numerical tasks that also require detailed knowledge of legal regulations, and scaling them up is not feasible in resource-constrained settings. We present CA-ThinkFlow, a parameter-efficient Retrieval-Augmented Generation (RAG) framework built on a 14B-parameter, 4-bit-quantized reasoning model, 14B-DeepSeek-R1, and a layout-aware Docling extraction system that preserves document structure during extraction. CA-ThinkFlow uses a basic RAG method that automatically adds retrieved information to the prompt, and it relies on the model's built-in Chain-of-Thought (CoT) capabilities to interpret that context and produce correct answers. On the multi-level CA-Ben benchmark, the system reaches performance comparable to large proprietary models, with Scholastic Reliability Coefficient (SRC) results equal to 68.75% of those of GPT-4o and Claude 3.5 Sonnet. The framework is efficient and makes strong use of its limited parameter budget, but its core reasoning abilities still fail on complex regulatory texts in areas such as Taxation.
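The "basic RAG" step described in the abstract, where retrieved passages are prepended to the prompt, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the retriever, the embedding model, or the prompt template, so the token-overlap scoring, the function names (`tokenize`, `score`, `retrieve`, `build_prompt`), the prompt wording, and the placeholder passages below are all assumptions made for illustration.

```python
import re
from collections import Counter

# Placeholder corpus (hypothetical). In the described system, passages would
# come from documents parsed by Docling; these strings are stand-ins only.
PASSAGES = [
    "Placeholder passage about depreciation of fixed assets under the straight line method.",
    "Placeholder passage about input tax credit eligibility.",
    "Placeholder passage about audit report contents.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, passage: str) -> int:
    """Count overlapping tokens (a toy stand-in for a real retriever)."""
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Prepend the retrieved passages to the question (assumed template)."""
    retrieved = retrieve(question, passages, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved))
    return (
        "Use the reference passages to answer the question.\n\n"
        f"Reference passages:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    question = "How is depreciation computed under the straight line method?"
    print(build_prompt(question, PASSAGES, k=1))
```

The resulting string would then be passed to the reasoning model, which, per the abstract, is left to apply its own chain-of-thought over the added context rather than being guided by an explicit reasoning template.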

Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Reasoning

Chartered Accountancy

Large Language Models

Regulatory Text Understanding

Resource-Constrained Settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation

Chain-of-Thought Reasoning

Parameter-Efficient LLM