Taxation Perspectives from Large Language Models: A Case Study on Additional Tax Penalties

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior work lacks an open-source benchmark for assessing the legality of additional tax penalties, relying instead on simplified or non-reproducible datasets. Method: We introduce PLAT, the first open-source tax law reasoning benchmark, designed for fine-grained penalty-legality prediction in realistic, complex scenarios. Our approach proposes a framework integrating retrieval-augmented generation (RAG), self-reflection via chain-of-thought reasoning, and multi-role agent collaboration, enhanced by a role-driven self-reflection mechanism to resolve statutory conflicts and interpretive ambiguities in tax legislation. Contribution/Results: We evaluate six state-of-the-art large language models on PLAT; baseline performance is limited. In contrast, our framework achieves significant accuracy improvements, demonstrating its effectiveness for tax legal reasoning. This work fills two critical gaps: (1) a high-quality, domain-specific evaluation benchmark for tax law, and (2) an interpretable, structured reasoning paradigm grounded in legal practice.

📝 Abstract
How capable are large language models (LLMs) in the domain of taxation? Although numerous studies have explored the legal domain in general, research dedicated to taxation remains scarce. Moreover, the datasets used in these studies are either simplified, failing to reflect real-world complexities, or unavailable as open source. To address this gap, we introduce PLAT, a new benchmark designed to assess the ability of LLMs to predict the legitimacy of additional tax penalties. PLAT is constructed to evaluate LLMs' understanding of tax law, particularly in cases where resolving the issue requires more than just applying the related statutes. Our experiments with six LLMs reveal that their baseline capabilities are limited, especially when dealing with conflicting issues that demand a comprehensive understanding. However, we found that by enabling retrieval, self-reasoning, and discussion among multiple agents with specific role assignments, this limitation can be mitigated.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' ability to predict tax penalty legitimacy.
Evaluating LLMs' understanding of complex tax law scenarios.
Mitigating LLMs' limitations through retrieval and multi-agent reasoning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

PLAT benchmark assesses LLMs' tax penalty prediction.
Retrieval, self-reasoning, and multi-agent discussion enhance LLMs.
Role-specific agents improve LLMs' tax law understanding.
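The combination described above (retrieval, self-reasoning, and role-assigned multi-agent discussion followed by a verdict) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the actual prompts, roles, retrieval corpus, and aggregation rule are not given here, so the `retrieve`, `debate`, and `stub_llm` functions below, the statute snippets, and the three role names are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Agent:
    role: str                   # hypothetical roles, e.g. "taxpayer advocate", "judge"
    llm: Callable[[str], str]   # any text-in/text-out model

def retrieve(statutes: Dict[str, str], query: str, k: int = 2) -> List[str]:
    """Toy keyword-overlap retrieval standing in for a real RAG retriever."""
    words = set(query.lower().split())
    ranked = sorted(statutes.items(),
                    key=lambda kv: -len(words & set(kv[1].lower().split())))
    return [text for _, text in ranked[:k]]

def debate(case: str, statutes: Dict[str, str],
           agents: List[Agent], rounds: int = 2) -> str:
    """Role-assigned agents discuss the case over several rounds, then vote.
    Each agent sees the retrieved statutes plus the running transcript, so later
    turns can reflect on (and rebut) earlier opinions."""
    context = "\n".join(retrieve(statutes, case))
    transcript: List[str] = []
    for _ in range(rounds):
        for agent in agents:
            prompt = (f"You are the {agent.role}.\nCase: {case}\n"
                      f"Relevant statutes:\n{context}\n"
                      f"Discussion so far:\n" + "\n".join(transcript))
            transcript.append(f"{agent.role}: {agent.llm(prompt)}")
    # Majority vote over each agent's final-round stance.
    final = transcript[-len(agents):]
    votes = ["unlawful" if "unlawful" in s else "lawful" for s in final]
    return max(set(votes), key=votes.count)

# Deterministic stand-in model so the sketch runs without an API key.
def stub_llm(prompt: str) -> str:
    return "unlawful" if "no justifiable cause" in prompt else "lawful"

# Invented statute snippets for illustration only.
statutes = {
    "Art. 47": "additional tax may be imposed when a return is underreported",
    "Art. 48": "no penalty shall apply where there is justifiable cause",
}
agents = [Agent(r, stub_llm) for r in ("taxpayer advocate", "tax authority", "judge")]
verdict = debate("penalty imposed although the taxpayer had no justifiable cause "
                 "for the underreported return", statutes, agents)
print(verdict)  # -> unlawful
```

A real system would replace `stub_llm` with an actual LLM call and `retrieve` with a proper retriever over the statute corpus; the point is only the control flow: retrieve once, let each role speak with the transcript in view, then aggregate the final stances.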