Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment

📅 2025-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit rigid rule-following behavior in exceptional contractual scenarios, diverging from human judgment—a fundamental limitation arising from the tension between contractual incompleteness and LLM policy inflexibility. Method: We move beyond conventional behavioral labels in supervised fine-tuning by incorporating human explanatory feedback into the training data, and we design a comparative evaluation framework with ethical-prompting and chain-of-thought baselines. Contribution/Results: (1) We empirically demonstrate that behavioral labels alone are insufficient for human–AI alignment; (2) our approach generalizes human-like exception handling across diverse contractual scenarios; and (3) it significantly improves out-of-distribution exception resolution and decision consistency. Results indicate that explanation-driven fine-tuning is a critical pathway toward reliable human–AI alignment in normative reasoning tasks.

📝 Abstract
Large language models (LLMs), initially developed for generative AI, are now evolving into agentic AI systems, which make decisions in complex, real-world contexts. Unfortunately, while their generative capabilities are well-documented, their decision-making processes remain poorly understood. This is particularly evident when models are handling exceptions, a critical and challenging aspect of decision-making made relevant by the inherent incompleteness of contracts. Here we demonstrate that LLMs, even ones that excel at reasoning, deviate significantly from human judgments because they adhere strictly to policies, even when such adherence is impractical, suboptimal, or even counterproductive. We then evaluate three approaches to tuning AI agents to handle exceptions: ethical framework prompting, chain-of-thought reasoning, and supervised fine-tuning. We find that while ethical framework prompting fails and chain-of-thought prompting provides only slight improvements, supervised fine-tuning, specifically with human explanations, yields markedly better results. Surprisingly, in our experiments, supervised fine-tuning even enabled models to generalize human-like decision-making to novel scenarios, demonstrating transfer learning of human-aligned decision-making across contexts. Furthermore, fine-tuning with explanations, not just labels, was critical for alignment, suggesting that aligning LLMs with human judgment requires explicit training on how decisions are made, not just which decisions are made. These findings highlight the need to address LLMs' shortcomings in handling exceptions in order to guide the development of agentic AI toward models that can effectively align with human judgment and simultaneously adapt to novel contexts.
Problem

Research questions and friction points this paper is trying to address.

LLMs deviate from human judgment when handling exceptions, adhering strictly to policies even when that is counterproductive.
Can supervised fine-tuning align AI agents' exception handling with human decision-making?
Does training on human explanations, rather than behavioral labels alone, improve alignment?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised fine-tuning with human explanations
Ethical framework prompting and chain-of-thought reasoning
Transfer learning for human-aligned decision-making
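The key contrast above—fine-tuning on behavioral labels alone versus labels paired with human explanations—can be sketched as a difference in how the training targets are constructed. The snippet below is a minimal illustration, assuming a chat-style fine-tuning record format; the field names, scenario text, and `make_example` helper are illustrative assumptions, not the paper's actual data pipeline.

```python
# Sketch: building supervised fine-tuning records for exception handling.
# Assumptions (not from the paper): a chat-style JSONL-like record format,
# and hypothetical scenario/decision text.

def make_example(scenario, decision, explanation=None):
    """Build one fine-tuning record.

    With explanation=None the target is the bare behavioral label;
    passing an explanation yields the explanation-augmented variant
    the paper reports as critical for human alignment.
    """
    if explanation is None:
        target = decision
    else:
        target = f"{decision}\nReasoning: {explanation}"
    return {
        "messages": [
            {"role": "system",
             "content": "Decide whether to follow the policy or make an exception."},
            {"role": "user", "content": scenario},
            {"role": "assistant", "content": target},
        ]
    }

# Hypothetical contractual-exception scenario.
scenario = ("Policy: never exceed the $50 refund limit. "
            "A customer's $55 order arrived damaged and the support "
            "window closes in 5 minutes.")

label_only = make_example(scenario, "Grant the refund (exception).")
with_explanation = make_example(
    scenario,
    "Grant the refund (exception).",
    "The policy's intent is to cap routine refunds; strict adherence is "
    "counterproductive for a $5 overage on a clearly damaged order.",
)
```

The only difference between the two conditions is the assistant target: the explanation-augmented record trains the model on *how* the decision was made, not just *which* decision was made, which is the distinction the paper identifies as decisive.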