SafeMath: Inference-time Safety improves Math Accuracy

📅 2026-03-26
🤖 AI Summary
This work shows that mathematical word problems can serve as covert carriers of harmful content, posing psychological and ethical risks to children, a vulnerability that current large language models overlook. To address this, the authors introduce ToxicGSM, a benchmark dataset of roughly 1,900 annotated samples, and propose SafeMath, an approach that decouples safety intervention from mathematical reasoning during inference. Because the safety constraints are applied without altering the task-specific logic, SafeMath targets harm mitigation and problem-solving accuracy together. Experiments show that SafeMath reduces harmful outputs while preserving, and in some cases improving, mathematical reasoning performance, indicating that safety alignment and task accuracy need not be traded off against each other.

📝 Abstract
Recent research points toward LLMs being manipulated through adversarial and seemingly benign inputs, resulting in harmful, biased, or policy-violating outputs. In this paper, we study an underexplored issue concerning harmful and toxic mathematical word problems. We show that math questions, particularly those framed as natural language narratives, can serve as a subtle medium for propagating biased, unethical, or psychologically harmful content, with heightened risks in educational settings involving children. To support a systematic study of this phenomenon, we introduce ToxicGSM, a dataset of 1.9k arithmetic problems in which harmful or sensitive context is embedded while preserving mathematically well-defined reasoning tasks. Using this dataset, we audit the behaviour of existing LLMs and analyse the trade-offs between safety enforcement and mathematical correctness. We further propose SafeMath -- a safety alignment technique that reduces harmful outputs while maintaining, and in some cases improving, mathematical reasoning performance. Our results highlight the importance of disentangling linguistic harm from math reasoning and demonstrate that effective safety alignment need not come at the cost of accuracy. We release the source code and dataset at https://github.com/Swagnick99/SafeMath/tree/main.
Problem

Research questions and friction points this paper is trying to address.

harmful mathematical word problems
toxic content
educational safety
biased narratives
mathematical reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

SafeMath
inference-time safety
mathematical reasoning
harmful content mitigation
ToxicGSM