A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models

📅 2025-11-11

📈 Citations: 0

✨ Influential: 0

career value

230K/year

🤖 AI Summary

This study investigates whether large language models (LLMs) align with humans in judging the “interestingness” and “difficulty” of mathematical problems, particularly across groups with divergent mathematical expertise—crowdworkers versus International Mathematical Olympiad (IMO) participants. Method: We conduct the first systematic empirical comparison between human subjective assessments and outputs from multiple LLMs, quantifying distributional alignment and correlating model-generated rationales with human-elicited justifications. Contribution/Results: While LLMs exhibit coarse discrimination between interesting and uninteresting problems, they fail to replicate human judgment distributions; their generated explanations for interestingness show significantly weak correlation with human rationales. Crucially, LLMs also fail to capture the systematic judgment divergence between experts and non-experts. These findings expose fundamental limitations of current LLMs in modeling mathematical cognition, establish critical boundaries for deploying AI as an educational thinking partner, and introduce the first benchmark framework for human–model alignment on mathematical interestingness.

Technology Category

Application Category

📝 Abstract

The evolution of mathematics has been guided in part by interestingness. From researchers choosing which problems to tackle next, to students deciding which ones to engage with, people's choices are often guided by judgments about how interesting or challenging problems are likely to be. As AI systems, such as LLMs, increasingly participate in mathematics with people -- whether for advanced research or education -- it becomes important to understand how well their judgments align with human ones. Our work examines this alignment through two empirical studies of human and LLM assessment of mathematical interestingness and difficulty, spanning a range of mathematical experience. We study two groups: participants from a crowdsourcing platform and International Math Olympiad competitors. We show that while many LLMs appear to broadly agree with human notions of interestingness, they mostly do not capture the distribution observed in human judgments. Moreover, most LLMs only somewhat align with why humans find certain math problems interesting, showing weak correlation with human-selected interestingness rationales. Together, our findings highlight both the promises and limitations of current LLMs in capturing human interestingness judgments for mathematical AI thought partnerships.

Problem

Research questions and friction points this paper is trying to address.

Examining alignment between human and AI judgments of math problem interestingness

Assessing LLMs' ability to capture human interestingness rationales in mathematics

Evaluating AI systems' capacity for human-like mathematical judgment and challenge perception

Innovation

Methods, ideas, or system contributions that make the work stand out.

Comparing human and LLM interestingness judgments of math problems

Analyzing alignment in difficulty and interestingness rationales

Evaluating LLM performance across diverse mathematical expertise levels

🔎 Similar Papers

No similar papers found.