Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities

2026-05-10
Citations: 0
Influential: 0

AI Summary
Current large language models lack systematic evaluation of their reasoning capabilities when faced with tasks that are logically consistent yet violate real-world commonsense. This work proposes the Absurd World benchmark framework, which models the real world in terms of symbols, actions, sequences, and events, and automatically constructs scenarios that are logically coherent but absurd in real-world terms. By distorting the factual background while preserving task logic, the framework probes models' reliance on real-world biases. Integrating structured world modeling, automated scenario generation, and diverse prompting strategies, the approach effectively distinguishes whether models exhibit genuine logical reasoning or merely reproduce patterns from their training data reflecting real-world regularities.
πŸ“ Abstract
While large language models (LLMs) are powerful and versatile across many tasks, their thinking capabilities are often scrutinized because they sometimes fail to solve problems that humans can solve systematically. Recent literature, however, focuses on breaking LLM reasoning with increasingly complex problems, and whether an LLM is robust in simple logical reasoning remains underexplored. This paper proposes Absurd World, a benchmarking framework that tests LLMs against altered realism: scenarios that are logically coherent and that humans can easily solve. Absurd World breaks a real-world model into symbols, actions, sequences, and events, which are automatically altered to create absurd worlds in which the logic needed to solve the tasks remains the same. The paper evaluates a large collection of models with simple and advanced prompting techniques, and shows that the framework is an effective tool for determining whether an LLM can think logically while ignoring the patterns it learned from the real world. One can use this framework to extensively test an LLM against a real-world problem and verify whether its reasoning capability is robust to variations of the task.
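As a rough illustration of the idea of altering a world while keeping the task logic fixed, the following is a minimal sketch and not the paper's implementation. It assumes a toy world in which events are (actor, action, target) triples and only the symbols are altered; the names `REAL_WORLD`, `absurdify`, `render`, and `answer` are hypothetical and do not come from the paper.

```python
import random

# Hypothetical toy world: each event is an (actor, action, target) triple.
REAL_WORLD = [
    ("cat", "chases", "mouse"),
    ("mouse", "eats", "cheese"),
    ("farmer", "feeds", "cat"),
]

def absurdify(events, seed=0):
    """Rename every symbol consistently so that no symbol keeps its own name.

    Assumption: a rotation over the sorted symbol list stands in for whatever
    alteration procedure the actual framework uses.
    """
    symbols = sorted({s for actor, _, target in events for s in (actor, target)})
    offset = random.Random(seed).randrange(1, len(symbols))
    mapping = {s: symbols[(i + offset) % len(symbols)] for i, s in enumerate(symbols)}
    absurd = [(mapping[a], act, mapping[t]) for a, act, t in events]
    return absurd, mapping

def render(events):
    """Turn the event triples into a plain-text scenario for a prompt."""
    return " ".join(f"The {a} {act} the {t}." for a, act, t in events)

def answer(events, action, target):
    """Reference solver: return who performs `action` on `target`."""
    return next(a for a, act, t in events if act == action and t == target)

absurd_world, mapping = absurdify(REAL_WORLD, seed=1)
real_answer = answer(REAL_WORLD, "chases", "mouse")
absurd_answer = answer(absurd_world, "chases", mapping["mouse"])

print(render(absurd_world))
print("Who chases the", mapping["mouse"] + "?", "->", absurd_answer)
```

Because the renaming is a bijection, the same lookup solves both the real and the altered scenario, so a model that answers from the scenario text should get both right, while a model leaning on real-world associations may not. Altering actions, sequences, and events would need further machinery that this sketch does not attempt.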
Problem

Research questions and friction points this paper is trying to address.

LLM reasoning
logical reasoning
robustness
absurd scenarios
benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Absurd World
logical reasoning
benchmarking framework
large language models
altered realism