Complex Logical Instruction Generation

📅 2025-08-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing research lacks systematic evaluation of large language models’ (LLMs) ability to process complex logical instructions—such as conditionals, nesting, and recursion. Method: This paper introduces LogicIFGen, the first framework to automatically generate verifiable, logic-rich natural language instructions from program code, enabling construction of LogicIFEval—a high-quality, human-validated benchmark. The approach integrates program parsing, logical structure mapping, and controllable neural natural language generation, with rigorous human verification and automated quality control. Contribution/Results: Experiments reveal that state-of-the-art LLMs achieve ≤60% accuracy on LogicIFEval, exposing critical deficiencies in logical structure comprehension and execution. LogicIFGen establishes a novel paradigm for evaluating instruction-following capabilities, providing a principled methodology, an open benchmark, and empirically grounded insights into LLMs’ logical reasoning limitations.

📝 Abstract
Instruction following has catalyzed the recent era of Large Language Models (LLMs) and is the foundational skill underpinning more advanced capabilities such as reasoning and agentic behaviors. As tasks grow more challenging, the logic structures embedded in natural language instructions become increasingly intricate. However, how well LLMs perform on such logic-rich instructions remains under-explored. We propose LogicIFGen and LogicIFEval. LogicIFGen is a scalable, automated framework for generating verifiable instructions from code functions, which can naturally express rich logic such as conditionals, nesting, recursion, and function calls. We further curate a collection of complex code functions and use LogicIFGen to construct LogicIFEval, a benchmark comprising 426 verifiable logic-rich instructions. Our experiments demonstrate that current state-of-the-art LLMs still struggle to correctly follow the instructions in LogicIFEval. Most LLMs can follow fewer than 60% of the instructions, revealing significant deficiencies in their instruction-following ability. Code and Benchmark: https://github.com/mianzhang/LogicIF
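The core idea of deriving verifiable instructions from code can be illustrated with a minimal sketch. This is not the paper's actual LogicIFGen pipeline; the function and verifier below are hypothetical stand-ins showing why a code function with conditionals and recursion makes an instruction automatically checkable: the function's own execution supplies the ground-truth output.

```python
# Illustrative sketch (hypothetical, not the LogicIFGen implementation):
# a code function rich in logic serves as ground truth, and an answer to
# the natural-language instruction derived from it is verified by running
# the function on the same input.

def collatz_steps(n: int) -> int:
    """Count steps to reach 1; combines conditionals with recursion."""
    if n == 1:
        return 0
    if n % 2 == 0:
        return 1 + collatz_steps(n // 2)
    return 1 + collatz_steps(3 * n + 1)

def verify(claimed_output: int, test_input: int) -> bool:
    """An instruction derived from collatz_steps is followed correctly
    iff the model's claimed output matches the function's real output."""
    return claimed_output == collatz_steps(test_input)

# Trace for n=6: 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1, i.e. 8 steps.
print(verify(8, 6))  # True: the claim matches execution
print(verify(7, 6))  # False: the claim contradicts execution
```

Because verification reduces to executing the source function, no human grading is needed, which is what makes this style of benchmark scalable.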
Problem

Research questions and friction points this paper is trying to address.

Exploring LLMs' performance on logic-rich instructions
Generating verifiable instructions from complex code functions
Assessing LLMs' deficiencies in following intricate logic instructions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated framework for verifiable instruction generation
Benchmark with 426 logic-rich verifiable instructions
Evaluates LLMs on complex logical instruction following
Mian Zhang
University of Texas at Dallas
LLM

Shujian Liu
Zoom Video Communications
Natural Language Processing, Deep Learning, Wind Energy, Aerodynamics, High Performance Computing

Sixun Dong
Arizona State University
Computer Vision, Multimodal Learning, Visual Language Model

Ming Yin
Zoom Video Communications

Yebowen Hu
Zoom Video Communications

Xun Wang
Zoom Video Communications

Steven Ma
Zoom Video Communications

Song Wang
Zoom Video Communications

Sathish Reddy Indurthi
Zoom Video Communications

Haoyun Deng
Zoom Video Communications

Zhiyu Zoey Chen
Assistant Professor, the University of Texas at Dallas
Artificial Intelligence, Natural Language Processing, AI for Health

Kaiqiang Song
Zoom Video Communications