Preference Elicitation for Step-Wise Explanations in Logic Puzzles

📅 2025-11-13
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Step-wise explanations of logic puzzles often fall short in quality because designing an effective multi-objective scoring function by hand is difficult. To address this, the paper learns the objective interactively from users' pairwise comparisons. To counter instability caused by sub-objectives of widely varying scales, it introduces dynamic normalization techniques; to reduce redundant queries, it proposes MACHOP (Multi-Armed CHOice Perceptron), an active query-generation strategy that combines non-domination constraints with upper-confidence-bound (UCB)-based diversification. Experiments with artificial users and a real-user study on Sudoku and Logic-Grid puzzles show that MACHOP consistently yields higher-quality, more comprehensible explanations than conventional heuristic and static learning approaches.

Technology Category

Application Category

πŸ“ Abstract
Step-wise explanations can explain logic puzzles and other satisfaction problems by showing how to derive decisions step by step. Each step consists of a set of constraints that derive an assignment to one or more decision variables. However, many candidate explanation steps exist, with different sets of constraints and different decisions they derive. To identify the most comprehensible one, a user-defined objective function is required to quantify the quality of each step. However, defining a good objective function is challenging. Here, interactive preference elicitation methods from the wider machine learning community can offer a way to learn user preferences from pairwise comparisons. We investigate the feasibility of this approach for step-wise explanations and address several limitations that distinguish it from elicitation for standard combinatorial problems. First, because the explanation quality is measured using multiple sub-objectives that can vary a lot in scale, we propose two dynamic normalization techniques to rescale these features and stabilize the learning process. We also observed that many generated comparisons involve similar explanations. For this reason, we introduce MACHOP (Multi-Armed CHOice Perceptron), a novel query generation strategy that integrates non-domination constraints with upper confidence bound-based diversification. We evaluate the elicitation techniques on Sudokus and Logic-Grid puzzles using artificial users, and validate them with a real-user evaluation. In both settings, MACHOP consistently produces higher-quality explanations than the standard approach.
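The abstract's first technique, dynamic normalization, rescales sub-objectives that vary widely in scale so that a learned weighting stays stable. The paper does not give code here; the following is a minimal illustrative sketch (the function name and min-max scheme are assumptions, not the paper's exact method):

```python
import numpy as np

def dynamic_minmax_normalize(features, history):
    """Rescale `features` by the per-dimension min/max observed so far.

    features: sequence of sub-objective values for one candidate step.
    history: list of previously seen feature vectors (updated in place).
    """
    history.append(np.asarray(features, dtype=float))
    stacked = np.stack(history)
    lo, hi = stacked.min(axis=0), stacked.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against zero range
    return (np.asarray(features, dtype=float) - lo) / span
```

Because the bounds are recomputed from the history at each call, the normalization adapts as new candidate steps are observed during elicitation.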
Problem

Research questions and friction points this paper is trying to address.

Preference elicitation identifies optimal step-wise logic puzzle explanations
Dynamic normalization stabilizes multi-objective explanation quality learning
MACHOP strategy improves explanation quality through diversified query generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive preference elicitation from pairwise comparisons
Dynamic normalization for multi-scale sub-objectives
MACHOP query strategy with diversification constraints
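The two learning components named above (a choice perceptron driven by pairwise comparisons, and UCB-based diversification of queries) can be sketched as follows. This is a hypothetical illustration under common definitions of both techniques; the function names, learning rate, and exploration constant are not from the paper:

```python
import numpy as np

def perceptron_update(weights, preferred, other, lr=1.0):
    """Choice-perceptron step: shift the weight vector toward the
    (normalized) features of the explanation the user preferred."""
    return np.asarray(weights, dtype=float) + lr * (
        np.asarray(preferred, dtype=float) - np.asarray(other, dtype=float)
    )

def ucb_score(utility, times_shown, total_queries, c=1.0):
    """Estimated utility plus an exploration bonus that favors
    rarely shown candidates, keeping generated queries diverse."""
    bonus = c * np.sqrt(np.log(total_queries + 1) / (times_shown + 1))
    return utility + bonus
```

In a MACHOP-style loop, candidates surviving the non-domination constraints would be ranked by such a UCB score, the top pair shown to the user, and the weights updated from the resulting comparison.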
Marco Foschini
KU Leuven, Dept. of Computer Science, Celestijnenlaan 200A, 3001 Heverlee, Belgium
Marianne Defresne
UniversitΓ© de Toulouse, LAAS-CNRS, Av. du Colonel Roche 7, 31400 Toulouse, France
Emilio Gamba
Flanders Make, Gaston Geenslaan 8, 3001 Heverlee, Belgium
Bart Bogaerts
Research Professor, DTAI lab, Department of Computer Science, KU Leuven
Combinatorial Optimization, Proofs, Knowledge Representation, Logic
Tias Guns
KU Leuven, Dept. of Computer Science, Celestijnenlaan 200A, 3001 Heverlee, Belgium