CADO: From Imitation to Cost Minimization for Heatmap-based Solvers in Combinatorial Optimization

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the objective mismatch in existing heatmap-based combinatorial optimization solvers: minimizing imitation loss during supervised training does not guarantee cost-optimal solutions. To bridge this gap, the authors formulate the diffusion denoising process as a Markov decision process and introduce a Label-Centered Reward mechanism coupled with a Hybrid Fine-Tuning strategy. By using reinforcement learning to directly optimize the cost of decoded solutions, the approach mitigates both Decoder-Blindness (ignoring the non-differentiable decoding step) and Cost-Blindness (prioritizing structural imitation over solution quality), achieving end-to-end cost-aware objective alignment. Experiments across multiple combinatorial optimization benchmarks show that the proposed method attains state-of-the-art performance, underscoring the critical role of objective alignment for heatmap-based solvers.

📝 Abstract
Heatmap-based solvers have emerged as a promising paradigm for Combinatorial Optimization (CO). However, we argue that the dominant Supervised Learning (SL) training paradigm suffers from a fundamental objective mismatch: minimizing imitation loss (e.g., cross-entropy) does not guarantee solution cost minimization. We dissect this mismatch into two deficiencies: Decoder-Blindness (being oblivious to the non-differentiable decoding process) and Cost-Blindness (prioritizing structural imitation over solution quality). We empirically demonstrate that these intrinsic flaws impose a hard performance ceiling. To overcome this limitation, we propose CADO (Cost-Aware Diffusion models for Optimization), a streamlined Reinforcement Learning fine-tuning framework that formulates the diffusion denoising process as an MDP to directly optimize the post-decoded solution cost. We introduce Label-Centered Reward, which repurposes ground-truth labels as unbiased baselines rather than imitation targets, and Hybrid Fine-Tuning for parameter-efficient adaptation. CADO achieves state-of-the-art performance across diverse benchmarks, validating that objective alignment is essential for unlocking the full potential of heatmap-based solvers.
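To make the Label-Centered Reward idea concrete, here is a minimal sketch for the TSP case. All names (`tour_cost`, `label_centered_advantage`) and the toy instance are illustrative assumptions, not the paper's implementation: the key point is only that the ground-truth label's cost acts as the baseline, so a sampled solution's advantage is zero at the label itself and positive only when the sample beats it.

```python
def tour_cost(tour, dist):
    """Total length of a closed tour under a distance matrix."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def label_centered_advantage(sampled_tour, label_tour, dist):
    """Label-Centered advantage: cost(label) - cost(sample).

    The label is used as an unbiased baseline rather than an imitation
    target, so the advantage is 0 when the sample matches the label's
    cost and positive when the sample is strictly better.
    """
    return tour_cost(label_tour, dist) - tour_cost(sampled_tour, dist)

# Toy 4-city instance (hypothetical data for illustration only).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]
label = [0, 1, 3, 2]   # assumed ground-truth label tour (cost 23)
sample = [0, 2, 1, 3]  # a decoded sample from the model (cost 29)

adv = label_centered_advantage(sample, label, dist)  # 23 - 29 = -6
```

In an RL fine-tuning loop, this advantage would weight the policy-gradient term for the sampled solution, penalizing samples worse than the label and rewarding those that improve on it.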
Problem

Research questions and friction points this paper is trying to address.

Combinatorial Optimization
Heatmap-based Solvers
Objective Mismatch
Cost Minimization
Supervised Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cost-Aware Diffusion
Objective Alignment
Reinforcement Learning Fine-tuning
Heatmap-based Solvers
Label-Centered Reward