🤖 AI Summary
This study investigates the capability of large language models (LLMs) to autonomously design complete, executable heuristic solvers for combinatorial optimization, specifically the constrained 3D bin packing problem, rather than merely tuning small functions inside human-crafted frameworks.
Method: We introduce a constraint scaffolding framework coupled with an iterative self-correction mechanism, enabling LLMs to generate verifiable, runnable search algorithm code.
Contribution/Results: Experiments reveal that LLMs concentrate almost exclusively on refining the scoring function, exposing pretraining biases that narrow search diversity. The purely greedy heuristics generated by the LLM match the performance of human-designed counterparts; when their scoring functions are integrated into human-crafted metaheuristic structures, solution quality reaches that of mainstream solvers. However, generalization degrades under stringent constraints. This work provides the first systematic validation of LLMs' ability to construct practically viable, end-to-end combinatorial optimization heuristics from scratch, and delineates their current capabilities and limitations.
📝 Abstract
The art of heuristic design has traditionally been a human pursuit. While Large Language Models (LLMs) can generate code for search heuristics, their application has largely been confined to adjusting simple functions within human-crafted frameworks, leaving their capacity for broader innovation an open question. To investigate this, we tasked an LLM with building a complete solver for the constrained 3D Packing Problem. Direct code generation quickly proved fragile, prompting us to introduce two supports: constraint scaffolding (prewritten constraint-checking code) and iterative self-correction (additional refinement cycles to repair bugs and produce a viable initial population). Notably, even given a vast design space for the greedy procedure, the LLM concentrated its efforts almost exclusively on refining the scoring function. This suggests that the emphasis on scoring functions in prior work may reflect not a principled strategy, but rather a natural limitation of LLM capabilities. The resulting heuristic was comparable to a human-designed greedy algorithm, and when its scoring function was integrated into a human-crafted metaheuristic, its performance rivaled established solvers, though its effectiveness waned as constraints tightened. Our findings highlight two major barriers to automated heuristic design with current LLMs: the engineering required to mitigate their fragility in complex reasoning tasks, and the influence of pretrained biases, which can prematurely narrow the search for novel solutions.
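To illustrate where the scoring function sits in such a greedy solver, here is a hedged sketch. The deep-bottom-left-style `score` and the corner-point generation are hypothetical simplifications (overlap checks between placed items are omitted; in the paper's setting those would be handled by the constraint scaffolding), but they show why the scoring function is the single component a model can swap out while leaving the search loop untouched:

```python
def score(pos, item, bin_dims):
    """Scoring function -- the component the LLM focused on refining.

    Hypothetical example: prefer low, back-left placements by penalizing
    height most, then depth, then width (a DBLF-style preference).
    """
    x, y, z = pos
    return -(z * 100 + y * 10 + x)

def greedy_pack(items, bin_dims, score_fn=score):
    """Place each item at its best-scoring feasible corner point."""
    W, D, H = bin_dims
    placements, corners = [], [(0, 0, 0)]
    for w, d, h in items:
        # Keep only corner points where the item stays inside the bin.
        feasible = [p for p in corners
                    if p[0] + w <= W and p[1] + d <= D and p[2] + h <= H]
        if not feasible:
            continue  # item skipped; a full solver would open a new bin
        best = max(feasible, key=lambda p: score_fn(p, (w, d, h), bin_dims))
        placements.append((best, (w, d, h)))
        corners.remove(best)
        x, y, z = best
        # New candidate corners on the three exposed faces of the item.
        corners += [(x + w, y, z), (x, y + d, z), (x, y, z + h)]
    return placements
```

Because the loop structure is fixed, tweaking `score_fn` is a low-risk, high-reward edit; the abstract's observation is that the LLM gravitated to exactly this edit rather than restructuring the search itself.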