🤖 AI Summary
This work addresses the persistent gap between automated and expert-level chip placement by proposing a reinforcement learning approach that moves beyond conventional wirelength-centric optimization. Rather than formalizing the intricate processes behind expert placement, the method reverse-engineers plausible step-by-step placement trajectories from final expert layouts and uses them as demonstrations or preference signals to train a reward model that captures the implicit objectives underlying expert decisions. The approach can learn from as little as a single expert design and generalizes to unseen designs. Experimental results indicate that the framework narrows the gap between automated placement and expert-quality layouts.
📝 Abstract
Chip placement is a critical step in physical design. Although reinforcement learning (RL)-based methods have recently emerged, their training focuses primarily on wirelength optimization, so they often fail to produce expert-quality layouts. We identify reward design as the primary cause of this performance gap. Instead of formalizing the intricate processes behind expert placement, we learn a reward model directly from expert layouts. Our approach starts from final expert layouts and infers step-by-step expert trajectories. Using these trajectories as demonstrations or preferences, we train a model that captures the implicit rewards latent in expert results. Experiments show that our framework can learn efficiently from even a single design and generalizes well to unseen cases.
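To make the preference-based variant of this idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a largest-macro-first replay order for turning a final layout into a trajectory, two hand-picked step features on a unit canvas, a linear reward, and a Bradley-Terry preference loss in which each expert step is preferred over randomly sampled alternatives. All function names, features, and hyperparameters here are hypothetical.

```python
import math
import random

def infer_trajectory(final_layout, areas):
    # Assumption: replay macros largest-first. The abstract does not state the ordering rule.
    order = sorted(final_layout, key=lambda name: areas[name], reverse=True)
    return [(name, final_layout[name]) for name in order]

def step_features(placed, pos):
    # Hypothetical features on a unit canvas: distance to the nearest canvas edge,
    # and distance to the centroid of the macros placed so far.
    x, y = pos
    edge = min(x, y, 1.0 - x, 1.0 - y)
    if placed:
        cx = sum(p[0] for p in placed) / len(placed)
        cy = sum(p[1] for p in placed) / len(placed)
        spread = math.hypot(x - cx, y - cy)
    else:
        spread = 0.0
    return [edge, spread]

def reward(w, feats):
    # Linear reward model (an assumption; a neural network could take its place).
    return sum(wi * fi for wi, fi in zip(w, feats))

def build_preferences(trajectory, rng, n_alt=4):
    # Each expert step is paired with n_alt random positions it is preferred over.
    pairs, placed = [], []
    for _, pos in trajectory:
        good = step_features(placed, pos)
        for _ in range(n_alt):
            alt = (rng.random(), rng.random())
            pairs.append((good, step_features(placed, alt)))
        placed.append(pos)
    return pairs

def bt_loss(w, pairs):
    # Bradley-Terry loss: mean of -log sigmoid(r_good - r_bad).
    total = 0.0
    for good, bad in pairs:
        d = reward(w, good) - reward(w, bad)
        total += math.log1p(math.exp(-d))
    return total / len(pairs)

def train(pairs, steps=200, lr=0.5):
    # Plain gradient descent on the Bradley-Terry loss.
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for good, bad in pairs:
            d = reward(w, good) - reward(w, bad)
            g = -1.0 / (1.0 + math.exp(d))  # derivative of log(1 + exp(-d)) w.r.t. d
            for i in range(2):
                grad[i] += g * (good[i] - bad[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w

# Toy "expert" layout: macros pushed toward the canvas corners.
layout = {"a": (0.05, 0.05), "b": (0.95, 0.05), "c": (0.05, 0.95)}
areas = {"a": 1.0, "b": 3.0, "c": 2.0}
trajectory = infer_trajectory(layout, areas)
pairs = build_preferences(trajectory, random.Random(0))
weights = train(pairs)
```

The learned `weights` could then score candidate placement steps inside an RL loop in place of a wirelength-only reward. A demonstration-based variant would instead fit a policy or reward to the inferred trajectory directly.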