🤖 AI Summary
Problem: High-precision industrial applications such as semiconductor manufacturing lack dedicated benchmark datasets for spatiotemporal video prediction, which hinders progress in fine-grained dynamic modeling.
Method: We introduce the Chip Dicing Lane Dataset (CHDL), the first publicly available time-series image dataset of the chip dicing process, and propose DIFFUMA, a dual-path predictive architecture that combines Mamba’s long-range temporal modeling with a temporally guided diffusion mechanism: the former captures global dynamics, while the latter refines spatial details to mitigate feature degradation in fine-grained prediction.
Contribution/Results: On CHDL, DIFFUMA reduces MSE by 39% and raises SSIM from 0.926 to 0.988 relative to existing methods. It also generalizes to natural phenomena datasets, attaining state-of-the-art performance across multiple metrics.
📝 Abstract
Spatio-temporal video prediction plays a pivotal role in critical domains, ranging from weather forecasting to industrial automation. However, in high-precision industrial scenarios such as semiconductor manufacturing, the absence of specialized benchmark datasets severely hampers research on modeling and predicting complex processes. To address this challenge, we make a twofold contribution. First, we construct and release the Chip Dicing Lane Dataset (CHDL), the first public temporal image dataset dedicated to the semiconductor wafer dicing process. Captured via an industrial-grade vision system, CHDL provides a much-needed and challenging benchmark for high-fidelity process modeling, defect detection, and digital twin development. Second, we propose DIFFUMA, an innovative dual-path prediction architecture specifically designed for such fine-grained dynamics. The model captures global long-range temporal context through a parallel Mamba module, while simultaneously leveraging a diffusion module, guided by temporal features, to restore and enhance fine-grained spatial details, effectively combating feature degradation. Experiments demonstrate that on our CHDL benchmark, DIFFUMA significantly outperforms existing methods, reducing the Mean Squared Error (MSE) by 39% and improving the Structural Similarity (SSIM) from 0.926 to a near-perfect 0.988. This superior performance also generalizes to natural phenomena datasets. Our work not only delivers a new state-of-the-art (SOTA) model but, more importantly, provides the community with an invaluable data resource to drive future research in industrial AI.
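For readers unfamiliar with the two reported metrics, the following is a minimal Python sketch of how MSE, a relative MSE reduction, and SSIM can be computed on predicted frames. It is an illustration only, not the authors' evaluation code. The function names (`mse`, `relative_reduction`, `global_ssim`), the synthetic frames, and the noise level are all assumptions. The SSIM here is a simplified single-window variant computed over the whole array; standard implementations average SSIM over local sliding windows, so the values will differ from those a benchmark would report.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between predicted and ground-truth frames."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))


def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM using statistics of the whole array (one window).

    Assumption: this is NOT the windowed SSIM used in most benchmarks;
    it only illustrates the luminance/contrast/structure formula.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizer for the mean term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the variance term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


# Synthetic stand-in data (hypothetical): 4 frames of 64x64 pixels in [0, 1].
rng = np.random.default_rng(0)
target = rng.random((4, 64, 64))
pred = np.clip(target + 0.05 * rng.standard_normal(target.shape), 0.0, 1.0)

print("MSE:", mse(pred, target))
print("SSIM (global, simplified):", global_ssim(pred, target))
```

Under this definition, a "39% reduction in MSE" means `relative_reduction(baseline_mse, model_mse)` is about 0.39, where the baseline is the MSE of the compared method on the same test frames.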