🤖 AI Summary
Current methods for detecting oxidation in frying oil rely on destructive wet-chemistry assays that provide no spatial information and are unsuitable for real-time use, while thermal-image-based models tend to memorize sensor-specific noise and thermal bias and fail under video-disjoint evaluation. To address these limitations, this work proposes FryNet, a dual-stream RGB-thermal framework that combines adversarially regularized dual encoders with FiLM-based fusion. In a single forward pass, the model performs oil-region segmentation, serviceability classification, and regression of four indices (PV, p-AV, Totox, and temperature). It uses a ThermalMiT-B2 backbone with channel and spatial attention for the thermal stream, an RGB-MAE encoder for the RGB stream, and Gradient Reversal Layers that discourage both streams from encoding video identity. On 7,226 paired frames from 28 frying videos, it achieves 98.97% mIoU, 100% classification accuracy, and a mean regression MAE of 2.32, outperforming all seven baselines.
📝 Abstract
Monitoring frying oil degradation is critical for food safety, yet current practice relies on destructive wet-chemistry assays that provide no spatial information and are unsuitable for real-time use. We identify a fundamental obstacle in thermal-image-based inspection: the camera-fingerprint shortcut, whereby models memorize sensor-specific noise and thermal bias instead of learning oxidation chemistry and therefore collapse under video-disjoint evaluation. We propose FryNet, a dual-stream RGB-thermal framework that jointly performs oil-region segmentation, serviceability classification, and regression of four chemical oxidation indices (peroxide value PV, p-anisidine value p-AV, Totox, and temperature) in a single forward pass. A ThermalMiT-B2 backbone with channel and spatial attention extracts thermal features, while an RGB-MAE encoder learns chemically grounded representations via masked autoencoding and chemical alignment. A Dual-Encoder DANN (domain-adversarial neural network) adversarially regularizes both streams against video identity via Gradient Reversal Layers, and FiLM fusion bridges thermal structure with RGB chemical context. On 7,226 paired frames across 28 frying videos, FryNet achieves 98.97% mIoU, 100% classification accuracy, and a mean regression MAE of 2.32, outperforming all seven baselines.
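To make the notion of video-disjoint evaluation concrete, the following is a minimal sketch of a split in which every frame of a given video lands on the same side, so a model cannot score well simply by recognizing a video's sensor fingerprint. It is not the authors' protocol: the function name `video_disjoint_split`, the 25% hold-out fraction, and the toy video IDs are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def video_disjoint_split(frame_video_ids, test_fraction=0.25, seed=0):
    """Split frame indices so that no video contributes to both sets.

    frame_video_ids: sequence where item i is the video ID of frame i.
    Returns (train_indices, test_indices), each sorted.
    Hypothetical helper for illustration, not taken from the paper.
    """
    # Group frame indices by the video they came from.
    frames_by_video = defaultdict(list)
    for idx, vid in enumerate(frame_video_ids):
        frames_by_video[vid].append(idx)

    # Shuffle whole videos (not frames), then hold out a fraction of them.
    videos = sorted(frames_by_video)
    random.Random(seed).shuffle(videos)
    n_test = max(1, round(len(videos) * test_fraction))
    held_out = set(videos[:n_test])

    train, test = [], []
    for vid, indices in frames_by_video.items():
        (test if vid in held_out else train).extend(indices)
    return sorted(train), sorted(test)


# Toy data: 28 made-up videos with 10 frames each (not the real dataset).
video_ids = [f"video_{v:02d}" for v in range(28) for _ in range(10)]
train_idx, test_idx = video_disjoint_split(video_ids)

# The defining property: no video appears on both sides of the split.
train_videos = {video_ids[i] for i in train_idx}
test_videos = {video_ids[i] for i in test_idx}
assert train_videos.isdisjoint(test_videos)
```

A random frame-level split would instead scatter frames from the same video across both sets, which is the setting in which the camera-fingerprint shortcut can go unnoticed.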