🤖 AI Summary
In online learning, input feature dimensions can vary from one instance to the next ("haphazard inputs"), making mainstream vision models such as ResNet and ViT unsuitable for direct application because they rely on fixed-dimensional inputs. To address this, we propose the first model-agnostic online image representation framework that maps variable-length feature sequences into fixed-size 2D images in real time. Our method employs dynamic sequence padding and an optimized spatial layout to achieve structured encoding, then integrates with standard vision backbones for end-to-end learning. This eliminates the need for architecture-specific designs, enabling plug-and-play deployment of state-of-the-art vision models in online learning settings. Evaluation on four public benchmarks demonstrates significant improvements in robustness, generalization, and scalability. The implementation is publicly available.
📝 Abstract
Varying feature spaces in online learning settings, also known as haphazard inputs, have become prominent due to their applicability across many domains. However, current solutions to haphazard inputs are model-dependent and cannot benefit from existing advanced deep-learning methods, which require inputs of fixed dimensions. Therefore, we propose to transform the varying feature space in an online learning setting into a fixed-dimension image representation on the fly. This simple yet novel approach is model-agnostic, allowing any vision-based model to be applied to haphazard inputs, as demonstrated using ResNet and ViT. The image representation handles inconsistent input data seamlessly, making our proposed approach scalable and robust. We show the efficacy of our method on four publicly available datasets. The code is available at https://github.com/Rohit102497/HaphazardInputsAsImages.
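The core idea, encoding a variable-length feature set as a fixed-size 2D array so any vision backbone can consume it, could be sketched roughly as follows. This is a minimal illustration, not the paper's actual layout algorithm: the function name `haphazard_to_image`, the grid size, and the row-major placement of feature ids are all assumptions for demonstration.

```python
import numpy as np

def haphazard_to_image(features, image_size=32):
    """Hypothetical sketch: encode a variable-length feature dict
    {feature_id: value} into a fixed-size 2D "image".

    Feature ids index pixels in row-major order; features absent
    from this instance stay zero (implicit padding), so the output
    shape is always (image_size, image_size) regardless of how
    many features arrive at this time step.
    """
    img = np.zeros((image_size, image_size), dtype=np.float32)
    n_pixels = image_size * image_size
    for fid, val in features.items():
        # Map the feature id to a fixed pixel position.
        r, c = divmod(fid % n_pixels, image_size)
        img[r, c] = val
    return img

# Two instances with different feature sets yield same-shape images.
x1 = haphazard_to_image({0: 1.0, 5: 2.0})
x2 = haphazard_to_image({0: 0.5, 5: 1.5, 1023: 3.0})
```

Because every instance produces the same output shape, the resulting array can be fed directly to a standard fixed-input model, which is what makes the approach model-agnostic.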