Training a Custom CNN on Five Heterogeneous Image Datasets

📅 2026-01-08

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a lightweight custom CNN for image classification across agricultural and urban scenes, which differ markedly in illumination, resolution, environmental complexity, and class balance. The model is compared with established architectures such as ResNet-18 and VGG-16 on five heterogeneous datasets, under both from-scratch training and transfer learning, to assess convergence behavior, generalization, and performance across data scales. The custom CNN is reported to achieve performance competitive with the established models while remaining compact. The study also characterizes when transfer learning and deeper architectures provide substantial advantages, particularly in data-constrained settings, and offers practical guidance for deployment under resource constraints.
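To make "compact footprint" concrete, the sketch below counts trainable parameters for a small convolutional stack. The layer widths (3, 32, 64, 128), the 3x3 kernels, and the helper names are hypothetical and chosen only for illustration; they are not the architecture used in the paper. For scale, the last assertion uses a well-known fact about VGG-16: its first fully connected layer maps 7 x 7 x 512 = 25088 features to 4096 units.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Trainable parameters of a k x k convolution: one k*k*c_in filter per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def linear_params(n_in, n_out, bias=True):
    """Trainable parameters of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def small_cnn_params(num_classes, channels=(3, 32, 64, 128), k=3):
    """Parameter count for a hypothetical CNN (NOT the paper's model):
    a chain of k x k convolutions, global average pooling (no parameters),
    then a single linear classifier."""
    total = 0
    for c_in, c_out in zip(channels, channels[1:]):
        total += conv2d_params(c_in, c_out, k)
    total += linear_params(channels[-1], num_classes)
    return total


if __name__ == "__main__":
    print(small_cnn_params(num_classes=5))   # 93893
    print(linear_params(25088, 4096))        # 102764544 (first FC layer of VGG-16)
```

Under these assumptions the whole illustrative network has fewer than 100 thousand parameters, while a single fully connected layer of VGG-16 has over 100 million, which is the kind of gap that matters for deployment under resource constraints.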

📝 Abstract

Deep learning has transformed visual data analysis, with Convolutional Neural Networks (CNNs) becoming highly effective in learning meaningful feature representations directly from images. Unlike traditional manual feature engineering methods, CNNs automatically extract hierarchical visual patterns, enabling strong performance across diverse real-world contexts. This study investigates the effectiveness of CNN-based architectures across five heterogeneous datasets spanning agricultural and urban domains: mango variety classification, paddy variety identification, road surface condition assessment, auto-rickshaw detection, and footpath encroachment monitoring. These datasets introduce varying challenges, including differences in illumination, resolution, environmental complexity, and class imbalance, necessitating adaptable and robust learning models. We evaluate a lightweight, task-specific custom CNN alongside established deep architectures, including ResNet-18 and VGG-16, trained both from scratch and using transfer learning. Through systematic preprocessing, augmentation, and controlled experimentation, we analyze how architectural complexity, model depth, and pre-training influence convergence, generalization, and performance across datasets of differing scale and difficulty. The key contributions of this work are: (1) the development of an efficient custom CNN that achieves competitive performance across multiple application domains, and (2) a comprehensive comparative analysis highlighting when transfer learning and deep architectures provide substantial advantages, particularly in data-constrained environments. These findings offer practical insights for deploying deep learning models in resource-limited yet high-impact real-world visual classification tasks.
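The abstract names class imbalance as one of the challenges but does not say how it is handled. One common option, shown here purely as an assumption and not as the authors' method, is to weight each class inversely to its frequency, w_c = N / (K * n_c), where N is the number of samples, K the number of classes, and n_c the count of class c. The function name and the example labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: w_c = N / (K * n_c).

    Rare classes get weights above 1 and frequent classes below 1,
    so each class contributes equally in total to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical imbalanced label set: 80 "good" vs 20 "damaged" road images.
    labels = ["good"] * 80 + ["damaged"] * 20
    print(balanced_class_weights(labels))  # {'good': 0.625, 'damaged': 2.5}
```

Weights of this form are typically passed to a weighted cross-entropy loss during training; whether that helps on any of the five datasets is an empirical question the abstract does not settle.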

Problem

Research questions and friction points this paper is trying to address.

heterogeneous image datasets

visual classification

convolutional neural networks

domain diversity

class imbalance

Innovation

Methods, ideas, or system contributions that make the work stand out.

custom CNN

heterogeneous datasets

transfer learning