AI Summary
Existing public datasets offer limited coverage of railway obstacle detection across diverse weather conditions, geographical settings, and track types: real-world annotations are scarce and obstacle classes are incompletely covered. To address this, we propose SynRailObs, the first high-fidelity synthetic dataset for railway safety, generated via physics-based simulation to model multi-scenario rail environments. It also introduces a diffusion-model-based pipeline for synthesizing rare obstacles and supports zero-shot transfer. Our methodology integrates multi-condition domain-adaptive training, and we validate robustness on both ballasted and ballastless tracks. Real-world experiments show that models trained on SynRailObs achieve superior stability across distances and weather conditions, and that their zero-shot detection accuracy significantly outperforms baseline methods. The SynRailObs dataset is publicly released on Kaggle to foster reproducible research in railway perception.
Abstract
Detecting potential obstacles in railway environments is critical for preventing serious accidents. Identifying a broad range of obstacle categories under complex conditions requires large-scale datasets of high-quality, precisely annotated images. However, existing publicly available datasets fail to meet these requirements, hindering progress in railway safety research. To address this gap, we introduce SynRailObs, a high-fidelity synthetic dataset designed to represent a diverse range of weather conditions and geographical features. Furthermore, diffusion models are employed to generate rare obstacles that are difficult to capture in real-world scenarios. To evaluate the effectiveness of SynRailObs, we perform experiments in real-world railway environments, testing on both ballasted and ballastless tracks under various weather conditions. The results demonstrate that SynRailObs holds substantial potential for advancing obstacle detection in railway safety applications. Models trained on this dataset show consistent performance across different distances and environmental conditions. Moreover, models trained on SynRailObs exhibit zero-shot capabilities, which are essential for applications in security-sensitive domains. The data is available at https://www.kaggle.com/datasets/qiushi910/synrailobs.