General Hazard Detection

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing hazard detection systems are constrained by predefined categories and a reliance on large-scale annotated data. They struggle to handle abstract safety concepts when data is sparse, definitions evolve, and models must generalize across scenarios. This work proposes a language-rule-based framework for general-purpose hazard detection in which safety requirements are expressed as natural language rules, decoupled from image examples. By integrating a vision-language model (LLaVA), rule-driven compliance evaluation, active learning, and human-in-the-loop feedback, the approach targets context-sensitive, fine-grained judgments. To support this paradigm, the work introduces CompliVision, a dataset of 3,006 multi-domain images, each annotated with rule-compliance labels and natural language explanations, with the aim of improving model generalization to unseen scenarios.

📝 Abstract

Hazard, as an abstract concept, is typically defined through cognitive-level logical reasoning rather than concrete examples. In contrast, existing hazard detection systems rely on predefined hazard categories and require intensive collection of labelled examples within detection or classification architectures. This approach faces three fundamental challenges when addressing abstract safety concepts: (1) noisy and sparse training data, (2) dynamically evolving definitions that change across contexts and time, and (3) limited generalisation to unseen or novel scenarios. To address these limitations, we present the CompliVision dataset, the first general-purpose hazard dataset designed for rule-based compliance assessment, along with a baseline framework for hazard evaluation. Our key innovation is decoupling the hazard concept from image-based examples by expressing safety requirements through language-based rules. We ground our approach in authoritative domain regulations and ISO standards to define diverse hazard concepts across multiple domains. The CompliVision dataset comprises 3,006 images spanning traffic, construction, and warehouse environments. Each image is annotated for compliance against specific safety rules and accompanied by a natural language explanation highlighting the supporting visual evidence. To achieve robust generalisation, we develop an active learning framework that guides and refines vision-language models (VLMs) in assessing hazard compliance. While state-of-the-art VLMs demonstrate strong capabilities, they struggle with the fine-grained, context-dependent interpretation required for accurate safety assessment. To address this limitation, we propose a general hazard detection framework that combines LLaVA-based visual reasoning with human-in-the-loop feedback.
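The decoupling of rules from images, together with the active learning loop, can be pictured with a small sketch. This is an illustration under stated assumptions and not the authors' implementation. The names `SafetyRule`, `ComplianceRecord`, `build_prompt`, and `select_for_review` are hypothetical, as are the record fields and the example rule text. The abstract does not specify the active learning criterion, so plain uncertainty sampling (choosing predictions closest to 0.5) is swapped in here as a stand-in, and the confidence values are made up rather than produced by a real VLM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SafetyRule:
    """Hypothetical container for a natural-language rule, independent of any image."""
    rule_id: str
    domain: str   # e.g. "traffic", "construction", "warehouse"
    text: str


@dataclass
class ComplianceRecord:
    """Hypothetical (image, rule) pair with a label, an explanation, and a model score."""
    image_path: str
    rule: SafetyRule
    compliant: Optional[bool] = None      # None until a human or model label exists
    explanation: str = ""                 # free-text visual evidence
    p_compliant: Optional[float] = None   # assumed model confidence in [0, 1]


def build_prompt(rule: SafetyRule) -> str:
    """Turn a rule into a text prompt that could accompany an image sent to a VLM."""
    return (
        f"Safety rule ({rule.domain}): {rule.text}\n"
        "Does the image comply with this rule? "
        "Answer 'compliant' or 'non-compliant' and cite the visual evidence."
    )


def select_for_review(records: List[ComplianceRecord], budget: int) -> List[ComplianceRecord]:
    """Uncertainty sampling: return the `budget` scored records closest to 0.5."""
    scored = [r for r in records if r.p_compliant is not None]
    scored.sort(key=lambda r: abs(r.p_compliant - 0.5))
    return scored[:budget]


# Illustrative usage with invented scores (no model is called here).
rule = SafetyRule("R1", "construction", "Workers must wear a hard hat on site.")
records = [
    ComplianceRecord("a.jpg", rule, p_compliant=0.95),
    ComplianceRecord("b.jpg", rule, p_compliant=0.52),
    ComplianceRecord("c.jpg", rule, p_compliant=0.10),
    ComplianceRecord("d.jpg", rule, p_compliant=0.40),
]
queue = select_for_review(records, budget=2)
print(build_prompt(rule))
print([r.image_path for r in queue])  # prints ['b.jpg', 'd.jpg']
```

Because the rule lives in its own object, a changed or newly added regulation only requires a new `SafetyRule` and not a new set of labelled images. The records sent to human reviewers would then come back with `compliant` and `explanation` filled in, ready to be used for further refinement of the model.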

Problem

Research questions and friction points this paper is trying to address.

hazard detection

abstract safety concepts

generalisation

dynamic definitions

sparse training data

Innovation

Methods, ideas, or system contributions that make the work stand out.

rule-based hazard detection

vision-language models

compliance assessment