PiercingEye: Dual-Space Video Violence Detection with Hyperbolic Vision-Language Guidance

📅 2025-04-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing weakly supervised video violence detection methods rely on Euclidean representations, which struggle to distinguish fine-grained violent events that are visually similar but semantically distinct. To address this, we propose the first Euclidean–hyperbolic dual-space learning framework. Our method introduces a layer-sensitive hyperbolic aggregation strategy and a cross-space attention mechanism to model hierarchical semantics; leverages logic-guided ambiguous event descriptions generated by large language models to construct a dynamically similarity-weighted hyperbolic vision–language contrastive loss; and imposes a hyperbolic Dirichlet energy constraint to enhance the discriminability of hyperbolic embeddings. Evaluated on XD-Violence and UCF-Crime, our approach achieves state-of-the-art performance, demonstrating particularly significant improvements in fine-grained recognition accuracy on a newly constructed subset of ambiguous violent events.

📝 Abstract
Existing weakly supervised video violence detection (VVD) methods primarily rely on Euclidean representation learning, which often struggles to distinguish visually similar yet semantically distinct events due to limited hierarchical modeling and insufficient ambiguous training samples. To address this challenge, we propose PiercingEye, a novel dual-space learning framework that synergizes Euclidean and hyperbolic geometries to enhance discriminative feature representation. Specifically, PiercingEye introduces a layer-sensitive hyperbolic aggregation strategy with hyperbolic Dirichlet energy constraints to progressively model event hierarchies, and a cross-space attention mechanism to facilitate complementary feature interactions between Euclidean and hyperbolic spaces. Furthermore, to mitigate the scarcity of ambiguous samples, we leverage large language models to generate logic-guided ambiguous event descriptions, enabling explicit supervision through a hyperbolic vision-language contrastive loss that prioritizes high-confusion samples via dynamic similarity-aware weighting. Extensive experiments on XD-Violence and UCF-Crime benchmarks demonstrate that PiercingEye achieves state-of-the-art performance, with particularly strong results on a newly curated ambiguous event subset, validating its superior capability in fine-grained violence detection.
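The abstract's core premise is that hyperbolic space embeds tree-like event hierarchies with less distortion than Euclidean space. The paper's implementation is not reproduced on this page; as a minimal illustration of the underlying geometry, the sketch below computes the standard geodesic distance in the Poincaré ball model, where distances grow exponentially toward the boundary:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball."""
    uu = np.sum(u * u)
    vv = np.sum(v * v)
    duv = np.sum((u - v) ** 2)
    x = 1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv) + eps)
    return np.arccosh(x)

# Near the origin the metric is almost Euclidean; near the boundary,
# points with a small Euclidean gap are geodesically far apart. This is
# the property that lets hyperbolic embeddings separate fine-grained
# leaf events while keeping coarse parent categories close together.
a = np.array([0.10, 0.00])
b = np.array([0.20, 0.00])   # near the origin
c = np.array([0.95, 0.00])
d = np.array([0.95, 0.05])   # near the boundary
print(poincare_distance(a, b))  # small
print(poincare_distance(c, d))  # larger, despite a similar Euclidean gap
```

The same Euclidean displacement thus costs more hyperbolic distance near the boundary, which is where fine-grained, semantically distinct events can be pushed apart.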
Problem

Research questions and friction points this paper is trying to address.

Distinguishing visually similar yet semantically distinct violent events under weak supervision
Limited hierarchical modeling capacity of Euclidean-only representations
Scarcity of ambiguous training samples for fine-grained violence recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-space learning with Euclidean and hyperbolic geometries
Layer-sensitive hyperbolic aggregation with Dirichlet constraints
Hyperbolic vision-language contrastive loss for ambiguous samples
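One listed innovation is a hyperbolic Dirichlet energy constraint. The paper's exact formulation is not given on this page; a generic graph Dirichlet energy evaluated with the Poincaré metric (an assumption for illustration, not the authors' implementation) could be sketched as:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the unit Poincare ball."""
    num = np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u * u)) * (1.0 - np.sum(v * v)) + eps
    return np.arccosh(1.0 + 2.0 * num / den)

def hyperbolic_dirichlet_energy(X, W):
    """Sum over pairs i<j of W[i, j] * d_H(x_i, x_j)^2 on a similarity
    graph W. Bounding or penalizing this energy controls how smoothly
    embeddings vary across strongly connected (similar) snippets while
    still allowing dissimilar ones to spread out."""
    n = len(X)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            energy += W[i, j] * poincare_distance(X[i], X[j]) ** 2
    return energy

# Toy usage: three snippet embeddings, fully connected with unit weights.
X = np.array([[0.1, 0.0], [0.2, 0.1], [0.0, 0.3]])
W = np.ones((3, 3))
print(hyperbolic_dirichlet_energy(X, W))  # positive for spread-out points
```

The energy is zero only when all connected embeddings coincide, so using it as a regularizer trades off intra-cluster smoothness against the contrastive objective's separation pressure.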
👥 Authors
Jiaxu Leng
Chongqing University of Posts and Telecommunications (Computer Vision)
Zhanjie Wu
Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China; also with Guangyang Bay Laboratory, Chongqing Institute for Brain and Intelligence, Chongqing 400065, China
Mingpi Tan
Same affiliation as Zhanjie Wu
Mengjingcheng Mo
Same affiliation as Zhanjie Wu
Jiankang Zheng
Same affiliation as Zhanjie Wu
Qingqing Li
Researcher, University of Turku (Sensor Fusion, Robotics, Odometry, SLAM, Lidars)
Ji Gan
Chongqing University of Posts and Telecommunications (Handwriting recognition and generation)
Xinbo Gao
Same affiliation as Zhanjie Wu