A Hybrid Deterministic Framework for Named Entity Extraction in Broadcast News Video

📅 2026-02-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses named entity extraction from news videos, where diverse on-screen text layouts make manual indexing impractical and hinder reliable automatic extraction. The authors introduce a curated, balanced dataset of annotated news-graphics frames and propose an interpretable, modular, deterministic multimodal pipeline. The framework combines rule-based methods with deep learning: a graphic detector that reaches 95.8% mAP@0.5, followed by a named entity recognition module designed to avoid hallucination. Operating under strict auditability constraints, the system achieves 79.9% precision and 74.4% recall (F1 of 77.08%), compared with an F1 of 84.18% for generative multimodal methods that lack traceable data lineage. In a complementary user study, 59% of respondents report difficulty reading on-screen names in fast-paced broadcasts, underscoring the practical relevance of the task.
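To make the idea of a deterministic, auditable extraction step concrete, here is a minimal sketch. It is not the authors' implementation: the `NameRecord` type, the `extract_names` function, and the capitalised-sequence rule are all hypothetical, chosen only to illustrate how a rule-based stage can return nothing but literal spans of its input (so it cannot invent a name) while recording where each result came from.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NameRecord:
    """One extracted name plus the provenance needed to audit it (hypothetical schema)."""
    name: str
    frame_index: int
    char_span: Tuple[int, int]  # offsets into the OCR text of that frame
    rule: str                   # identifier of the rule that fired


# Assumed rule for illustration only: two or more capitalised tokens in a row.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def extract_names(ocr_text: str, frame_index: int) -> List[NameRecord]:
    """Return candidate names that occur verbatim in the OCR text.

    Every result is a substring of ocr_text, so the same input always
    yields the same output and each name can be traced to its source span.
    """
    return [
        NameRecord(
            name=match.group(0),
            frame_index=frame_index,
            char_span=match.span(),
            rule="capitalised_sequence",
        )
        for match in NAME_PATTERN.finditer(ocr_text)
    ]


if __name__ == "__main__":
    for record in extract_names("Jane Doe, senior correspondent", frame_index=12):
        print(record)
```

A real system would need far richer rules and OCR handling than this single pattern; the point of the sketch is only the design choice that each output carries its frame index, character span, and rule name, which is the kind of lineage a generative model does not provide by default.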

📝 Abstract

The growing volume of video-based news content has heightened the need for transparent and reliable methods to extract on-screen information. Yet the variability of graphical layouts, typographic conventions, and platform-specific design patterns renders manual indexing impractical. This work presents a comprehensive framework for automatically detecting and extracting personal names from broadcast and social-media-native news videos. It introduces a curated and balanced corpus of annotated frames capturing the diversity of contemporary news graphics and proposes an interpretable, modular extraction pipeline designed to operate under deterministic and auditable conditions. The pipeline is evaluated against a contrasting class of generative multimodal methods, revealing a clear trade-off between deterministic auditability and stochastic inference. The underlying detector achieves 95.8% mAP@0.5, demonstrating operationally robust performance for graphical element localisation. While generative systems achieve marginally higher raw accuracy (F1: 84.18% vs 77.08%), they lack the transparent data lineage required for journalistic and analytical contexts. The proposed pipeline delivers balanced precision (79.9%) and recall (74.4%), avoids hallucination, and provides full traceability across each processing stage. Complementary user findings indicate that 59% of respondents report difficulty reading on-screen names in fast-paced broadcasts, underscoring the practical relevance of the task. The results establish a methodologically rigorous and interpretable baseline for hybrid multimodal information extraction in modern news media.
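As a quick sanity check on the reported figures, the sketch below recomputes F1 as the harmonic mean of the stated precision and recall. The function name `f1_score` and the rounding-interval check are illustrative additions, and the sketch assumes the reported F1 was derived from the same precision and recall before rounding; the source does not state how the metrics were aggregated.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract, rounded to one decimal place.
reported_precision = 0.799
reported_recall = 0.744
reported_f1 = 0.7708

f1_from_rounded = f1_score(reported_precision, reported_recall)

# F1 increases with both precision and recall, so the extremes of the
# rounding intervals (+/- 0.05 percentage points) bound the unrounded F1.
f1_low = f1_score(reported_precision - 0.0005, reported_recall - 0.0005)
f1_high = f1_score(reported_precision + 0.0005, reported_recall + 0.0005)

if __name__ == "__main__":
    print(f"F1 from rounded P and R: {f1_from_rounded:.4f}")
    print(f"Range allowed by rounding: [{f1_low:.4f}, {f1_high:.4f}]")
    print(f"Reported F1 within range: {f1_low <= reported_f1 <= f1_high}")
```

The rounded inputs give an F1 of about 77.05%, and the reported 77.08% falls inside the range that rounding of precision and recall allows, so the three figures are mutually consistent under that assumption.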

Problem

Research questions and friction points this paper is trying to address.

Named Entity Extraction

Broadcast News Video

On-screen Text

Information Extraction

Multimodal

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deterministic Framework

Named Entity Extraction

Broadcast News Video