Before Smelling the Video: A Two-Stage Pipeline for Interpretable Video-to-Scent Plans

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing olfactory media systems struggle to automatically generate interpretable, dynamically synchronized scent cues for videos, relying instead on manual design with limited generalizability. This work proposes a two-stage video-to-olfaction planning framework: first, semantic content is extracted from video using vision-language models such as CLIP; then, large language models map this semantic representation into human-interpretable scent plans that align with on-screen actions. The approach demonstrates, for the first time, that semantically grounded odor planning is comprehensible even prior to physical scent release. User studies confirm that the generated scent plans significantly outperform baseline methods in both perceptual salience and temporal alignment with visual actions, thereby establishing the feasibility of semantics-driven olfactory media.

📝 Abstract
Olfactory cues can enhance immersion in interactive media, yet smell remains rare because it is difficult to author and synchronize with dynamic video. Prior olfactory interfaces rely on designer triggers and fixed event-to-odor mappings that do not scale to unconstrained content. This work examines whether semantic planning for smell is intelligible to people before physical scent delivery. We present a video-to-scent planning pipeline that separates visual semantic extraction using a vision-language model from semantic-to-olfactory inference using a large language model. Two survey studies compare system-generated scent plans with over-inclusive and naive baselines. Results show consistent preference for plans that prioritize perceptually salient cues and align scent changes with visible actions, supporting semantic planning as a foundation for future olfactory media systems.
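The two-stage separation the abstract describes can be sketched as below. This is a minimal, hypothetical illustration, not the paper's implementation: the real system uses a CLIP-style vision-language model for stage 1 and a large language model for stage 2, whereas here both stages are stubbed (a pre-labeled segment list and a keyword table) so the planning structure stays runnable. All names (`ScentCue`, `extract_semantics`, `plan_scents`, `ODOR_TABLE`) are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class ScentCue:
    start_s: float   # when scent release would begin
    end_s: float     # when it would end
    odor: str        # human-readable odor label
    rationale: str   # why this odor was chosen (interpretability)

def extract_semantics(video_segments):
    """Stage 1 stub: visual semantic extraction.
    The paper uses a vision-language model such as CLIP; here the
    segments arrive pre-labeled as (start, end, label) tuples."""
    return list(video_segments)

# Stage 2 stub: semantic-to-olfactory inference.
# The paper delegates this mapping to an LLM; a fixed keyword table
# stands in so the example is self-contained.
ODOR_TABLE = {
    "campfire": "woodsmoke",
    "orange": "citrus",
    "ocean": "sea breeze",
}

def plan_scents(semantics):
    """Emit scent cues only for salient, mappable labels, keeping each
    cue aligned with the time span of the visual action it explains."""
    plan = []
    for t0, t1, label in semantics:
        odor = ODOR_TABLE.get(label)
        if odor:  # skip labels with no perceptually salient odor mapping
            plan.append(ScentCue(t0, t1, odor,
                                 f"visible '{label}' in segment"))
    return plan

segments = [(0.0, 4.0, "campfire"), (4.0, 8.0, "crowd"), (8.0, 12.0, "orange")]
plan = plan_scents(extract_semantics(segments))
```

Note how the "crowd" segment is dropped rather than force-mapped: the abstract's finding is that over-inclusive plans underperform, so a planner should prioritize salient cues instead of covering every segment.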
Problem

Research questions and friction points this paper is trying to address.

video-to-scent
olfactory media
semantic planning
immersive interaction
scent synchronization
Innovation

Methods, ideas, or system contributions that make the work stand out.

video-to-scent planning
semantic olfactory inference
vision-language model
large language model
interpretable multimodal interface
Kaicheng Wang
University of Washington
Kevin Zhongyang Shao
University of Washington
Ruiqi Chen
Vrije Universiteit Brussel
Sep Makhsous
University of Washington
Denise Wilson
University of Washington