D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching

📅 2024-08-23

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Background: Prior work on adding sound effects (SFX) to videos matches SFX to a video as a whole. Work on video highlight detection and moment retrieval localizes moments but does not match them to SFX. Neither line of work jointly models the two steps that video decoration with SFX (VDSFX) requires in e-commerce videos. Method: This paper proposes D&M, a unified method that performs key moment detection and moment-to-SFX matching simultaneously. It also introduces SFX-Moment, a large-scale dataset built from an e-commerce platform for the new VDSFX task, and extends several existing video moment detection methods into competitive baselines for it. Contribution/Results: Extensive experiments on SFX-Moment show that D&M outperforms these baselines, enabling automatic, moment-level SFX insertion. Code and data will be publicly released.

📝 Abstract

Videos showcasing specific products are increasingly important for E-commerce. Key moments occur naturally in such videos, for example the first appearance of a product, the presentation of its distinctive features, or the display of a buying link. Adding proper sound effects (SFX) to these key moments, or video decoration with SFX (VDSFX), is crucial for enhancing user engagement. Previous studies on adding SFX to videos perform video-to-SFX matching at a holistic level and cannot add SFX at a specific moment. Meanwhile, previous studies on video highlight detection or video moment retrieval consider only moment localization, leaving moment-to-SFX matching untouched. By contrast, we propose D&M, a unified method that accomplishes key moment detection and moment-to-SFX matching simultaneously. Moreover, for the new VDSFX task, we build a large-scale dataset, SFX-Moment, from an E-commerce platform. For a fair comparison, we build competitive baselines by extending a number of current video moment detection methods to the new task. Extensive experiments on SFX-Moment show the superior performance of the proposed method over the baselines. Code and data will be released.
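To make the input and output of the VDSFX task concrete, here is a minimal Python sketch. It is a hypothetical two-stage illustration, not the authors' D&M model, which the abstract describes as doing detection and matching jointly. It assumes that candidate moments already carry a key-moment confidence score and an embedding, and that each SFX in a library has an embedding in the same space. The names `Candidate`, `decorate`, and `threshold`, the SFX names, and the toy vectors are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate moment produced by some moment detector."""
    start: float            # start time in seconds
    end: float              # end time in seconds
    score: float            # assumed key-moment confidence in [0, 1]
    embedding: list[float]  # assumed moment feature vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def decorate(candidates: list[Candidate],
             sfx_library: dict[str, list[float]],
             threshold: float = 0.5) -> list[tuple[float, float, str]]:
    """Return (start, end, sfx_name) for each moment kept as a key moment.

    Stage 1 (detection): keep candidates whose score reaches `threshold`.
    Stage 2 (matching): pick the SFX whose embedding is most similar.
    """
    if not sfx_library:
        return []
    plan = []
    for c in sorted(candidates, key=lambda m: m.start):
        if c.score < threshold:
            continue  # not treated as a key moment
        best = max(sfx_library, key=lambda name: cosine(c.embedding, sfx_library[name]))
        plan.append((c.start, c.end, best))
    return plan


# Toy usage with made-up 2-D embeddings.
candidates = [
    Candidate(0.0, 1.5, 0.9, [1.0, 0.0]),  # e.g. first product appearance
    Candidate(4.0, 5.0, 0.2, [0.0, 1.0]),  # low score, skipped at default threshold
    Candidate(7.0, 8.0, 0.8, [0.1, 1.0]),  # e.g. buying link shown
]
sfx_library = {"whoosh": [1.0, 0.1], "ding": [0.0, 1.0]}
print(decorate(candidates, sfx_library))
# [(0.0, 1.5, 'whoosh'), (7.0, 8.0, 'ding')]
```

Splitting the task into a threshold step and an argmax step keeps the example readable, but it also shows the gap the paper targets. A detector trained alone never learns which moments have a suitable SFX, and that is the motivation the abstract gives for handling both steps in one model.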

Problem

Research questions and friction points this paper is trying to address.

Detects key moments in e-commerce videos

Matches sound effects to specific moments

Enhances user engagement with video decoration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Key moment detection

Moment to SFX matching

Large-scale SFX-Moment dataset

🔎 Similar Papers

Unsupervised Video Highlight Detection by Learning from Audio and Visual Recurrence