🤖 AI Summary
This study addresses the dual challenges of insufficient affective engagement and lack of personalization in second language (L2) learning. We propose an immersive, learner-centered methodology that leverages personal memory and affective expression: learners' self-recorded videos serve as input; WhisperX enables precise speech-text alignment; keyframes are sampled to extract visual scene features; and Style-BERT-VITS2 synthesizes stylistically appropriate, emotionally congruent spoken questions aligned with visual affect. To our knowledge, this is the first work to integrate personal memory media with fine-grained emotional speech synthesis into a "vision-affect-language" co-adaptive interaction paradigm. Experimental results demonstrate significant improvements in learners' emotional resonance and willingness to engage in oral interaction, empirically validating the efficacy of affective, memory-driven design in L2 acquisition.
📝 Abstract
We present Re:Member, a system that explores how emotionally expressive, memory-grounded interaction can support more engaging second language (L2) learning. By drawing on users' personal videos and generating stylized spoken questions in the target language, Re:Member is designed to encourage affective recall and conversational engagement. The system aligns emotional tone with visual context, using expressive speech styles such as whispers or late-night tones to evoke specific moods. It combines WhisperX-based transcript alignment, 3-frame visual sampling, and Style-BERT-VITS2 for emotional synthesis within a modular generation pipeline. Designed as a stylized interaction probe, Re:Member highlights the role of affect and personal media in learner-centered educational technologies.
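The 3-frame visual sampling step mentioned above can be sketched as a small index-selection helper. This is an illustrative sketch only: the paper does not specify its sampling strategy, so the evenly-spaced-segment approach and the name `sample_keyframe_indices` are assumptions.

```python
def sample_keyframe_indices(total_frames: int, k: int = 3) -> list[int]:
    """Pick k representative frame indices from a clip of total_frames.

    Hypothetical helper: takes the center of each of k equal segments,
    so samples are spread across the clip and avoid its edges.
    """
    if total_frames <= 0:
        return []
    if total_frames <= k:
        # Clip shorter than k frames: use every frame.
        return list(range(total_frames))
    step = total_frames / k
    return [int(step * (i + 0.5)) for i in range(k)]
```

The returned indices would then be passed to a frame extractor (e.g. an OpenCV `VideoCapture` loop) before visual feature extraction; that downstream wiring is likewise an assumption about the pipeline's internals.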