🤖 AI Summary
Accurately decoding speech from non-invasive magnetoencephalography (MEG) signals, particularly distinguishing speech from silence intervals, remains a significant challenge. This work proposes a two-stage framework: a contrastive learning model first retrieves the audio segment matching a given MEG input from a large-scale speech corpus (LibriVox), and a speech detection model then generates a binary speech/silence sequence from the retrieved segment. The method sidesteps conventional end-to-end speech reconstruction and introduces external large-scale audio retrieval into neural decoding, reportedly for the first time. It secured first place in the extended track of the LibriBrain 2025 Speech Detection Challenge with an F1-score of 0.962.
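To make the retrieval stage concrete, here is a minimal sketch of how a contrastively trained embedding space could be queried. It assumes the MEG window and every candidate audio segment have already been mapped to fixed-length vectors by their encoders; the function names, the cosine-similarity scoring, the InfoNCE-style loss, and the temperature value are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_segment(meg_embedding, audio_embeddings):
    """Return (index, scores) of the audio embedding most similar to the MEG embedding.

    Hypothetical helper: in practice the embeddings would come from trained encoders.
    """
    scores = [cosine_similarity(meg_embedding, a) for a in audio_embeddings]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


def info_nce_loss(meg_batch, audio_batch, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed objective, MEG-to-audio direction only).

    Row i of meg_batch is treated as the positive pair of row i of audio_batch;
    all other audio rows in the batch act as negatives.
    """
    losses = []
    for i, meg in enumerate(meg_batch):
        logits = [cosine_similarity(meg, a) / temperature for a in audio_batch]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

A loss of this kind pulls matched MEG/audio pairs together and pushes mismatched pairs apart, which is what makes a nearest-neighbour lookup over the audio library meaningful at test time.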
📝 Abstract
Decoding speech from non-invasive brain signals is challenging. For the LibriBrain 2025 Speech Detection task, we propose a novel two-step framework that bypasses direct reconstruction. First, a contrastive learning model retrieves the speech segment matching a given test MEG recording from a large-scale audio library (LibriVox). Second, a speech detection model generates the binary silence/speech sequence directly from this retrieved audio. With this approach, our team, Sherlock Holmes, achieved first place in the extended track (F1-score: 0.962), demonstrating that leveraging external audio databases is a highly effective strategy.
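The second step, turning retrieved audio into a binary silence/speech sequence and scoring it, can be sketched as follows. The abstract does not describe the speech detection model, so a simple per-frame RMS energy threshold stands in for it here; the frame length, the threshold, and the function names are all hypothetical. The F1 shown is the plain binary F1 with speech as the positive class; the exact F1 variant used by the challenge is not stated in the text above.

```python
import math


def frame_labels(samples, frame_length, threshold):
    """Label each non-overlapping frame 1 (speech) or 0 (silence) by RMS energy.

    Stand-in for a learned speech detector; trailing samples that do not fill
    a whole frame are ignored.
    """
    labels = []
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frame = samples[start:start + frame_length]
        rms = math.sqrt(sum(s * s for s in frame) / frame_length)
        labels.append(1 if rms >= threshold else 0)
    return labels


def f1_score(predicted, reference):
    """Binary F1 with label 1 (speech) as the positive class."""
    tp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(predicted, reference) if p == 0 and r == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage: silence, a short burst of "speech", then silence again.
toy_audio = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
toy_labels = frame_labels(toy_audio, frame_length=4, threshold=0.1)
```

Because the labels are derived from clean audio and not from the noisy MEG signal, the quality of the final sequence depends mainly on whether retrieval found the right segment.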