🤖 AI Summary
This study addresses the challenge of precisely localizing forged words in partially fake speech, where only specific words within an utterance have been manipulated. To this end, the authors build a speech LLM on top of a text-pretrained large language model (LLM) and perform word-level localization of falsified content via next-token prediction. By integrating speech-text alignment with deepfake detection, the approach performs strongly on known editing styles, such as word-level polarity substitution, on the AV-Deepfake1M and PartialEdit benchmarks. The work further shows that LLMs rely heavily on learned editing patterns for their judgments, revealing both their potential and their limitations: while effective for familiar manipulations, they struggle to generalize to unseen editing strategies.
📝 Abstract
Large language models (LLMs), trained on large-scale text, have recently attracted significant attention for their strong performance across many tasks. Motivated by this, we investigate whether a text-trained LLM can help localize fake words in partially fake speech, where only specific words within an utterance are edited. We build a speech LLM that performs fake word localization via next-token prediction. Experiments and analyses on AV-Deepfake1M and PartialEdit indicate that the model frequently leverages editing-style patterns learned from the training data, particularly the word-level polarity substitutions present in these two datasets, as cues for localizing fake words. Although such patterns provide useful information in in-domain scenarios, how to avoid over-reliance on them and improve generalization to unseen editing styles remains an open question.
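To make the task framing concrete, here is a minimal illustrative sketch (not the authors' code; the prompt and label formats are assumptions) of how word-level fake localization can be cast as next-token prediction: the transcript words, aligned to the speech, are serialized into a prompt, and the model is trained to emit one real/fake label token per word.

```python
# Hypothetical serialization for casting fake word localization as
# next-token prediction. An actual system would condition on aligned
# speech features; this sketch shows only the text-side framing.

def build_prompt(words):
    """Serialize aligned transcript words into a prompt (assumed format)."""
    return "Transcript: " + " ".join(words) + "\nLabels:"

def build_target(labels):
    """Per-word labels emitted as next tokens: 0 = real word, 1 = fake word."""
    return " ".join(str(label) for label in labels)

# Toy example of word-level polarity substitution:
# "terrible" has been edited in place of an original positive word.
words = ["the", "movie", "was", "terrible"]
labels = [0, 0, 0, 1]

print(build_prompt(words) + " " + build_target(labels))
```

Under this framing, the localization decision is read off from which label tokens the model predicts, which is also why editing-style regularities in the training data (such as polarity flips) can dominate the model's cues.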