AI Summary
This study addresses the escalating privacy, security, and governance risks posed by the misuse of generative speech technologies, which existing threat models inadequately capture due to their complexity. To systematically characterize these risks, the work proposes VOICE, a novel, fine-grained risk taxonomy centered on Voice, Ownership, Identity, Control, and Expression. Drawing on a rich multi-source empirical corpus comprising 569 AI incidents, 1,067 user reports, and 2,221 Reddit discussions, the research integrates qualitative content analysis with threat modeling to elucidate how risks evolve in relation to exposure level, social visibility, and variations in legal protections. The resulting actionable classification framework offers policymakers, platform operators, and technologists an empirically grounded and structured foundation for effective governance and responsible design.
Abstract
As generative voice models rapidly advance in both capability and public use, the unconsented collection, reuse, and synthesis of voice data introduce new classes of privacy, security, and governance risk that are poorly captured by existing, largely uniform threat models. To fill this gap, we present V.O.I.C.E, a taxonomy of voice generation risk grounded in a multi-source threat modeling effort drawing on 569 incidents from major AI incident databases, the FTC, and the Internet Crime Complaint Center (IC3); 1,067 direct incident reports from U.S.-based participants across diverse groups (including voice actors, internet personalities, political personnel, and the general public); and 2,221 Reddit discussions. Grounded in real-world data, our taxonomy explicitly models how risks emerge and interact with contextual factors such as degree of exposure, social visibility, and the availability of legal protections for various affected groups.