Publications:
1. VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency (submitted to ICASSP 2026)
2. Reshape Dimensions Network for Speaker Recognition (Interspeech 2024)
3. VoxTube: a Multilingual Speaker Recognition Dataset (Interspeech 2023)
Research Experience
Worked for five years as a Senior Machine Learning Engineer on the voice team at IDR&D Inc., focusing on speaker recognition and voice anti-spoofing.
Education
Master's: Applied Math and Computer Science, ITMO University.
PhD: TMH, KTH Royal Institute of Technology, advised by Prof. Gustav Eje Henter and Prof. Gabriel Skantze; funded by WASP.
Background
Research Interests: Conversational AI and speech synthesis, with a current focus on streaming TTS.