🤖 AI Summary
This study addresses weak self-perception and limited expressiveness in online public speaking by proposing a learnable, evolvable AI digital-cloning framework that models a user's voice, pose, and expressive style with high fidelity. Methodologically, it integrates multimodal representation learning, behavioral cloning, speech-pose co-generation, and self-supervised feedback reinforcement into a closed-loop training mechanism that jointly optimizes cloning fidelity and feedback responsiveness. The key contribution is the first deep embedding of personalized digital cloning into a public-speaking training loop, enabling real-time, individualized self-image simulation and metacognitive feedback. On public-speaking assessment tasks, users' performance scores improve by an average of 37% and self-perception accuracy by 42%, significantly outperforming existing baseline methods.
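To make the closed-loop mechanism concrete, here is a minimal PyTorch sketch of one training step, assuming the high-level description above; the module names (`MultimodalEncoder`, `CoGenerator`, `closed_loop_step`), tensor shapes, and the way the feedback signal scales the loss are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only (not the paper's code): one iteration of a
# closed loop that fuses voice + pose, clones the user's behavior, and
# modulates the update with a self-supervised feedback score.
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    """Fuses voice and pose features into a single user-style embedding."""
    def __init__(self, voice_dim=80, pose_dim=34, embed_dim=128):
        super().__init__()
        self.voice_proj = nn.Linear(voice_dim, embed_dim)
        self.pose_proj = nn.Linear(pose_dim, embed_dim)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, voice, pose):
        v = self.voice_proj(voice).mean(dim=1)  # pool over time
        p = self.pose_proj(pose).mean(dim=1)
        return torch.tanh(self.fuse(torch.cat([v, p], dim=-1)))

class CoGenerator(nn.Module):
    """Jointly predicts speech and pose frames from the style embedding."""
    def __init__(self, embed_dim=128, voice_dim=80, pose_dim=34):
        super().__init__()
        self.voice_head = nn.Linear(embed_dim, voice_dim)
        self.pose_head = nn.Linear(embed_dim, pose_dim)

    def forward(self, style):
        return self.voice_head(style), self.pose_head(style)

def closed_loop_step(encoder, generator, optimizer, voice, pose, feedback_score):
    """One update: behavioral-cloning loss on reconstructed frames, scaled by
    a self-supervised feedback signal (hypothetical scalar in [0, 1])."""
    style = encoder(voice, pose)
    gen_voice, gen_pose = generator(style)
    # Behavioral cloning: match the user's own (time-pooled) frames.
    bc_loss = nn.functional.mse_loss(gen_voice, voice.mean(dim=1)) + \
              nn.functional.mse_loss(gen_pose, pose.mean(dim=1))
    # Feedback reinforcement: down-weight the update when feedback is already high.
    loss = (1.0 - feedback_score) * bc_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    enc, gen = MultimodalEncoder(), CoGenerator()
    opt = torch.optim.Adam(list(enc.parameters()) + list(gen.parameters()), lr=1e-3)
    voice = torch.randn(4, 100, 80)  # e.g. mel-spectrogram-like features
    pose = torch.randn(4, 100, 34)   # e.g. 2D keypoint sequences
    print(closed_loop_step(enc, gen, opt, voice, pose, feedback_score=0.42))
```

In this reading, "closed loop" means the feedback score produced after each generation cycle feeds back into the next update; the paper's actual losses, architectures, and feedback definition may differ.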