Papers: "JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars", "OpenVoice: Versatile Instant Voice Cloning", "MeloTTS: A high-quality multi-lingual multi-accent text-to-speech library", "DreamVoice: Text-Guided Voice Conversion", "MonoGRNet: A General Framework for Monocular 3D Object Detection"; Awards: MyShell publicly listed on major crypto exchanges; Patents: Not explicitly mentioned; Projects: JetMoE-8B, OpenVoice, MyShell, MeloTTS, DreamVoice, MonoGRNet
Research Experience
Led the development of JetMoE-8B, which was pre-trained and post-trained from scratch under extreme limitations in compute and data, at a cost of less than 0.1M USD, and outperforms LLaMA2-7B; Led the development of OpenVoice, an audio foundation model that lets users clone any voice and generate speech in various styles and languages; Co-founded the MyShell platform, which has 6 million users, more than 200,000 AI agents built, and over 1 billion interactions with AI agents.
Education
PhD: Massachusetts Institute of Technology (2020-2025); Visiting Researcher: Stanford University (2019), Mentor: Stanford Vision and Learning Lab; BE: Tsinghua University (2016-2020), Major: Electronic Engineering
Background
Research Interests: AI models, voice cloning, 3D computer vision; Professional Field: Electronic Engineering; Bio: Researcher and entrepreneur, primary author of several widely recognized AI models.