• Fine-tuned VITS, a text-to-speech deep learning model, on a 10-hour Mandarin-English multi-speaker dataset, improving the Mean Opinion Score (MOS) by 0.2 by adjusting speaker embeddings, pitch, and tone.
• Developed Python scripts with librosa and NumPy to automate data quality filtering and cleaning, enabling bulk checks for inconsistent pronunciations, noise, and fragmented audio, as well as silence trimming.
• Replicated and experimented with more than 6 existing text-to-speech models, including FastSpeech 2 and YourTTS, on cloud servers, comparing their performance across more than 4 metrics.
• Implemented Fernet encryption to keep Python code and models secure and confidential. Deployed the fine-tuned models to a production environment and collaborated on updates via Git.
Human-Computer Interaction Institute, CMU
Research Assistant
June 2022 - December 2022
School of Data Science, Fudan University
Researcher
Shanghai, CN
July 2021 - December 2022
School of Data Science, Fudan University
Researcher
Shanghai, CN
May 2022 - July 2022
Computer Science & Artificial Intelligence Lab, MIT
Research Assistant
April 2021 - June 2021
Skills
Languages
Chinese, English
Skills
Communication, Figma, Microsoft Office (Excel, Word, PowerPoint, Visio), Prezi, Adobe Photoshop, iMovie, YouTube, Twitter, Facebook, LinkedIn, Research Surveys, Research Writing, Technical Writing