• Reduced clustering time by 75% by porting the core clustering module to Python, improving real-time recognition; optimized query processing and semantic recognition to better understand customer pain points.
• Trained models with the SimCSE algorithm to generate embeddings on business data, using different tokenization methods and TF-IDF; improved recall from 0.2 to 0.65 compared with the original Word2Vec embeddings.
• Compiled additional positive and counter examples and built a test set for sentence intent recognition to determine the best intent recognition rules; presented detailed test documentation to a team of 7.
Skills
Languages
Chinese, English
Technical skills
C, C++, Git, JavaScript, Linux, Python
Skills
Databases; Operating Systems: Linux, Unix; Web Development