· Developed an ML pipeline covering data collection, storage, preprocessing, model training, and analysis of scraped data, using PyTorch, AWS RDS, and AWS Lambda.
· Implemented the data collection stage as a multi-threaded web crawler in Python with Beautiful Soup, extracting diverse advertisement topics from news articles; validated the scraped data with regular expressions to reduce errors.
· Applied NLP techniques in preprocessing, tokenizing text at the word and sentence level to construct sentence embeddings, improving accuracy and retrieval efficiency.
· Designed and implemented an efficient data model and operations for article data on AWS RDS, connected to AWS Lambda; built a multi-tier classification model with XGBoost to assign topics for ad-tech/marketing.
· Integrated a transformer model in PyTorch for anomaly detection; achieved a 10% accuracy improvement using XGBoost.