Data Scientist @ Mercatus Center | DS Master's @ Georgetown University | Seeking a full-time Data Scientist/Analyst or Machine Learning Engineer position after May 2023
Location
Arlington, VA
Education
Georgetown University
August 2021 - May 2023
degree
Master's
major
Computer Science
coursework
Machine Learning, Deep Learning (NLP, Computer Vision), Big Data, Cloud Computing
Sun Yat-sen University
September 2016 - June 2021
Work Experience
Joblogic-X, Data Scientist
Plano, TX, US
May 2021 - September 2022
overview
- Credit Risk Analysis for Short-Term Loan Applications
- Responsibilities included writing complex SQL queries for data extraction within a real-time Big Data environment, building fully automated ETL data pipelines to ensure data quality and integrity, and applying advanced machine learning techniques
- Built fully automated ETL data pipelines with Python (Airflow) and complex SQL, joining and aggregating data from multiple
- Applied advanced data cleaning techniques to handle missing values, outliers, distorted data distributions, etc., and ingested data under heavy workloads to ensure data integrity and consistency, improving data accuracy by 47%
- Leveraged modern machine learning models (Logistic Regression, Support Vector Machine (SVM), Decision Tree with Pruning, Random Forest, Boosted Tree, Artificial Neural Network, etc.) to predict the probability of loan default and determine the optimal pricing terms/strategies, balancing profitability and risk management. The models are used in daily production and help the client company drive $11+ MM in productivity annually
- Developed a centralized interactive dashboarding capability that summarized loan default behavior and risk management
- Recommendation for Store Opening Site Locations for Meet Fresh
- Developed robust ETL data pipelines extracting and merging data through multiple channels (Google Maps Geocoding & Routing Service, Yelp API, SQL, CSV, etc.) 24/7 with zero human intervention; the automation helps reduce manual labor by 97%
- Cleansed and preprocessed raw data, followed by feature engineering on 100+ categorical features and numeric features
- Built clustering models (Mini-Batch K-Means, DBSCAN), selected among them using performance metrics, and identified potentially profitable locations. The model was used by the client company in deciding on locations of 20+ new
KPMG
Data Consulting and Digital Transformation Summer Intern