Peijin Li

Location
    Arlington, VA
Education
    Georgetown University
    Master's, Data Science
    August 2021 - April 2023
    University of Electronic Science and Technology of China
Work Experience
    xDan AI
    AI Data Engineer (Python)
    July 2023 - present
    - Primarily responsible for AI data synthesis and alignment-training algorithms, including automated data cleaning
    - Developed and implemented reusable ETL pipelines for processing large structured and unstructured datasets
    - Used the xDAN-Distilabel framework to generate questions from seed questions, improving their quality, generalization, diversity, and complexity
    - Designed and implemented an automated model evaluation system
    - Built an automated AI data-processing and quality-assessment method that uses xDAN LLMs to handle noisy data
    - Managed and visualized large language model datasets using Nomic on Hugging Face
    CGSP Georgetown University
    Data Analyst and Visualization Specialist (Python, AWS)
    October 2022 - July 2023
    Data Scientist (Python, Tableau, R)
    June 2022 - September 2022
    Hacking for Defense
    Data Analyst (Python, SQL)
    March 2022 - May 2022
    Meritco Service
    Business Intelligence Analyst (Excel, Power BI)
    April 2020 - July 2021
Skills
A/B Testing, Algorithms, Amazon Elastic Compute Cloud, Amazon Relational Database Service, Amazon S3, Amazon Web Services, Analysis of Variance (ANOVA), Apache Hadoop, Apache Spark, Artificial Intelligence, Artificial Neural Networks, Automation, Bash Shell, Bayes' Theorem (Bayesian Statistics), Benchmarking, Business Intelligence, Causal Inference, Cluster Analysis, Computer Programming, Dashboards, Data Analysis, Data Cleansing, Data Mining, Data Processing, Data Science, Decision Trees, Economics, Electronics, Extract, Transform, Load (ETL), Forecasting, Git, HTML, Information Engineering, Innovation, JavaScript, Keras, Engineering, Finance, Statistics, Large Language Models, Leadership, Logistic Regression, Machine Learning, Macroeconomics, Manufacturing, Market Segmentation, Matplotlib, Metrics, Microsoft Excel, Microsoft Office, MySQL, Naive Bayes, Named Entity Recognition, NumPy, Pandas, Learning Platforms, Plotly, Power BI, Predictive Modelling, Presentations, Public Policy, PySpark, Python, Quality Management, Random Forest, RESTful APIs, Risk Analysis, scikit-learn, Self-Motivation, Social Network Analysis, SQL Databases, Stakeholder Management, Statistical Hypothesis Testing, Support Vector Machines, Tableau, User Experience, Visualization, XGBoost