Sai Smithi Godi


Location

Hammonton, NJ
Rolla, MO
Education
    Missouri University of Science and Technology
    August 2022 - May 2024
    degree
    Master's
    major
    Computer Science
    CVR College of Engineering
Work Experience
    STORYLINE HEALTH
    Data Scientist
    Salt Lake, UT, US
    August 2024 - present
    overview
    - Fine-tuned an OpenAI LLM on patient data and built a conversational chatbot using the LangChain framework and the Streamlit library. Created RAG pipelines, retrievers, agents, and chains for automation using Python in Jupyter Notebook.
    - Used the Flowise AI GUI to load, split, and embed documents into vector databases (Pinecone, Chroma, FAISS), and deployed workflows for LLM applications, focusing on LangChain.
    - Performed exploratory data analysis (EDA) in JupyterLab on electronic health records (EHR) to identify data patterns, clusters, and correlations between features, and visualized the results as reports and charts in Power BI.
    - Implemented feature selection (Lasso and Ridge regressors, PCA) and predictive models (Random Forest, XGBoost, ANN, ARIMA, SARIMA) using Python libraries (NumPy, Pandas, scikit-learn, Matplotlib, Seaborn, statsmodels, OpenAI, etc.) to predict tumor marker scores from 6k+ audio features, and applied prompt engineering with an OpenAI LLM for feature extraction.
    - Collected EHR feature data from various applications using Storyline APIs and stored it as RDB in AWS S3.
    - Designed the data architecture, transformed and modeled data from S3 using AWS Glue, and stored it in Redshift.
    - Automated this process using Lambda triggers and CloudWatch.
    MISSOURI UNIVERSITY OF SCIENCE AND TECHNOLOGY
    Data Operator Alumni Relations
    Rolla, MO, US
    September 2022 - May 2024
    COGNIZANT TECHNOLOGY SOLUTIONS
    Data Engineer
    Hyderabad, IN-TG, IN
    August 2021 - August 2022
    LEAP ROBOTS
    Junior Data Engineer
    Hyderabad, IN-TG, IN
    December 2020 - July 2021
Skills
Agile Methodology, Airflow, Algorithms, Amazon DynamoDB, Amazon Elastic Compute Cloud, Amazon Redshift, Amazon S3, Amazon Web Services, Analysis of Variance (ANOVA), Analytical Thinking, AngularJS, Apache Flume, Apache Hadoop, Apache HBase, Apache Hive, Apache Kafka, Apache Maven, Apache Oozie, Apache Spark, Apache Yarn, Apache Zookeeper, Api Design, Application Programming Interfaces (APIs), Architecture, Artificial Intelligence, Automation, AWS Glue, AWS Lambda, Backup Devices, Bayes' Theorem (Bayesian Statistics), Bootstrap (Software), Boto3, Business Analysis, Business Intelligence, Business Process Improvement, Calculations, Cascading Style Sheets (CSS), Chatbots, Cloud Computing, Cloudwatch, Cluster Analysis, Continuous Integration, Criminal Investigation, Cryptography, Cursor (Graphical User Interface Elements), Dashboards, Data Analysis, Data Architecture, Database Administration, Databases, Data Collection, Data Ingestion, Data Integrity, Data Mart, Data Migration, Data Modeling, Data Pipelines, Data Processing, Data Science, Data Streaming, Data Visualization, Data Warehousing, Decision Making Skills, Demand Forecasting, Demographics, Dialectical Behavior Therapy, Distributed File Systems, Docker, Eclipse (Software), E-Commerce, Electronic Medical Records, Electronics, Enterprise Resource Planning, Er-Win, Express.js, Extract Transform Load (ETL), Feature Extraction, Feature Selection, Forecasting Skills, Fundraising, Git, Github, Health Care, HTML, Image Processing, Importing and Exporting of Goods, Information Engineering, Information Technology, Insurance Claim Processing, Java Database Connectivity, Java (Programming Language), JavaScript (Programming Language), Jenkins, JMP (Statistical Software), JQuery, JSON, Jupyter Notebook, Knowledge of Engineering, Knowledge of Statistics, Kubernetes, Large Language Models, Linear Regression, Linux, Log4j, Machine Learning, MapReduce, Market Segmentation, MATLAB, Matplotlib, Microservices, Microsoft Azure, Microsoft Excel, MongoDB, MySQL, Node.Js, NoSQL, NumPy, OAuth, Oracle Applications, Oracle Databases, Oracledb, Pandas, Payment Gateway, PL-SQL, PostgreSQL, Power BI, Predictive Data Analysis, Predictive Modelling, Prompt Engineering, Python (Programming Language), Random Forest, ReactJS, Relational Databases, Resource Management, Restful APIs, Restocking Shelves, Risk Analysis, Robotics Design and Production, R (Programming Language), SAS (Software), Scheduling, Scikit Learn, SciPy, Scrap Metals, Selenium, Selenium Webdriver, Sensors, Simple Object Access Protocol (SOAP), Snowflake, Software Version Control, Spring-boot, SQL Databases, SQL Server Reporting Services, SQL Stored Procedures, Stakeholder Engagement, Stakeholder Management, Statistical Hypothesis Testing, Stock Control, Strategic Thinking, Supply Chain Management, Tableau (Software), Technology Strategies, Telecommunications, Time Series, Unstructured Data, Vba Programming Language, Visualization, Web Architecture, Web Technologies, Workflows, Xgboost