Haren Pavan Sai Nerella

About Me

Hello, I'm Pavan, a seasoned Data Engineer with over 5 years of hands-on experience in the IT industry.

Location

Folsom, CA

Tampa, FL

Education

University of South Florida-Main Campus

August 2019 - May 2021

degree

Master's

major

Computer Science

Jawaharlal Nehru Technological University

Work Experience

Fin Thrive

Data Engineer

Plano, TX, US

August 2023 - present

overview

Responsibilities:
- Designed, implemented, and architected very large-scale data intelligence solutions on big data platforms
- Analyzed large and critical datasets using Hive and ZooKeeper
- Developed POCs using Spark and Scala, deployed them on the YARN cluster, and compared Spark's performance with that of Hive and SQL
- Used Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as the storage mechanism
- Collaborated with data scientists to document the development process, model configurations, and best practices for using generative AI technologies
- Leveraged Azure Databricks to clean, transform, and analyze large datasets, providing actionable insights for business decision-making
- Designed and implemented robust data architecture solutions using Postgres, Amazon Aurora, and DynamoDB to support high-throughput data processing and analysis
- Used AWS utilities such as EMR, S3, Glue crawlers, Lambda, and CloudWatch, together with ThoughtSpot, to run and monitor Hadoop and Spark jobs on AWS
- Configured and customized Collibra workflows to streamline data cataloging, classification, and lineage tracking, improving data transparency and accessibility for clinical and administrative staff
- Developed Spark applications using Spark SQL and PySpark in Databricks to extract, transform, and aggregate data from multiple file formats
- Designed and developed complex ETL pipelines using AWS Glue, Snowflake SQL, and Snowflake's PySpark and JavaScript connectors, integrating data from various sources, including APIs, databases, and flat files
- Developed reusable data flows in Azure Data Factory for common data processing tasks, promoting code maintainability and reducing development time for future pipelines
- Integrated Azure Data Factory with other Azure services such as Azure Synapse Analytics, Azure Databricks, and Azure Analysis Services to create end-to-end data processing solutions
- Spearheaded data governance initiatives that leveraged Collibra to support research and reporting activities, ensuring data compliance and integrity in healthcare studies
- Designed and implemented interactive dashboards and reports in Power BI, providing real-time insights and data visualizations to support business decision-making
- Optimized SQL queries for ThoughtSpot to ensure fast and efficient data retrieval
- Leveraged Azure Data Factory's integration runtime to securely orchestrate data movement across hybrid environments, ensuring data governance and compliance
- Led efforts to build and maintain data warehouse and data lake solutions, ensuring they scaled with evolving business requirements
- Facilitated the integration of Collibra with electronic health record (EHR) systems, enabling seamless data exchange and enhancing patient care coordination
- Independently identified and resolved complex issues within Hive and Spark applications
- Developed and enforced data security policies, ensuring the confidentiality, integrity, and availability of sensitive data across cloud platforms

Environment: HDFS, Python, SQL, Spark, Azure Data Factory, Scala, Kafka, Hive, YARN, Erwin Data Modeler, Sqoop, PySpark, TypeScript, Snowflake, GenAI, AWS Cloud, Glue, GitHub, Node.js, ThoughtSpot, Shell Scripting

CVS Health/AETNA

Data Quality Engineer

Hartford, CT, US

November 2021 - July 2023

J B Hunt Transportation

Data Engineer

Lowell, AR, US

May 2020 - October 2021

Airtel

Big Data Engineer

Gurugram, Haryana

March 2018 - August 2019

Fun Fact

My expertise lies in data migration, data quality, and big data technologies. I'm currently looking for new roles, and I'd like to connect to learn more about your career and share my experiences.

Passion

Working to provide customers with excellent products that help people

Skills

Access Controls, Administrative Operations, Agile Methodology, Airflow, Amazon DynamoDB, Amazon Elastic Compute Cloud, Amazon S3, Amazon Web Services, Analytical Thinking, Apache Hadoop, Apache HBase, Apache Hive, Apache Kafka, Apache Spark, Apache Yarn, Apache Zookeeper, Application Programming Interfaces (APIs), Architecture, Automation, AWS Glue, Azure Data Factory, Azure Machine Learning, Big Data, BigQuery, Business Intelligence, Business Requirements, Catalogues, Clinical Works, Cloud Computing, Cloud Services, Cloud Storage, Code Review, Collibra, Computer Engineering, Confidentiality, Coordination Skills, Creating Prototypes, Dashboards, Data Analysis, Data Architecture, Databases, Databricks, Data Caching, Data Governance, Data Ingestion, Data Intelligence, Data Lakes, Data Management, Data Mining, Data Modeling, Data Pipelines, Data Processing, Data Quality, Data Retrieval, Data Science, Data Security, Data Streaming, Data Transmissions, Data Validation, Data Visualization, Data Warehousing, Decision Making Skills, Demographics, DevOps, Directed Acyclic Graph (Directed Graphs), Distributed Data Store, Distributed Systems, Eclipse (Software), E-Commerce, Electronic Data Interchange (EDI), Electronic Medical Records, Electronics, Er-Win, Extensible Markup Language (XML), Extract Transform Load (ETL), Generative AI, Github, Google Cloud, Hadoop Distributed File System, Health Care, Healthcare Systems, IBM DB2, Identity and Access Management, Indexer, Information Engineering, Infrastructure Management, IntelliJ IDEA, Java (Programming Language), JavaScript (Programming Language), Jenkins, Kubernetes, Linux, Machine Learning, Maintenance, MapReduce, Medical Care Coordination, Metadata, Metrics, Microsoft Azure, Microsoft SQL Server, Microsoft Windows, Mobile Device Management, Model View Controller (MVC), MongoDB, MySQL, Network Performance, Node.Js, NoSQL, Operational Systems, Oracle Applications, Oracle Databases, Parallel Computing, PL-SQL, PostgreSQL, Power BI, Predictive Modelling, Profiling, Programming Languages, PuTTY, Pyspark, Python (Programming Language), Query Optimization, Records Management, Reliability, Requirements Analysis, Resource Allocation, Sales, Scalability, Scheduling, Security Policies, Semi-structured Data, Shell Script, Singleton Pattern, Snowflake, Software Engineering, Software Version Control, SQL Databases, Sqoop, Stakeholder Management, Standardization, Strategic Thinking, Streamline, Subversion, Systems Development Life Cycle, Tableau (Software), Task Management, Technical Documentation, Telecommunications, Teradata SQL, Transportation Management, TypeScript, Unix, Workflows

Volunteer

AIESEC Life

January 2017 - May 2019

Leadership

FinThrive

Sr. Data Engineer

August 2023 - September 2024

Hobbies

Kayaking

Cooking

Traveling