Haren Pavan Sai Nerella


About Me

Hello, I'm Pavan, a seasoned Data Engineer with over 5 years of hands-on experience in the IT industry.

Location

Folsom, CA
Tampa, FL
Education
    University of South Florida-Main Campus
    August 2019 - May 2021
    Master's in Computer Science
    Jawaharlal Nehru Technological University
Work Experience
    FinThrive
    Data Engineer
    Plano, TX, US
    August 2023 - present
    Responsibilities:
    - Responsible for the design, implementation, and architecture of very large-scale data intelligence solutions built on big data platforms
    - Analyzed large, critical datasets using Hive and ZooKeeper
    - Developed proofs of concept (POCs) in Spark and Scala, deployed them on a YARN cluster, and compared Spark's performance with Hive and SQL
    - Used Amazon Elastic Compute Cloud (EC2) for computational tasks and Amazon Simple Storage Service (S3) for storage
    - Collaborated with data scientists to document the development process, model configurations, and best practices for using generative AI technologies
    - Used Azure Databricks to clean, transform, and analyze large datasets, providing actionable insights for business decision-making
    - Designed and implemented data architecture solutions on PostgreSQL, Amazon Aurora, and DynamoDB to support high-throughput data processing and analysis
    - Used AWS services such as EMR, S3, Glue crawlers, Lambda, and CloudWatch, along with ThoughtSpot, to run and monitor Hadoop and Spark jobs on AWS
    - Configured and customized Collibra workflows to streamline data cataloging, classification, and lineage tracking, improving data transparency and accessibility for clinical and administrative staff
    - Developed Spark applications with Spark SQL and PySpark in Databricks to extract, transform, and aggregate data from multiple file formats for analysis
    - Designed and developed complex ETL pipelines using AWS Glue, Snowflake SQL, and Snowflake's PySpark and JavaScript connectors, integrating data from sources including APIs, databases, and flat files
    - Developed reusable data flows in Azure Data Factory for common data processing tasks, improving code maintainability and reducing development time for future pipelines
    - Integrated Azure Data Factory with other Azure services, such as Azure Synapse Analytics, Azure Databricks, and Azure Analysis Services, to build end-to-end data processing solutions
    - Spearheaded data governance initiatives that used Collibra to support research and reporting activities, ensuring data compliance and integrity in healthcare studies
    - Designed and implemented interactive Power BI dashboards and reports that provide real-time insights and data visualizations for business decision-making
    - Optimized SQL queries for ThoughtSpot to ensure fast, efficient data retrieval
    - Used Azure Data Factory's integration runtime to securely orchestrate data movement across hybrid environments, supporting data governance and compliance
    - Led efforts to build and maintain data warehouse and data lake solutions that scale with evolving business requirements
    - Facilitated the integration of Collibra with electronic health record (EHR) systems, enabling data exchange and improving patient care coordination
    - Independently identified and resolved complex issues in Hive and Spark applications
    - Developed and enforced data security policies to ensure the confidentiality, integrity, and availability of sensitive data across cloud platforms
    Environment: HDFS, Python, SQL, Spark, Azure Data Factory, Scala, Kafka, Hive, YARN, Erwin Data Modeler, Sqoop, PySpark, TypeScript, Snowflake, GenAI, AWS Cloud, Glue, GitHub, Node.js, ThoughtSpot, Shell Scripting
    CVS Health/Aetna
    Data Quality Engineer
    Hartford, CT, US
    November 2021 - July 2023
    J B Hunt Transportation
    Data Engineer
    Lowell, AR, US
    May 2020 - October 2021
    Airtel
    Big Data Engineer
    Gurugram, Haryana, IN
    March 2018 - August 2019
Fun Fact

My expertise lies in Data Migration, Data Quality, and Big Data technologies. I'm currently looking for new roles and would like to connect to learn more about your career and share my experiences.

Passion

Working to provide excellent products to customers and help people

Skills
Access Controls, Administrative Operations, Agile Methodology, Airflow, Amazon DynamoDB, Amazon Elastic Compute Cloud, Amazon S3, Amazon Web Services, Analytical Thinking, Apache Hadoop, Apache HBase, Apache Hive, Apache Kafka, Apache Spark, Apache YARN, Apache ZooKeeper, Application Programming Interfaces (APIs), Architecture, Automation, AWS Glue, Azure Data Factory, Azure Machine Learning, Big Data, BigQuery, Business Intelligence, Business Requirements, Catalogues, Clinical Works, Cloud Computing, Cloud Services, Cloud Storage, Code Review, Collibra, Computer Engineering, Confidentiality, Coordination Skills, Creating Prototypes, Dashboards, Data Analysis, Data Architecture, Databases, Databricks, Data Caching, Data Governance, Data Ingestion, Data Intelligence, Data Lakes, Data Management, Data Mining, Data Modeling, Data Pipelines, Data Processing, Data Quality, Data Retrieval, Data Science, Data Security, Data Streaming, Data Transmissions, Data Validation, Data Visualization, Data Warehousing, Decision Making Skills, Demographics, DevOps, Directed Acyclic Graph (Directed Graphs), Distributed Data Store, Distributed Systems, Eclipse (Software), E-Commerce, Electronic Data Interchange (EDI), Electronic Medical Records, Electronics, Erwin, Extensible Markup Language (XML), Extract Transform Load (ETL), Generative AI, GitHub, Google Cloud, Hadoop Distributed File System, Health Care, Healthcare Systems, IBM DB2, Identity and Access Management, Indexer, Information Engineering, Infrastructure Management, IntelliJ IDEA, Java (Programming Language), JavaScript (Programming Language), Jenkins, Kubernetes, Linux, Machine Learning, Maintenance, MapReduce, Medical Care Coordination, Metadata, Metrics, Microsoft Azure, Microsoft SQL Server, Microsoft Windows, Mobile Device Management, Model View Controller (MVC), MongoDB, MySQL, Network Performance, Node.js, NoSQL, Operational Systems, Oracle Applications, Oracle Databases, Parallel Computing, PL/SQL, PostgreSQL, Power BI, Predictive Modelling, Profiling, Programming Languages, PuTTY, PySpark, Python (Programming Language), Query Optimization, Records Management, Reliability, Requirements Analysis, Resource Allocation, Sales, Scalability, Scheduling, Security Policies, Semi-structured Data, Shell Script, Singleton Pattern, Snowflake, Software Engineering, Software Version Control, SQL Databases, Sqoop, Stakeholder Management, Standardization, Strategic Thinking, Streamline, Subversion, Systems Development Life Cycle, Tableau (Software), Task Management, Technical Documentation, Telecommunications, Teradata SQL, Transportation Management, TypeScript, Unix, Workflows
Volunteer
    AIESEC Life
    January 2017 - May 2019
Leadership
    FinThrive
    Sr. Data Engineer
    August 2023 - September 2024
Hobbies
Kayaking
Cooking
Traveling