Overview
- Responsibilities
- Involved in the analysis, design, and implementation of business user requirements, translating them into technical specifications
- Implemented ETL processes to transform and cleanse data as it moved between MySQL and NoSQL databases
- Leveraged PySpark's capabilities for data manipulation, aggregation, and filtering to prepare data for further processing
- Joined, manipulated, and drew actionable insights from large data sources using Python and SQL
- Developed PySpark ETL pipelines to cleanse, transform, and enrich raw data (an illustrative PySpark sketch follows the Environment line)
- Ingested large data streams from company REST APIs into an EMR cluster through Amazon Kinesis
- Integrated Amazon Kinesis and Apache Kafka for real-time event streaming and used Databricks for real-time analytics, reducing latency by 50%
- Streamed data from fully managed Kafka brokers on AWS using Spark Streaming and processed it with explode transformations
- Created data models and schema designs for Snowflake data warehouse to support complex analytical queries and reporting
- Developed Hive queries for analysts, loading and transforming large sets of structured and semi-structured data
- Automated the resulting scripts and workflows with Apache Airflow and shell scripting to ensure daily execution in production
- Used Spark SQL and the DataFrame API to load structured and semi-structured data from MySQL tables into Spark clusters
- Built data ingestion pipelines from disparate sources and data formats into Snowflake staging to enable real-time data processing and analysis
- Developed ETL workflows using AWS Glue to efficiently load big data sets into the data warehouse
- Implemented and visualized BI reports with Tableau
- Administered users, user groups, and scheduled report instances in Tableau; monitored Tableau Server for high availability
- Deployed web-embedded Power BI dashboards with scheduled refreshes via gateways, configuring workspaces and data sources
- Leveraged SQL scripting for data modeling, streamlining data querying and reporting and improving insights into customer data
- Collaborated with end-users to resolve data and performance-related issues during the onboarding of new users
- Developed Airflow pipelines to efficiently load data from multiple sources into Redshift and monitored job schedules (an illustrative Airflow DAG sketch follows the Environment line)
- Successfully migrated data from Teradata to AWS, improving data accessibility and cost efficiency
- Worked on migrating the reports and dashboards from OBIEE to Power BI
- Assisted multiple users from the data visualization team in connecting to Redshift using Power BI, Power Apps, Excel, Spotfire, and Python
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the output in Parquet format in HDFS (see the streaming sketch after the Environment line)
- Used Kubernetes to orchestrate the deployment, scaling, and management of Docker containers
- Finalized the data pipeline using DynamoDB as a NoSQL storage option
- Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation across environments and applications
- Planned and executed data migration strategies for transferring data from legacy systems to MySQL and NoSQL databases
- Actively participated in scrum meetings, reporting progress and maintaining good communication with team members and managers
- Environment: Apache Airflow, Kafka, Spark, MapReduce, Hadoop, Snowflake, Hive, Databricks, PySpark, Docker, Kubernetes, AWS, DynamoDB, CI/CD, Tableau, Redshift, Power BI, REST APIs, Teradata, Windows
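
The sketches below illustrate the kinds of jobs described in the bullets above. They are minimal, hedged examples: every table, topic, path, schema, and connection name is an assumed placeholder, not an actual project artifact.

A minimal PySpark ETL sketch of the MySQL-to-Parquet cleansing and aggregation work: read a hypothetical `orders` table over JDBC, deduplicate and filter it, aggregate daily revenue, and write Parquet.

```python
# Minimal PySpark ETL sketch (placeholder names throughout): load a MySQL
# table over JDBC, cleanse and aggregate it, and write the result as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("mysql_etl_sketch").getOrCreate()

orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://mysql-host:3306/sales")  # placeholder host/db
    .option("dbtable", "orders")                          # placeholder table
    .option("user", "etl_user")
    .option("password", "change-me")
    .load()
)

cleansed = (
    orders
    .dropDuplicates(["order_id"])              # remove duplicate orders
    .filter(F.col("order_total") > 0)          # drop invalid totals
    .withColumn("order_date", F.to_date("order_ts"))
)

daily_revenue = (
    cleansed.groupBy("order_date")
    .agg(F.sum("order_total").alias("daily_revenue"))
)

# Placeholder output path; assumes the MySQL JDBC driver and S3 access are configured.
daily_revenue.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_revenue/")
```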
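
A sketch of the Kafka-to-HDFS streaming path using Spark Structured Streaming: parse JSON events from an assumed `clickstream` topic, explode the nested `items` array, and append Parquet files to HDFS.

```python
# Structured Streaming sketch (placeholder broker, topic, schema, and paths):
# read JSON events from Kafka, explode a nested array, write Parquet to HDFS.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, DoubleType

spark = SparkSession.builder.appName("kafka_stream_sketch").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("items", ArrayType(StructType([
        StructField("sku", StringType()),
        StructField("price", DoubleType()),
    ]))),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")  # placeholder broker
    .option("subscribe", "clickstream")                   # placeholder topic
    .load()
)

events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.user_id", F.explode("e.items").alias("item"))  # one row per item
    .select("user_id", "item.sku", "item.price")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/curated/clickstream/")            # placeholder path
    .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```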
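
A sketch of the Airflow orchestration pattern, assuming Airflow 2.4+ with the Amazon provider installed: a daily DAG that runs a hypothetical spark-submit step and then copies the curated Parquet output from S3 into Redshift.

```python
# Airflow DAG sketch (placeholder DAG id, script path, bucket, schema, table,
# and connection ids): run a Spark job, then COPY its output into Redshift.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

with DAG(
    dag_id="nightly_etl_sketch",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",        # daily execution in production
    catchup=False,
) as dag:

    run_spark_job = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit /opt/etl/daily_revenue_job.py",  # placeholder script
    )

    load_redshift = S3ToRedshiftOperator(
        task_id="load_redshift",
        schema="analytics",                  # placeholder schema
        table="daily_revenue",               # placeholder table
        s3_bucket="example-bucket",
        s3_key="curated/daily_revenue/",
        copy_options=["FORMAT AS PARQUET"],
        redshift_conn_id="redshift_default",
        aws_conn_id="aws_default",
    )

    run_spark_job >> load_redshift
```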