Overview
- Responsibilities
- Designed scalable data solutions on Azure, leveraging services like Azure Data Lake Storage, Azure Synapse Analytics, and Azure Data Factory for efficient data processing
- Gathered configuration data using NetFlow, REST APIs, and PowerShell scripts, and stored the results in Azure Blob Storage or Azure SQL Database for downstream analysis (a collection sketch follows this list)
- Adept at using PowerShell as the primary scripting language for automating and managing Azure resources, enabling efficient, repeatable deployments and system configuration
- Implemented automated email notifications using Azure Logic Apps, Azure Functions, and APIs for real-time data retrieval and storage in Azure Cosmos DB, enhancing analytical capabilities
- Deployed real-time data streaming solutions with Azure Event Hubs and Apache Kafka for handling continuous streams of network events and customer feedback
- Spearheaded batch data processing initiatives using Azure Data Factory and Azure Databricks, orchestrating seamless transfer and transformation of historical logs and configuration backups
- Developed and maintained data pipelines using Azure Data Factory (ADF) with a focus on leveraging both Azure Integration Runtime and Self-hosted Integration Runtime for efficient data movement and transformation across hybrid environments
- Implemented and scheduled ADF pipelines using various triggers (e.g., schedule, tumbling window, and event-based triggers), ensuring seamless integration and timely execution of ETL processes (a trigger sketch follows this list)
- Leveraged Azure Databricks, Azure Data Factory, and Synapse Analytics for comprehensive ETL processes, including data cleansing, deduplication, normalization, and joins, utilizing PySpark for robust root cause analysis and anomaly detection (see the ETL sketch after this list)
- Migrated and optimized ETL workflows from SQL Server Integration Services (SSIS) to Azure Data Factory, leveraging ADF's pipelines and data flows for improved scalability and efficiency
- Designed and implemented scalable data storage solutions using Azure Blob Storage, optimizing data retrieval and ensuring secure, cost-effective storage for large datasets
- Developed and optimized complex SQL queries and transformations in Azure Databricks, leveraging Spark SQL for efficient data processing and analytics across large datasets
- Pioneered dimensional modeling with star and snowflake schemas for multidimensional analysis of network operations and customer satisfaction metrics
- Instituted automated workflows and CI/CD pipelines to streamline model development, testing, and deployment processes
- Implemented data masking and anonymization strategies to safeguard sensitive information in compliance with GDPR and HIPAA regulations (a masking sketch follows this list)
- Proficient in working with Parquet and Delta formats, ensuring optimized storage and retrieval of large-scale datasets in big data environments, particularly within Azure ecosystems
- Orchestrated performance optimization strategies using Azure Data Factory and Databricks, and PySpark to fine-tune data processing pipelines and reduce latency
- Designed and implemented microservices architecture using Azure Service Fabric, ensuring high availability, scalability, and efficient resource management for distributed applications
- Implemented Role-Based Access Control (RBAC) using Azure Active Directory to secure and manage access to Azure resources, ensuring compliance and enhancing data security across the organization
- Dynamically scaled compute and storage resources with Azure Autoscale to efficiently handle large volumes of data and minimize costs
- Employed query performance optimization techniques, including index optimization and query rewriting, to enhance data retrieval speeds
- Established data encryption at rest and in transit using Azure Key Vault and Azure Security Center to protect sensitive data
- Configured comprehensive monitoring solutions using Azure Monitor and Azure Log Analytics to track network performance and detect anomalies effectively
- Used Azure Application Insights, Azure Diagnostics, and Azure Log Analytics for root cause analysis and performance tuning, and aggregated data with Azure Monitor and Syslog to optimize network throughput
- Utilized Delta Live Tables to process and analyze real-time data streams, enabling instant insights for time-critical decision-making (see the Delta Live Tables sketch after this list)
- Deep understanding of the SDLC process, implementing DevOps practices, including standard deployment processes (dev, test, prod) with peer-reviewed code. Experienced in managing CI/CD pipelines using Azure DevOps to streamline and automate software releases
- Facilitated cross-functional collaboration among network engineers, data scientists, and compliance officers using Microsoft Teams and Azure DevOps
- Documented network architecture diagrams, data flows, and security policies using Microsoft Visio and Azure Boards to ensure clear understanding and alignment across project stakeholders
- Established knowledge sharing sessions and documentation repositories on Azure DevOps and SharePoint to promote best practices and foster continuous learning among team members
- Led migration of on-premises data systems to Azure cloud, including architecture design, data transfer, and integration, ensuring seamless transition and optimized cloud performance
- Engineered and managed scalable ETL pipelines using PySpark within Azure Databricks, orchestrating complex data workflows through Directed Acyclic Graphs (DAGs) to optimize performance and resource utilization
- Developed and executed Spark Notebooks for data processing and analysis, leveraging Azure Databricks Clusters to handle large datasets and ensure efficient data transformation and aggregation
- Optimized data processing workflows in Azure Databricks by implementing advanced PySpark functions, utilizing coalesce and repartition operations to enhance the efficiency, performance, and scalability of ETL pipelines (as sketched after this list)
- Designed and implemented scalable, high-performance databases using Azure Cosmos DB and PostgreSQL, optimizing data storage and retrieval for various applications and use cases
- Strong command of SQL, including T-SQL and PostgreSQL, with extensive experience in writing, optimizing, and managing complex queries across various environments
- Engineered data pipelines with Change Data Capture (CDC) and Slowly Changing Dimensions (SCD) in Azure Data Factory, facilitating efficient tracking of data changes and managing historical data for accurate analytics and reporting (an SCD Type 2 sketch follows this list)
- Architected and managed data solutions using Azure Data Lake and Delta Lake, implementing Delta Live Tables within a Medallion architecture to ensure real-time data processing, incremental updates, and enhanced data quality and accessibility (see the Delta Live Tables sketch after this list)
- Implemented the Medallion Architecture in Azure Data Lake Storage, efficiently moving data through the bronze, silver, and gold layers to ensure a structured, clean, and high-quality dataset for advanced analytics and reporting
- Designed and developed interactive dashboards and reports in Power BI, enabling data-driven decision-making and providing actionable insights to stakeholders
- Engaged in Agile Scrum meetings, including daily stand-ups and globally coordinated PI Planning, to ensure effective project management and execution
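
Illustrative sketches for the bullets above follow; every endpoint, path, table, and column name in them is a placeholder rather than a detail taken from the projects themselves.

A minimal collection sketch for the configuration-gathering bullet, assuming the requests and azure-storage-blob Python packages: it pulls a device snapshot from a hypothetical REST endpoint and lands the raw JSON in a date-partitioned blob path.

```python
# Hypothetical example: pull device configuration from a REST endpoint and
# land the raw JSON in Azure Blob Storage. Endpoint, container, and blob
# names are placeholders.
import json
from datetime import datetime, timezone

import requests
from azure.storage.blob import BlobServiceClient

API_URL = "https://network-inventory.example.com/api/devices"   # placeholder endpoint
CONNECTION_STRING = "<storage-account-connection-string>"        # placeholder secret

def collect_and_store() -> None:
    # Fetch the configuration snapshot from the REST API
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    payload = response.json()

    # Write the snapshot to a date-partitioned blob path for later analysis
    blob_service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob_name = f"raw/devices/{datetime.now(timezone.utc):%Y/%m/%d}/snapshot.json"
    blob_client = blob_service.get_blob_client(container="network-config", blob=blob_name)
    blob_client.upload_blob(json.dumps(payload), overwrite=True)

if __name__ == "__main__":
    collect_and_store()
```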
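A hedged sketch for the ADF trigger bullet, assuming the azure-identity and azure-mgmt-datafactory packages: it attaches an hourly schedule trigger to an existing pipeline and starts it; tumbling window and event-based triggers are configured the same way with different model classes.

```python
# Hedged sketch: attach a schedule trigger to an existing ADF pipeline via the
# azure-mgmt-datafactory SDK. Subscription, resource group, factory, and
# pipeline names are placeholders.
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineReference,
    ScheduleTrigger,
    ScheduleTriggerRecurrence,
    TriggerPipelineReference,
    TriggerResource,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Fire the referenced pipeline once per hour starting now
recurrence = ScheduleTriggerRecurrence(
    frequency="Hour", interval=1, start_time=datetime.now(timezone.utc), time_zone="UTC"
)
trigger = ScheduleTrigger(
    description="Hourly ETL load",
    pipelines=[TriggerPipelineReference(pipeline_reference=PipelineReference(reference_name="etl_pipeline"))],
    recurrence=recurrence,
)

adf.triggers.create_or_update("<resource-group>", "<factory-name>", "hourly_trigger", TriggerResource(properties=trigger))
adf.triggers.begin_start("<resource-group>", "<factory-name>", "hourly_trigger").result()
```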
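A PySpark sketch for the Databricks ETL bullet: cleansing, deduplication, and normalization of a hypothetical event dataset, followed by a join to a device dimension before the result is written back to the lake.

```python
# Illustrative PySpark ETL pass over hypothetical network event data.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

events = spark.read.parquet("/mnt/datalake/bronze/network_events")   # placeholder path
devices = spark.read.parquet("/mnt/datalake/bronze/devices")         # placeholder path

cleaned = (
    events
    .dropna(subset=["device_id", "event_time"])                      # cleansing: drop incomplete rows
    .dropDuplicates(["device_id", "event_time"])                     # deduplication
    .withColumn("severity", F.lower(F.trim(F.col("severity"))))      # normalization
)

# Join events to device metadata for downstream root cause analysis
enriched = cleaned.join(devices, on="device_id", how="left")
enriched.write.mode("overwrite").parquet("/mnt/datalake/silver/network_events")
```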
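A minimal masking sketch for the GDPR/HIPAA bullet: direct identifiers are one-way hashed or redacted and uncontrolled free text is dropped before data leaves the restricted zone; the column names are assumptions.

```python
# Illustrative masking step; column names are placeholders.
from pyspark.sql import DataFrame, functions as F

def mask_pii(df: DataFrame) -> DataFrame:
    """Hash direct identifiers, redact phone numbers, and drop free-text fields."""
    return (
        df.withColumn("customer_email", F.sha2(F.col("customer_email"), 256))  # one-way hash
          .withColumn("phone_number", F.lit("REDACTED"))                        # full redaction
          .drop("free_text_notes")                                              # remove uncontrolled text
    )
```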
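A hedged Delta Live Tables sketch for the streaming and Medallion bullets, assuming a Databricks DLT pipeline where the dlt module and a global spark session are available: bronze ingests raw events with Auto Loader, silver validates and deduplicates, and gold aggregates for reporting.

```python
# Hedged DLT sketch of a bronze/silver/gold Medallion flow; paths, table names,
# and columns are placeholders. Runs inside a Databricks Delta Live Tables pipeline.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw network events as they arrive")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")              # Auto Loader over the landing zone
        .option("cloudFiles.format", "json")
        .load("/mnt/datalake/landing/network_events")
    )

@dlt.table(comment="Silver: validated, deduplicated events")
def silver_events():
    return (
        dlt.read_stream("bronze_events")
        .dropDuplicates(["event_id"])
        .withColumn("event_time", F.to_timestamp("event_time"))
    )

@dlt.table(comment="Gold: hourly event counts per device for dashboards")
def gold_event_counts():
    return (
        dlt.read("silver_events")
        .groupBy("device_id", F.window("event_time", "1 hour"))
        .count()
    )
```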
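A short sketch for the partition-tuning bullet: repartition by a key before heavy shuffle work, then coalesce before the write so the job does not emit thousands of tiny files.

```python
# Illustrative repartition/coalesce tuning; paths and partition counts are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-tuning-sketch").getOrCreate()
df = spark.read.parquet("/mnt/datalake/silver/network_events")

# Repartition by a join/aggregation key so downstream shuffles are evenly distributed
df = df.repartition(200, "device_id")

# ... heavy transformations would go here ...

# Coalesce before writing to avoid producing many tiny output files
df.coalesce(16).write.mode("overwrite").parquet("/mnt/datalake/gold/network_events")
```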
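A hedged sketch for the CDC/SCD bullet using the Delta Lake merge API to apply a Type 2 change: the merge closes the current row when a tracked attribute changes and inserts rows for brand-new keys. A full SCD Type 2 run would also stage a new current row for each changed key before merging; that staging step is omitted here for brevity.

```python
# Hedged Type 2 SCD upsert with Delta Lake; table path and columns are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

def apply_scd2(updates: DataFrame, dim_path: str = "/mnt/datalake/gold/dim_device") -> None:
    """Close out changed rows and insert new keys in a Type 2 dimension."""
    dim = DeltaTable.forPath(spark, dim_path)
    (
        dim.alias("t")
        .merge(updates.alias("s"), "t.device_id = s.device_id AND t.is_current = true")
        .whenMatchedUpdate(
            condition="t.device_config <> s.device_config",            # attribute actually changed
            set={"is_current": "false", "end_date": "current_date()"},  # close the old version
        )
        .whenNotMatchedInsert(
            values={
                "device_id": "s.device_id",
                "device_config": "s.device_config",
                "is_current": "true",
                "start_date": "current_date()",
                "end_date": "cast(null as date)",
            }
        )
        .execute()
    )
```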