Overview
- Responsibilities
- Designed scalable data solutions on Azure, leveraging services like Azure Data Lake Storage, Azure Synapse Analytics, and Azure Data Factory for efficient data processing
- Gathered configuration data using NetFlow, REST APIs, and PowerShell scripts, and stored the results in Azure Blob Storage or Azure SQL Database for downstream analysis (a collection sketch follows this list)
- Adept at using PowerShell as the primary scripting language for automating and managing Azure resources, enabling efficient, repeatable deployments and system configuration
- Implemented automated email notifications using Azure Logic Apps, Azure Functions, and APIs for real-time data retrieval and storage in Azure Cosmos DB, enhancing analytical capabilities
- Deployed real-time data streaming solutions with Azure Event Hubs and Apache Kafka for handling continuous streams of network events and customer feedback
- Spearheaded batch data processing initiatives using Azure Data Factory and Azure Databricks, orchestrating seamless transfer and transformation of historical logs and configuration backups
- Developed and maintained data pipelines using Azure Data Factory (ADF) with a focus on leveraging both Azure Integration Runtime and Self-hosted Integration Runtime for efficient data movement and transformation across hybrid environments
- Implemented and scheduled ADF pipelines using various triggers (e.g., schedule, tumbling window, and event-based triggers), ensuring seamless integration and timely execution of ETL processes (a trigger sketch follows this list)
- Leveraged Azure Databricks, Azure Data Factory, and Synapse Analytics for comprehensive ETL processes, including data cleansing, deduplication, normalization, and joins, utilizing PySpark for robust root cause analysis and anomaly detection (see the ETL sketch after this list)
- Migrated and optimized ETL workflows from SQL Server Integration Services (SSIS) to Azure Data Factory, leveraging ADF's pipelines and data flows for improved scalability and efficiency
- Designed and implemented scalable data storage solutions using Azure Blob Storage, optimizing data retrieval and ensuring secure, cost-effective storage for large datasets
- Developed and optimized complex SQL queries and transformations in Azure Databricks, leveraging Spark SQL for efficient data processing and analytics across large datasets
- Pioneered dimensional modeling with star and snowflake schemas for multidimensional analysis of network operations and customer satisfaction metrics
- Instituted automated workflows and CI/CD pipelines to streamline model development, testing, and deployment processes
- Implemented data masking and anonymization strategies to safeguard sensitive information in compliance with GDPR and HIPAA regulations (a masking sketch follows this list)
- Proficient in working with Parquet and Delta formats, ensuring optimized storage and retrieval of large-scale datasets in big data environments, particularly within Azure ecosystems
- Orchestrated performance optimization strategies using Azure Data Factory and Databricks, and PySpark to fine-tune data processing pipelines and reduce latency
- Designed and implemented microservices architecture using Azure Service Fabric, ensuring high availability, scalability, and efficient resource management for distributed applications
- Implemented Role-Based Access Control (RBAC) using Azure Active Directory to secure and manage access to Azure resources, ensuring compliance and enhancing data security across the organization
- Dynamically scaled compute and storage resources with Azure Autoscale to efficiently handle large volumes of data and minimize costs
- Employed query performance optimization techniques, including index optimization and query rewriting, to enhance data retrieval speeds
- Established data encryption at rest and in transit using Azure Key Vault and Azure Security Center to protect sensitive data
- Configured comprehensive monitoring solutions using Azure Monitor and Azure Log Analytics to track network performance and detect anomalies effectively
- Used Azure Application Insights, Azure Diagnostics, and Azure Log Analytics for root cause analysis and performance tuning, and aggregated data with Azure Monitor and Syslog to optimize network throughput
- Utilized Delta Live Tables to process and analyze real-time data streams, enabling instant insights for time-critical decision-making (see the Delta Live Tables sketch after this list)
- Deep understanding of the SDLC process, implementing DevOps practices, including standard deployment processes (dev, test, prod) with peer-reviewed code. Experienced in managing CI/CD pipelines using Azure DevOps to streamline and automate software releases
- Facilitated cross-functional collaboration among network engineers, data scientists, and compliance officers using Microsoft Teams and Azure DevOps
- Documented network architecture diagrams, data flows, and security policies using Microsoft Visio and Azure Boards to ensure clear understanding and alignment across project stakeholders
- Established knowledge sharing sessions and documentation repositories on Azure DevOps and SharePoint to promote best practices and foster continuous learning among team members
- Led migration of on-premises data systems to Azure cloud, including architecture design, data transfer, and integration, ensuring seamless transition and optimized cloud performance
- Engineered and managed scalable ETL pipelines using PySpark within Azure Databricks, orchestrating complex data workflows through Directed Acyclic Graphs (DAGs) to optimize performance and resource utilization
- Developed and executed Spark Notebooks for data processing and analysis, leveraging Azure Databricks Clusters to handle large datasets and ensure efficient data transformation and aggregation
- Optimized data processing workflows in Azure Databricks by implementing advanced PySpark functions, utilizing coalesce and repartition operations to enhance the efficiency, performance, and scalability of ETL pipelines (as sketched after this list)
- Designed and implemented scalable, high-performance databases using Azure Cosmos DB and PostgreSQL, optimizing data storage and retrieval for various applications and use cases
- Strong command of SQL, including T-SQL and PostgreSQL, with extensive experience in writing, optimizing, and managing complex queries across various environments
- Engineered data pipelines with Change Data Capture (CDC) and Slowly Changing Dimensions (SCD) in Azure Data Factory, facilitating efficient tracking of data changes and managing historical data for accurate analytics and reporting (an SCD Type 2 sketch follows this list)
- Architected and managed data solutions using Azure Data Lake and Delta Lake, implementing Delta Live Tables within a Medallion architecture to ensure real-time data processing, incremental updates, and enhanced data quality and accessibility (see the Delta Live Tables sketch after this list)
- Implemented the Medallion Architecture in Azure Data Lake Storage, efficiently moving data through the bronze, silver, and gold layers to ensure a structured, clean, and high-quality dataset for advanced analytics and reporting
- Designed and developed interactive dashboards and reports in Power BI, enabling data-driven decision-making and providing actionable insights to stakeholders
- Engaged in Agile Scrum meetings, including daily stand-ups and globally coordinated PI Planning, to ensure effective project management and execution
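
Illustrative sketches for the bullets above follow; every endpoint, path, table, and column name in them is a placeholder rather than a detail taken from the projects themselves.

A minimal collection sketch for the configuration-gathering bullet, assuming the requests and azure-storage-blob Python packages: it pulls a device snapshot from a hypothetical REST endpoint and lands the raw JSON in a date-partitioned blob path.

```python
# Hypothetical example: pull device configuration from a REST endpoint and
# land the raw JSON in Azure Blob Storage. Endpoint, container, and blob
# names are placeholders.
import json
from datetime import datetime, timezone

import requests
from azure.storage.blob import BlobServiceClient

API_URL = "https://network-inventory.example.com/api/devices"   # placeholder endpoint
CONNECTION_STRING = "<storage-account-connection-string>"        # placeholder secret

def collect_and_store() -> None:
    # Fetch the configuration snapshot from the REST API
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    payload = response.json()

    # Write the snapshot to a date-partitioned blob path for later analysis
    blob_service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob_name = f"raw/devices/{datetime.now(timezone.utc):%Y/%m/%d}/snapshot.json"
    blob_client = blob_service.get_blob_client(container="network-config", blob=blob_name)
    blob_client.upload_blob(json.dumps(payload), overwrite=True)

if __name__ == "__main__":
    collect_and_store()
```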
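A hedged sketch for the ADF trigger bullet, assuming the azure-identity and azure-mgmt-datafactory packages: it attaches an hourly schedule trigger to an existing pipeline and starts it; tumbling window and event-based triggers are configured the same way with different model classes.

```python
# Hedged sketch: attach a schedule trigger to an existing ADF pipeline via the
# azure-mgmt-datafactory SDK. Subscription, resource group, factory, and
# pipeline names are placeholders.
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineReference,
    ScheduleTrigger,
    ScheduleTriggerRecurrence,
    TriggerPipelineReference,
    TriggerResource,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Fire the referenced pipeline once per hour starting now
recurrence = ScheduleTriggerRecurrence(
    frequency="Hour", interval=1, start_time=datetime.now(timezone.utc), time_zone="UTC"
)
trigger = ScheduleTrigger(
    description="Hourly ETL load",
    pipelines=[TriggerPipelineReference(pipeline_reference=PipelineReference(reference_name="etl_pipeline"))],
    recurrence=recurrence,
)

adf.triggers.create_or_update("<resource-group>", "<factory-name>", "hourly_trigger", TriggerResource(properties=trigger))
adf.triggers.begin_start("<resource-group>", "<factory-name>", "hourly_trigger").result()
```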
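A PySpark sketch for the Databricks ETL bullet: cleansing, deduplication, and normalization of a hypothetical event dataset, followed by a join to a device dimension before the result is written back to the lake.

```python
# Illustrative PySpark ETL pass over hypothetical network event data.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

events = spark.read.parquet("/mnt/datalake/bronze/network_events")   # placeholder path
devices = spark.read.parquet("/mnt/datalake/bronze/devices")         # placeholder path

cleaned = (
    events
    .dropna(subset=["device_id", "event_time"])                      # cleansing: drop incomplete rows
    .dropDuplicates(["device_id", "event_time"])                     # deduplication
    .withColumn("severity", F.lower(F.trim(F.col("severity"))))      # normalization
)

# Join events to device metadata for downstream root cause analysis
enriched = cleaned.join(devices, on="device_id", how="left")
enriched.write.mode("overwrite").parquet("/mnt/datalake/silver/network_events")
```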
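A minimal masking sketch for the GDPR/HIPAA bullet: direct identifiers are one-way hashed or redacted and uncontrolled free text is dropped before data leaves the restricted zone; the column names are assumptions.

```python
# Illustrative masking step; column names are placeholders.
from pyspark.sql import DataFrame, functions as F

def mask_pii(df: DataFrame) -> DataFrame:
    """Hash direct identifiers, redact phone numbers, and drop free-text fields."""
    return (
        df.withColumn("customer_email", F.sha2(F.col("customer_email"), 256))  # one-way hash
          .withColumn("phone_number", F.lit("REDACTED"))                        # full redaction
          .drop("free_text_notes")                                              # remove uncontrolled text
    )
```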
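A hedged Delta Live Tables sketch for the streaming and Medallion bullets, assuming a Databricks DLT pipeline where the dlt module and a global spark session are available: bronze ingests raw events with Auto Loader, silver validates and deduplicates, and gold aggregates for reporting.

```python
# Hedged DLT sketch of a bronze/silver/gold Medallion flow; paths, table names,
# and columns are placeholders. Runs inside a Databricks Delta Live Tables pipeline.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw network events as they arrive")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")              # Auto Loader over the landing zone
        .option("cloudFiles.format", "json")
        .load("/mnt/datalake/landing/network_events")
    )

@dlt.table(comment="Silver: validated, deduplicated events")
def silver_events():
    return (
        dlt.read_stream("bronze_events")
        .dropDuplicates(["event_id"])
        .withColumn("event_time", F.to_timestamp("event_time"))
    )

@dlt.table(comment="Gold: hourly event counts per device for dashboards")
def gold_event_counts():
    return (
        dlt.read("silver_events")
        .groupBy("device_id", F.window("event_time", "1 hour"))
        .count()
    )
```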
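A short sketch for the partition-tuning bullet: repartition by a key before heavy shuffle work, then coalesce before the write so the job does not emit thousands of tiny files.

```python
# Illustrative repartition/coalesce tuning; paths and partition counts are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-tuning-sketch").getOrCreate()
df = spark.read.parquet("/mnt/datalake/silver/network_events")

# Repartition by a join/aggregation key so downstream shuffles are evenly distributed
df = df.repartition(200, "device_id")

# ... heavy transformations would go here ...

# Coalesce before writing to avoid producing many tiny output files
df.coalesce(16).write.mode("overwrite").parquet("/mnt/datalake/gold/network_events")
```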
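A hedged sketch for the CDC/SCD bullet using the Delta Lake merge API to apply a Type 2 change: the merge closes the current row when a tracked attribute changes and inserts rows for brand-new keys. A full SCD Type 2 run would also stage a new current row for each changed key before merging; that staging step is omitted here for brevity.

```python
# Hedged Type 2 SCD upsert with Delta Lake; table path and columns are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

def apply_scd2(updates: DataFrame, dim_path: str = "/mnt/datalake/gold/dim_device") -> None:
    """Close out changed rows and insert new keys in a Type 2 dimension."""
    dim = DeltaTable.forPath(spark, dim_path)
    (
        dim.alias("t")
        .merge(updates.alias("s"), "t.device_id = s.device_id AND t.is_current = true")
        .whenMatchedUpdate(
            condition="t.device_config <> s.device_config",            # attribute actually changed
            set={"is_current": "false", "end_date": "current_date()"},  # close the old version
        )
        .whenNotMatchedInsert(
            values={
                "device_id": "s.device_id",
                "device_config": "s.device_config",
                "is_current": "true",
                "start_date": "current_date()",
                "end_date": "cast(null as date)",
            }
        )
        .execute()
    )
```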