• Migrated a legacy SQL codebase to Azure using Hive SQL, streamlining the data migration process and giving 15+ analysts seamless data access and more efficient data manipulation.
• Designed 15+ SQL Server Integration Services (SSIS) ETL packages to streamline database import and export, decreasing data load times by 40% while ensuring data integrity and consistency across multiple sources.
• Engineered a scalable ETL pipeline using Apache Spark on Hadoop, handling over 2TB of log data. Employed PySpark and Scala for transformations and incorporated Hive for structured data querying and storage, boosting data processing efficiency by 35%.
• Created Azure Data Factory pipelines to process and analyze datasets exceeding 1TB, and designed interactive Tableau dashboards whose real-time project insights improved decision-making speed by 30%.
• Optimized MS SQL Server and MySQL databases, ensuring data integrity through advanced queries and stored procedures.
• Architected and enhanced Kubernetes-based, high-availability data pipelines, automating their lifecycle with Jenkins CI/CD, which reduced deployment time by 40% and improved processing performance by 25%.