Maintained production ETL pipelines processing high-volume, multi-format data in Hadoop while ensuring stable, GDPR-
compliant data flows.
• Refactored Apache Airflow DAGs with retries and SLA monitoring, reducing pipeline failures and manual fixes by 50%.
• Investigated and resolved distributed Spark job failures through log tracing and cross-team coordination to restore critical
workflows.
• Implemented upstream validation checks to prevent incorrect data from propagating to final tables.
• Developed Python and SQL data-quality checks to detect issues early, reducing incident response time by 15% and
improving data availability.