- Improved data quality by 20% by implementing data cleansing and validation routines in dbt (data build tool)
- Boosted data loading speed with AWS Kinesis Firehose and partitioned S3 storage, improving data availability for downstream analytics
- Developed and deployed a high-performing ETL pipeline in Databricks to process terabytes of data daily, enabling efficient data consumption downstream
- Leveraged Apache Spark for large-scale data processing tasks, achieving a 20% reduction in processing time compared to traditional batch processing methods
- Delivered end-to-end data engineering initiatives in Python and PySpark across distributed platforms (Airflow, Databricks, AWS Redshift, Snowflake), collaborating with cross-functional teams to orchestrate robust, scalable data pipelines (see the Airflow sketch below)
- Automated efficient data pipelines that parsed and stored raw data into partitioned Hive tables, improving data retrieval for reporting and analysis by 20% (see the PySpark sketch below)
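
A minimal PySpark sketch of the partitioned-Hive-table pattern referenced above. The bucket path, table name, and column names are hypothetical placeholders, not the actual production values.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical source path and target table, for illustration only.
RAW_PATH = "s3://example-bucket/raw/events/"
TARGET_TABLE = "analytics.events"

spark = (
    SparkSession.builder
    .appName("raw-events-to-hive")
    .enableHiveSupport()  # needed to write Hive-managed tables
    .getOrCreate()
)

# Parse raw JSON, derive a date partition column, and drop malformed or duplicate rows.
events = (
    spark.read.json(RAW_PATH)
    .withColumn("event_date", F.to_date("event_ts"))
    .dropna(subset=["event_id", "event_date"])
    .dropDuplicates(["event_id"])
)

# Write into a Hive table partitioned by date so reports can prune partitions at read time.
(
    events.write
    .mode("append")
    .partitionBy("event_date")
    .format("parquet")
    .saveAsTable(TARGET_TABLE)
)
```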
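
A minimal Airflow sketch of the pipeline orchestration referenced above, assuming an Airflow 2.x environment; the DAG id and task callables are illustrative placeholders rather than the actual pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    """Pull raw data from the source system (placeholder)."""
    ...


def transform(**context):
    """Clean and validate the extracted data (placeholder)."""
    ...


def load(**context):
    """Load the transformed data into the warehouse (placeholder)."""
    ...


with DAG(
    dag_id="example_daily_etl",      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Enforce extract -> transform -> load ordering.
    extract_task >> transform_task >> load_task
```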