Overview
- Responsibilities
- Maintained data quality at the source by performing transformations, cleansing, and integrity checks
- Designed and developed a security framework providing fine-grained access to objects in S3 using AWS Lambda
- Provided end-to-end architecture and implementation using Amazon S3, Lambda, and Step Functions
- Optimized ETL workflows for performance and scalability, tuning factors such as parallelism and resource allocation
- Wrote shell scripts to read, split, encrypt, and write data to S3 bucket locations
- Used Amazon CloudWatch to capture workflow logs
- Implemented AWS Lambda, coded in Python, as a scheduler to automate Spark applications on EMR clusters
- Used Tableau for data visualization
- Used Datadog for real-time performance and event monitoring of cloud services and infrastructure
- Set up a CI/CD pipeline to commit and maintain code
- Wrote Terraform templates (Infrastructure as Code) to build staging and production environments
- Scheduled all jobs using Step Functions and EventBridge
- Optimized EMR fleets and instance types for performance and cost reduction
- Used CloudWatch to collect and track metrics, collect and monitor log files, and set alarms
- Used AWS Glue to create the Glue Data Catalog for managing different data sources and transformations
- Environment: Hadoop, HDFS, SQL, Kafka, Spark, Shell, Python, AWS, EC2, EMR, CloudWatch, GitLab, Tableau, Datadog, Athena, S3, AWS Glue, Terraform, SQS, EventBridge
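A minimal sketch of the Terraform approach to staging and production environments: a single template parameterized by an `environment` variable. The resource and naming scheme here are assumptions, not the actual templates.

```terraform
# Sketch only; the variable name, resource, and naming scheme are assumptions.
variable "environment" {
  description = "Deployment environment (staging or production)"
  type        = string
  default     = "staging"
}

resource "aws_s3_bucket" "data" {
  bucket = "my-etl-data-${var.environment}" # assumed bucket naming scheme

  tags = {
    Environment = var.environment
  }
}
```

Applying the same template with different variable values (e.g., via per-environment `.tfvars` files or workspaces) keeps staging and production structurally identical.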
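The Lambda-based security framework for fine-grained S3 access could be sketched roughly as below: a function checks the caller's role against an allow-list of key prefixes and, only on success, issues a short-lived presigned URL. The role names, prefixes, and bucket name are illustrative assumptions, not the actual framework.

```python
# Sketch of fine-grained S3 access via Lambda; roles, prefixes, and the
# bucket name are assumptions for illustration.

ROLE_PREFIXES = {
    "analyst": ["curated/", "reports/"],
    "engineer": ["raw/", "curated/"],
}

def is_allowed(role, key):
    """Allow access only to keys under the role's permitted prefixes."""
    return any(key.startswith(p) for p in ROLE_PREFIXES.get(role, []))

def lambda_handler(event, context):
    role, key = event["role"], event["key"]
    if not is_allowed(role, key):
        return {"statusCode": 403, "body": "access denied"}
    import boto3  # imported here so the policy check stays testable offline
    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-secure-bucket", "Key": key},  # assumed bucket
        ExpiresIn=300,  # URL valid for five minutes
    )
    return {"statusCode": 200, "body": url}
```

Keeping the authorization check in a pure function (`is_allowed`) separates policy from the AWS call, which makes the policy unit-testable without credentials.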
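The split-encrypt-upload shell flow could look like the sketch below. The input file, chunk size, pass-phrase, and bucket path are assumptions; the actual upload command is shown but not executed, since it needs AWS credentials.

```shell
#!/usr/bin/env sh
# Sketch of reading, splitting, encrypting, and writing data toward S3.
# File names, chunk size, and the bucket path are illustrative assumptions.
set -eu

INPUT="payload.txt"
CHUNK_PREFIX="chunk_"
BUCKET_PATH="s3://my-data-bucket/incoming/"   # assumed bucket location

# Create a sample input file for demonstration
seq 1 1000 > "$INPUT"

# Split the input into fixed-size pieces (200 lines each here)
split -l 200 "$INPUT" "$CHUNK_PREFIX"

for part in ${CHUNK_PREFIX}??; do
  # Encrypt each piece with AES-256 (pass-phrase shown only for illustration)
  openssl enc -aes-256-cbc -pbkdf2 -pass pass:example-secret \
    -in "$part" -out "$part.enc"
  # The real upload step would be: aws s3 cp "$part.enc" "$BUCKET_PATH"
  echo "would upload $part.enc to $BUCKET_PATH"
done
```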
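The CloudWatch metrics-and-alarms bullet could be sketched as below: a helper assembles the arguments for `put_metric_alarm`, and a thin wrapper makes the AWS call. The metric name, namespace, and threshold are illustrative assumptions.

```python
# Sketch of setting a CloudWatch alarm on a workflow metric; the metric
# name, namespace, and threshold are illustrative assumptions.

def build_alarm_spec(name, metric, namespace, threshold):
    """Assemble the keyword arguments for put_metric_alarm."""
    return {
        "AlarmName": name,
        "MetricName": metric,
        "Namespace": namespace,
        "Statistic": "Average",
        "Period": 300,            # evaluate over 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }

def create_alarm():
    import boto3  # imported here so the spec builder stays testable offline
    cw = boto3.client("cloudwatch")
    spec = build_alarm_spec("etl-failed-records", "FailedRecords", "MyETL", 0)  # assumed names
    cw.put_metric_alarm(**spec)
```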