Senior Cloud Data Operations Engineer with strong hands-on experience managing and operating a FedRAMP-compliant AWS (GovCloud) data platform
Responsibilities:
- Support and operate an enterprise Data Services Platform (Amazon Redshift, EMR, and OpenSearch Service)
- Build on and enhance the current solution to ensure it can meet non-functional requirements (NFRs), service-level agreements (SLAs), and operational-level agreements (OLAs)
- Drive issue resolution and root-cause analysis, including on-call duty
- Create and maintain runbooks for all operational processes
- Implement operational best practices for running data applications in the cloud, and promote these practices to the team and to end users
- Identify gaps and configure AWS services in use for optimal performance and user experience
- Automate processes and tasks to increase operational efficiency and improve the customer experience
- Proactively monitor and optimize solutions
- Implement proactive monitoring and alerting for potential bottlenecks to prevent, mitigate, or resolve performance issues
- Implement recovery strategies to execute in the event of cloud downtime or failures
- Troubleshoot and investigate data platform issues and implement workarounds or solutions in all environments (production and non-production)
- Identify and implement cost-saving strategies to reduce cloud operational costs
- Roll out enterprise releases to production using CI/CD pipelines
Qualifications / Requirements Include:
- Solid hands-on experience deploying, managing, and operating a production data platform implemented in AWS GovCloud, including:
  - Security, Identity & Compliance: AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), AWS Secrets Manager, AWS Certificate Manager
  - Management & Compliance: Amazon CloudWatch, AWS CloudTrail
  - Analytics: Amazon Redshift, Amazon EMR, Amazon OpenSearch Service (formerly Amazon Elasticsearch Service)
- Ability to work independently and in a collaborative environment with cross-functional teams
- Strong client-facing experience
- Strong development experience in Python and shell scripting
- Knowledge of the Spark processing engine and the Big Data/Hadoop framework
- Experience implementing high-availability and disaster-recovery data solutions in AWS
- Ability to discuss and explain system architecture and component design
- Hands-on experience with FedRAMP data applications built in AWS GovCloud
- Working knowledge of configuration management, change management, and database administration
- Experience with application/IaC security testing tools is a plus
- Experience installing, configuring, and troubleshooting SSL certificates
- Experience deploying and provisioning environments
- Familiarity with Terraform and GitLab
Nice to have:
- Analytics: Amazon QuickSight, AWS Glue, Amazon Athena
- Databases: Amazon DynamoDB, AWS Database Migration Service, Amazon RDS
- Compute: AWS Lambda, Amazon EC2
- Storage: Amazon Simple Storage Service (S3)
- Strong development experience in Scala and Java
This is a long-term contract in San Francisco (hybrid/remote). Must be a US citizen or, at minimum, hold an I-140 to be eligible for employment with this company.