Primary Responsibilities:
- Build, maintain, and continuously improve a scalable, flexible, and performant data pipeline and its associated services and infrastructure.
- Build the end-to-end processes that establish the data pipeline across the various data sources.
- Implement data integration and preparation processes, including profiling, cleansing, and enrichment (see the sketch after this list).
- Implement data storage and access mechanisms.
- Build and maintain successful collaborative relationships with groups outside of engineering, including Data Science, Deployments, Product Management, Operations, and Customer Success.
- Grow or transform the current data platform that feeds our products to incrementally build out our technical vision, adapt to new customer needs, and enable a sustainable innovation cycle across all stages of the pipeline.
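
To give candidates a concrete feel for this work, here is a minimal, illustrative PySpark sketch of the profiling, cleansing, and enrichment steps described above. All dataset, column, and S3 path names are hypothetical, not part of our actual pipeline:

```python
# Illustrative only: a minimal sketch of profiling, cleansing, and enrichment.
# All paths, tables, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("data-prep-sketch").getOrCreate()

# Ingest a raw source (hypothetical S3 location).
raw = spark.read.json("s3://example-bucket/raw/events/")

# Profiling: count nulls per column to gauge data quality.
null_counts = raw.select(
    [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in raw.columns]
)
null_counts.show()

# Cleansing: drop rows missing required keys, deduplicate, normalize types.
clean = (
    raw.dropna(subset=["event_id", "user_id"])
       .dropDuplicates(["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
)

# Enrichment: join against a reference dimension (hypothetical table).
users = spark.read.parquet("s3://example-bucket/dim/users/")
enriched = clean.join(users, on="user_id", how="left")

# Storage: write a curated dataset for downstream access.
enriched.write.mode("overwrite").parquet("s3://example-bucket/curated/events/")
```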
Minimum Qualifications:
• BS in Computer Science or a related field
• 6+ years of software engineering experience
• Experience with big data applications and data transformation workflows
• Experience with Agile development practices (e.g., Scrum)
Must have expertise in some of the following:
• AWS
• Amazon EMR
• Amazon Redshift
• Python 3
• RDBMS (PostgreSQL, SQL Server, Oracle, DB2, etc.) and proficiency with SQL
• SQL query and performance tuning
• Apache Spark
• Runtime Configuration Management (e.g., Twelve-Factor Apps; see the sketch after this list)
• Continuous Build Systems (Jenkins, Bamboo, etc.)
• Containers and related technologies (Docker, Kubernetes, AWS ECS, etc.)
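
To illustrate the runtime configuration item above, here is a minimal Python 3 sketch of twelve-factor-style configuration, where settings are read from the environment rather than hard-coded. The variable and setting names are hypothetical:

```python
# Illustrative only: twelve-factor-style configuration, where settings live
# in the environment rather than in code. All names are hypothetical.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    database_url: str
    redshift_cluster: str
    log_level: str

def load_config() -> Config:
    """Read runtime configuration from environment variables,
    failing fast when a required setting is missing."""
    try:
        return Config(
            database_url=os.environ["DATABASE_URL"],
            redshift_cluster=os.environ["REDSHIFT_CLUSTER"],
            log_level=os.environ.get("LOG_LEVEL", "INFO"),
        )
    except KeyError as missing:
        raise RuntimeError(f"Missing required environment variable: {missing}")

if __name__ == "__main__":
    print(load_config())
```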