Job Description
BASIC QUALIFICATIONS:
Bachelor's degree in Engineering, Computer Science, Statistics, Applied Math, or a related technical field plus a minimum of 8 years of relevant experience; or a Master's degree plus a minimum of 6 years of relevant experience. Agile experience preferred.
- Working knowledge of entity resolution systems
- Experience with messaging systems like Kafka
- Experience with NoSQL and/or graph databases like MongoDB or ArangoDB
- Experience with any of the following: SQL databases such as Oracle or Postgres, or MongoDB
- Working experience with ETL processing
- Working experience with data workflow products like StreamSets or NiFi
- Working experience with Python RESTful API services and JDBC
- Experience with Hadoop and Hive/Impala
- Experience with Cloudera Data Science Workbench is a plus
- Understanding of PySpark
- Leadership experience
- Creative thinker
- Ability to multi-task
- Strong command of data engineering concepts, principles, and theories
CLEARANCE REQUIREMENTS:
A TS/SCI security clearance with the ability to obtain a Polygraph is required at time of hire. Candidates must be able to obtain the Polygraph within a reasonable amount of time from date of hire. Applicants selected will be subject to a U.S. Government security investigation and must meet eligibility requirements for access to classified information. Due to the nature of work performed within our facilities, U.S. citizenship is required.
RESPONSIBILITIES:
- Support the data science team by designing, developing, and implementing scalable ETL processes for disparate datasets into a Hadoop infrastructure
- Design, develop, implement, and maintain data ingestion processes for various disparate datasets using StreamSets (prior StreamSets experience not mandatory)
- Develop processes to identify data drift and malformed records
- Develop technical documentation and standard operating procedures
- Mentor new and junior data engineers
- Lead technical tasks for small teams or projects