Python Data Engineer

HJ Staffing

National
Full Time
Paid
  • Responsibilities

    We are seeking a highly skilled Python Data Engineer with deep experience in CMS datasets (MOR, MMR, MAO) and a strong understanding of healthcare regulations and compliance standards (HIPAA). This role is ideal for a data-driven professional who thrives in cloud-native environments and is passionate about building robust, scalable, and efficient pipelines that drive healthcare innovation.

    Key Responsibilities:

    • Design, develop, and maintain scalable ETL pipelines for CMS datasets using GCP Dataflow (Apache Beam) and Python

    • Architect and manage BigQuery data warehouses, ensuring optimal performance and cost-efficiency

    • Implement and manage Airflow DAGs for workflow orchestration and scheduling

    • Ensure end-to-end data quality, lineage, validation, and governance in alignment with HIPAA and CMS standards

    • Optimize large-scale healthcare datasets using partitioning, clustering, sharding, and efficient query patterns in BigQuery

    • Collaborate within Agile teams using tools like Jira and Confluence for sprint planning and documentation

    • Monitor, troubleshoot, and improve pipeline reliability and performance across the full data lifecycle

    Qualifications:

    • Bachelor's degree in Computer Science, Information Systems, or related field

    • 3+ years of experience in cloud-based data engineering, preferably with healthcare datasets

    • Strong proficiency in Python, GCP Dataflow, and Apache Beam

    • Expert-level knowledge of BigQuery, including schema design, performance tuning, and advanced SQL

    • Hands-on experience with Airflow for the orchestration of complex data workflows

    • In-depth understanding of data warehouse design, including star/snowflake schemas, normalization, and denormalization

    • Strong analytical skills for query and data optimization

    • Familiarity with Agile methodologies and collaboration tools (Jira, Confluence)

    • Knowledge of CMS datasets (MOR, MMR, MAO) and healthcare data privacy/compliance standards (HIPAA)