Senior Data Engineer Big Data & Cloud Integration L3

Select Minds LLC

Dallas, TX
Full Time
Paid
  • Responsibilities

    Benefits:

    • Hybrid work arrangement
    • Competitive salary
    • Opportunity for advancement

    Senior Data Engineer – Big Data & Cloud Integration
    Dallas, TX – Hybrid | Long term | In-person interview
    Data Engineer – L3 role

    • Translate complex cross-functional business requirements and functional specifications into logical program designs and data solutions.
    • Partner with the product team to understand business needs and specifications.
    • Solve complex architecture, design, and business problems.
    • Coordinate, execute, and participate in component integration testing (CIT), system integration testing (SIT), and user acceptance testing (UAT) to identify application errors and ensure quality software deployments.
    • Work continuously with cross-functional development teams (data analysts and software engineers) to create PySpark jobs using Spark SQL, and help them build reports on top of data pipelines.
    • Build, test, and enhance data curation pipelines, integrating data from a wide variety of sources (DBMSs, file systems, and APIs) for OKR and metrics development with high data quality and integrity.
    • Develop, maintain, and enhance data ingestion solutions of varying complexity across data sources such as DBMSs, file systems (structured and unstructured), APIs, and streaming platforms, on both on-premises and cloud infrastructure.
    • Design, implement, and architect very large-scale data intelligence solutions on big data platforms.
    • Build data warehouse structures, creating fact, dimension, and aggregate tables through dimensional modeling with star and snowflake schemas.
    • Develop Spark applications in PySpark on a distributed environment to load large numbers of CSV files with differing schemas into Hive ORC tables.
    • Perform ETL transformations on data loaded into Spark DataFrames, using in-memory computation.
    • Develop and implement data pipelines using AWS services such as Kinesis and S3 to process data in real time.
    • Work with monitoring, logging, and cost-management tools that integrate with AWS.
    • Schedule Spark jobs with the Airflow scheduler and monitor their performance.

    ....

    Flexible work from home options available.