IT Data Engineering JG3

United Global Technologies

Houston, TX
Full Time
Paid
    Responsibilities

    This is not a standard data engineer role.

    We are looking for a deeply technical, hands-on individual contributor who can:

    • Diagnose performance, latency, and cost issues in a large-scale cloud data platform
    • Take a top-down, platform-level view across multiple projects
    • Improve architecture, efficiency, and cost optimization, not just write Spark code
    • Act as a technical problem-solver and mentor, guiding other data engineers

    This person is expected to make the platform better, not just execute tasks.

    Current Platform & Architecture (Very Important)

    Data Flow:

    • On-premises systems → Cloud (Azure)
    • Streaming ingestion → Azure Data Lake Storage (ADLS)
    • Data processed into two separate containers:
      • Crude trading
      • Product trading
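
    As a rough illustration of the split described above, here is a minimal routing sketch in plain Python. The field name `trade_type` and the container names are hypothetical; the actual schema and routing logic are not specified in this posting.

```python
# Hypothetical sketch of the container-routing step: incoming records
# are split into "crude-trading" and "product-trading" containers.
# Field and container names are assumptions, not the actual schema.

def route_record(record: dict) -> str:
    """Return the target ADLS container for a single record."""
    trade_type = record.get("trade_type", "").lower()
    if trade_type == "crude":
        return "crude-trading"
    if trade_type == "product":
        return "product-trading"
    return "quarantine"  # unrecognized records go to a review path

records = [
    {"id": 1, "trade_type": "crude"},
    {"id": 2, "trade_type": "product"},
    {"id": 3, "trade_type": ""},
]
routed = {r["id"]: route_record(r) for r in records}
```

    In the real pipeline this routing runs inside Databricks (DLT/PySpark) rather than plain Python; the sketch only shows the shape of the decision.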

    Technologies in Use:

    • Qlik Replicate (formerly Attunity)
      • Streaming data from on-prem to Azure
    • Azure Data Lake Storage (ADLS)
    • Databricks
      • Delta Live Tables (DLT)
      • Spark / PySpark
    • Python
    • SQL (complex queries and procedures)

    Key Challenges the Role Is Meant to Solve

    1. Data Latency

    • High-volume streaming data
    • End-to-end latency issues that need root-cause analysis
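
    Root-causing end-to-end latency typically starts by decomposing it into per-stage deltas. A minimal sketch of that decomposition (stage names and timestamps are illustrative, not taken from the actual pipeline):

```python
from datetime import datetime

# Hypothetical per-record timestamps captured at each hop of the pipeline.
# Stage names are illustrative; the real pipeline stages are not specified.
event = {
    "source_commit": datetime(2024, 1, 1, 12, 0, 0),   # written on-prem
    "adls_landed":   datetime(2024, 1, 1, 12, 0, 40),  # arrived in ADLS
    "dlt_processed": datetime(2024, 1, 1, 12, 6, 10),  # visible in Delta table
}

stages = ["source_commit", "adls_landed", "dlt_processed"]

def stage_latencies(evt: dict) -> dict:
    """Seconds spent between consecutive pipeline stages."""
    return {
        f"{a}->{b}": (evt[b] - evt[a]).total_seconds()
        for a, b in zip(stages, stages[1:])
    }

lat = stage_latencies(event)
# The slowest hop is the first root-cause candidate.
bottleneck = max(lat, key=lat.get)
```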

    2. Databricks / DLT Cost Spikes

    • DLT costs are far higher than expected
    • Known contributors:
      • Very high data volume (expected)
      • Inefficient lookup logic used to split data into separate containers
    • The current solution works but is not optimal
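
    The posting does not detail the lookup logic, but one common cost pattern is re-resolving the target container per record against a reference table instead of resolving the mapping once. A hedged sketch of that difference (instrument and book names are hypothetical; in Spark terms this corresponds roughly to a repeated join per micro-batch versus broadcasting a small dimension table):

```python
# Hypothetical illustration of the cost issue: per-record lookups vs.
# a mapping resolved once up front. Plain Python is used only to show
# the shape of the problem, not the actual Spark implementation.

LOOKUP_TABLE = [  # small reference table: instrument -> trading book
    {"instrument": "WTI", "book": "crude"},
    {"instrument": "BRENT", "book": "crude"},
    {"instrument": "RBOB", "book": "product"},
]

def slow_book(instrument: str) -> str:
    # Inefficient: linear scan of the lookup table for every record.
    for row in LOOKUP_TABLE:
        if row["instrument"] == instrument:
            return row["book"]
    return "unknown"

# Efficient: resolve the mapping once, then do O(1) lookups per record.
BOOK_BY_INSTRUMENT = {r["instrument"]: r["book"] for r in LOOKUP_TABLE}

def fast_book(instrument: str) -> str:
    return BOOK_BY_INSTRUMENT.get(instrument, "unknown")
```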

    This role exists because generic recommendations are not enough.

    What We Do Not Want

    • Someone who has:
      • Only written Spark notebooks
      • Only followed architectural guidance
      • Only worked at a surface level
    • Someone who:
      • Needs strict 9–5 boundaries
      • Avoids ambiguity or deep technical investigation
    • Someone whose resume was “AI-polished” but not real

    What We Do Want

    Technical Depth

    • Deep understanding of:
      • Databricks internals
      • Spark engine behavior
      • Performance tuning and optimization
    • Ability to:
      • Analyze pipelines end-to-end
      • Identify architectural inefficiencies
      • Propose and prove better approaches via POCs
    • Comfortable challenging Databricks as a product
      • Gather evidence
      • Support escalation discussions with Databricks engineers

    Programming & Data Skills

    • Strong Python (mandatory)
    • PySpark (advanced, not basic)
    • Advanced SQL
      • Complex queries
      • Stored procedures
      • Analytical logic

    Working Style

    • Hands-on individual contributor
    • Collaborative with data engineers
    • Willing to:
      • Review others’ solutions
      • Build POCs independently
      • Demonstrate better outcomes (performance, cost, scalability)

    Role Scope

    • Will work across multiple projects
    • Acts as a cross-platform technical expert
    • Evaluates:
      • Architecture
      • Cost drivers
      • Scalability
      • Reusability for future programs