This is not a standard data engineer role.
We are looking for a deeply technical, hands-on individual contributor who can:
- Diagnose performance, latency, and cost issues in a large-scale cloud data platform
- Take a top-down, platform-level view across multiple projects
- Improve architecture, efficiency, and cost, not just write Spark code
- Act as a technical problem-solver and mentor, guiding other data engineers
This person is expected to make the platform better, not just execute tasks.
Current Platform & Architecture (Very Important)
Data Flow:
- On-premise systems → Cloud (Azure)
- Streaming ingestion → Azure Data Lake Storage (ADLS)
- Data processed into two separate containers:
  - Crude trading
  - Product trading
Technologies in Use:
- Qlik Replicate (formerly Attunity): streams data from on-prem to Azure
- Azure Data Lake Storage (ADLS)
- Databricks
- Delta Live Tables (DLT)
- Spark / PySpark
- Python
- SQL (complex queries and procedures)
Key Challenges the Role Is Meant to Solve
1. Data Latency
- High-volume streaming data
- End-to-end latency issues that need root-cause analysis
2. Databricks / DLT Cost Spikes
- DLT costs are far higher than expected
- Known contributors:
  - Very high data volume (expected)
  - Inefficient lookup logic used to split data into the two containers
- The current solution works but is not optimal
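To make the cost problem concrete: routing each record once by a key it already carries is a single-pass operation, whereas joining every record against a lookup table re-scans data and inflates DLT compute. A minimal sketch in plain Python (the `trade_type` / `trade_id` schema is hypothetical, not the real one):

```python
from collections import defaultdict

def split_single_pass(records, key="trade_type"):
    """Route every record into its bucket in one pass over the data,
    using a key the record already carries (no lookup join needed)."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec[key]].append(rec)
    return dict(buckets)

# Hypothetical sample records, standing in for the streaming feed
records = [
    {"trade_id": 1, "trade_type": "crude"},
    {"trade_id": 2, "trade_type": "product"},
    {"trade_id": 3, "trade_type": "crude"},
]

buckets = split_single_pass(records)
# buckets["crude"] holds trade_ids 1 and 3; buckets["product"] holds trade_id 2
```

In Databricks terms, the same idea roughly corresponds to a single `partitionBy("trade_type")` write, or one filtered write per container, instead of a per-record lookup join inside the DLT pipeline; whether that applies here depends on the real schema and routing rules.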
This role exists because generic recommendations are not enough.
What We Do Not Want
- Someone who has:
  - Only written Spark notebooks
  - Only followed architectural guidance
  - Only worked at a surface level
- Someone who:
  - Needs strict 9–5 boundaries
  - Avoids ambiguity or deep technical investigation
- Someone whose resume is “AI-polished” but does not reflect real experience
What We Do Want
Technical Depth
- Deep understanding of:
  - Databricks internals
  - Spark engine behavior
  - Performance tuning and optimization
- Ability to:
  - Analyze pipelines end-to-end
  - Identify architectural inefficiencies
  - Propose and prove better approaches via POCs
- Comfortable challenging Databricks as a product:
  - Gathering evidence
  - Supporting escalation discussions with Databricks engineers
Programming & Data Skills
- Strong Python (mandatory)
- PySpark (advanced, not basic)
- Advanced SQL:
  - Complex queries
  - Stored procedures
  - Analytical logic
Working Style
- Hands-on individual contributor
- Collaborative with data engineers
- Willing to:
  - Review others’ solutions
  - Build POCs independently
  - Demonstrate better outcomes (performance, cost, scalability)
Role Scope
- Will work across multiple projects
- Acts as a cross-platform technical expert
- Evaluates:
  - Architecture
  - Cost drivers
  - Scalability
  - Reusability for future programs