Job Description
The Service Delivery Manager (SDM) is a deeply technical, customer-facing leader responsible for ensuring world-class support delivery, operational alignment, and proactive technical guidance across Mirantis’ enterprise customer base. This hybrid role combines the technical depth of a Level 2 Support Engineer, the relationship skills of a Customer Success Manager or Account Executive, and the ownership mindset of a Technical Lead responsible for the customer’s full Mirantis platform stack. The SDM serves as the primary operational and technical owner for assigned accounts, leading escalations, guiding platform operations, and ensuring customers achieve maximum value from Mirantis technologies, including Mirantis Kubernetes Engine (MKE), Mirantis Container Runtime (MCR), k0rdent, Lens, and OpenStack.
Key Responsibilities:
Technical Ownership & Support Leadership
- Serve as the primary technical authority for customer environments across Kubernetes, OpenStack, Linux, networking, storage, and security.
- Provide L2-level troubleshooting and technical guidance across compute, control plane, networking, and storage layers.
- Diagnose complex failures across OpenStack and Kubernetes components.
- Guide customers through upgrades, lifecycle management, capacity planning, and architecture best practices.
Excellence in Support Delivery
- Ensure customer issues are resolved within defined SLAs with minimal business impact.
- Maintain greater than 95% CSAT across assigned accounts.
- Conduct proactive platform reviews and drive root cause elimination for recurring issues.
Escalation Management
- Lead P1/P0 escalations, war rooms, and cross-functional incident response.
- Provide clear, timely updates to customers and internal stakeholders throughout the incident lifecycle.
- Prevent repeat incidents through structured RCA and strategic improvements.
Account Health & Adoption
- Conduct recurring platform health reviews and risk assessments.
- Drive modernization initiatives and adoption of MKE, MCR, k0rdent, and related Mirantis technologies.
- Partner closely with Customer Success Managers on retention, renewals, and expansion opportunities.
Must-Have Technical Literacy (Required)
- OpenStack Architecture Literacy: Understanding of Nova, Neutron, Cinder, Glance, and Keystone, and of common failure patterns such as DHCP or metadata failures, RabbitMQ quorum issues, and Ceph latency impacting Nova.
- Cloud Control Plane & HA Concepts: Awareness of Galera clustering, HA control plane behavior, how outages manifest for customers, and sound judgment on triage and escalation.
- MOSK Awareness: Understanding Mirantis OpenStack for Kubernetes (MOSK) release cycles, containerized services, StackLight monitoring, and the differences between platform upgrades, host OS lifecycle, and kernel upgrades, including EOL risk.
- Ceph & Storage Fundamentals: High-level understanding of Ceph as block storage, how storage latency affects VM performance, and what OSD flapping, health warnings, or degraded clusters indicate.
- Neutron Networking Basics: Familiarity with provider versus tenant networks, floating versus internal IPs, and typical causes of east/west traffic failures, metadata issues, and DHCP problems.
- Incident Management Excellence: Ability to lead outage calls, structure communication, distinguish root cause from contributing factors, and drive follow-through to full resolution.
- Escalation Ownership: Skill in routing issues to the correct teams (OpenStack, Ceph/storage, networking, hardware, infrastructure) and maintaining accountability.
- Customer & Executive Communication: Ability to clearly explain what the issue is, why it matters, the risks, and next steps, maintaining trust and confidence.
- Lifecycle & Capacity Planning: Awareness of when customers are approaching software end-of-life, resource exhaustion, host overcommit, or aging hardware risk, and ability to recommend remediation.
Nice-to-Have Knowledge (Preferred)
- Kubernetes Literacy: Understanding that many customers run Kubernetes on top of OpenStack, with basic familiarity with CSI, CNI, nodes, and pods to understand cross-platform dependencies.
- Performance Concepts: High-level awareness of CPU pinning, hugepages, and NUMA topology, sufficient to contextualize engineering recommendations.
- Broader Infrastructure Context: Familiarity with Octavia (load balancers), Designate (DNS), high availability patterns, and multi-region or FedRAMP-style constraints for regulated customers.
- Monitoring Awareness: Basic knowledge of StackLight dashboards and common alerts such as Ceph health issues, API failures, and RabbitMQ quorum problems to help interpret severity.