Site Reliability Engineer

IKR Enterprises


San Francisco, CA
Full-time
Paid

    SITE RELIABILITY ENGINEER

    San Francisco, CA or New York, NY | Full-time | In-Office

    -------------------------------------------------------------

    COMPENSATION

    Base: $200,000–$300,000

    Equity: Competitive

    -------------------------------------------------------------

    WHY THIS ROLE

    This clinical AI company works with dozens of the nation's leading health systems and helps millions of patients get faster access to medication each year. Reliability on this platform is directly tied to patient outcomes. Well-funded and 75 people strong, the company has the engineering maturity of a much larger team: a thoughtful tech stack (Kubernetes, Terraform, OpenTelemetry, Honeycomb), clear SLO-driven operations, and an SRE function that ships code, not just configuration. At this stage, your architectural decisions carry company-wide weight. This role suits an application-leaning SRE who wants real ownership rather than a 500-person team to hide in.

    -------------------------------------------------------------

    ABOUT THE ROLE

    You'll own the full production environment and improve the development experience by strengthening both infrastructure and application reliability for a clinical AI platform. This is not a pure infrastructure role: you'll be expected to contribute performance and reliability improvements directly to application code, lead incident response with a bias toward durable fixes, and drive SLO-aligned engineering outcomes. Details on reporting structure and team composition will be shared during the interview process.

    -------------------------------------------------------------

    REQUIREMENTS

    Experience: 7+ years as a highly technical, application-leaning Site Reliability Engineer

    - Prior experience operating deployments of 500+ machines

    - Deep expertise in Kubernetes and Helm (deployment, scaling, operational health)

    - CI/CD optimization across TypeScript and Python/ML pipelines

    - Infrastructure as Code using Terraform

    - Ability to define, implement, and evolve SLIs and SLOs

    - Standardization of OpenTelemetry traces, metrics, and events/logs

    - Diagnosis of performance and scalability issues from trace and metrics data

    - Strong incident response skills with a bias toward durable, code-level fixes

    - Comfortable contributing reliability improvements directly to application code

    Tech Stack: Python, TypeScript, Kubernetes, Terraform, OpenTelemetry, Honeycomb

    Visa Sponsorship: Available for all visa types except net-new H-1Bs

    -------------------------------------------------------------

    LOGISTICS

    Location: San Francisco, CA (Financial District) or New York, NY (Midtown)

    Work Policy: In-office, 5 days per week