Site Reliability Engineer

Oteemo, Inc

San Diego, CA
Full Time
Paid
  • Responsibilities

    Job Description

    This position focuses primarily on providing design and implementation expertise in infrastructure provisioning, management, and lifecycle implementation for cloud components and services, containers, and other core elements of DevSecOps.

    Key Responsibilities:

    • Observability & Monitoring: Design and manage monitoring solutions using Prometheus, Thanos, Grafana, and Mimir to ensure the health and performance of Kubernetes clusters and applications.

    • Logging & Tracing: Implement Loki, Promtail, and OpenTelemetry to collect, process, and analyze logs and traces for debugging and forensic analysis.

    • Kubernetes Operations: Deploy, maintain, and optimize Kubernetes clusters, ensuring observability tools are properly integrated and configured.

    • Incident Response & SLOs: Define SLIs, SLOs, and error budgets, develop alerting strategies using Alertmanager, and automate incident response processes.

    • High Availability & Scalability: Optimize the observability stack for high availability in limited-connectivity environments, leveraging solutions such as Thanos for long-term storage and MinIO for object storage.

    • Security & Compliance: Implement observability best practices that comply with security frameworks and integrate with Kubernetes security tools such as NeuVector.

    • Automation & Infrastructure as Code (IaC): Automate observability deployments using Terraform, Helm, and Kubernetes Operators.

    • Collaboration & Documentation: Work closely with DevOps, security, and platform teams to enhance system reliability and maintain comprehensive documentation.

  • Qualifications

    • Active Secret or Top Secret Clearance.

    • Strong Kubernetes expertise in managing and monitoring clusters at scale.

    • Experience with observability stacks including Prometheus, Loki, Thanos, Grafana, OpenTelemetry, and Mimir.

    • Proficiency in logging and tracing frameworks, including Promtail, Fluent Bit, and OpenTelemetry.

    • Hands-on experience with incident management and alerting using Alertmanager, Grafana Alerts, and PagerDuty/Slack integrations.

    • Deep understanding of Kubernetes networking, service meshes (Istio/Linkerd), and security monitoring.

    • Scripting & Automation: Proficiency in Python, Go, or Bash for automating observability tasks.

    • Infrastructure as Code (IaC): Experience with Terraform, Helm, and Kubernetes Operators.

    • Strong troubleshooting and root cause analysis skills in large-scale distributed systems.

    • Experience working in air-gapped or limited-connectivity environments is a plus.

    Preferred Skills:

    • Experience with NeuVector, Falco, or other Kubernetes security monitoring tools.
    • Knowledge of eBPF-based observability tools such as Cilium Hubble.
    • Experience optimizing observability stacks for performance and cost efficiency.
    • Familiarity with DevSecOps practices and compliance frameworks.

    Additional Information

    We Value:

    • Drive: Passion and energy to implement quality technical solutions. Self-motivation and intellectual curiosity.
    • Commitment to Quality: Passion to conceive and produce world-class solutions that drive real-world value for the customer.
    • Customer Focus: Consultative approach to solving problems for customers. Expectations management.
    • Communication: Superior written and verbal communication skills. Ability to clearly articulate problems, solutions, risks, and rewards.
    • Technical Skills: Love for technology. You have to be inherently passionate about technology.
    • Business Acumen: Technology is ultimately used to enable the business. We look for people who understand how their technical solutions can enable a business.

    What we offer:

    • Ability to make a noticeable difference for the organization and our customers
    • Tremendous growth opportunity by becoming part of a rapidly growing organization. It’s not your tenure but what you can bring to the table that defines how your career will be shaped. You control your growth.
    • Complex but interesting challenges to improve the depth and breadth of your technical and business skills. Our consultants are business technologists and understand how technology drives business.
    • Competitive pay and benefits

    Oteemo is an equal employment and affirmative action employer. We evaluate qualified applicants on merit and business needs and not on race, color, religion, creed, gender, sexual orientation, national origin, ancestry, age, disability, genetic information, marital status, veteran status or any other factor protected by law. Oteemo complies with the law regarding reasonable accommodations for handicapped and disabled employees.

    It has come to our attention that unauthorized individuals may be impersonating our company or recruiters using fake email addresses or social media profiles to contact job seekers. These messages may offer fake job opportunities or request sensitive information under false pretenses.
    Please be advised:

    • All official communications from our company will come from email addresses ending in @oteemo.com.
    • We do not ask for personal information such as your Social Security number, bank account details, or payment of any kind during the recruitment process.
    • If you receive an email or message that seems suspicious, do not respond, click on any links, or provide any information.
    • You can verify the legitimacy of any communication by contacting us directly through our official website or LinkedIn page.

    We take these matters seriously and are committed to protecting the integrity of our recruitment process.