Site Reliability Engineer

LaBine and Associates

San Diego, CA
Full Time
Paid
  • Responsibilities


    Our client develops educational opportunities for many of the most eager students in the world. Since 2003, we have trained tens of thousands of the country’s top students, including nearly all the members of the US International Math Olympiad team, through our online school, learning centers, textbooks, and online learning systems. Over the years, our international online community of advanced problem solvers has grown to over 800,000 members. While our primary focus has been math for most of our history, we have started expanding into new subjects, such as language arts, science, and computer science.
     
    We are seeking an experienced Sr. Site Reliability Engineer with a vision of creating scalable, secure infrastructure and evolving the value of our growing engineering efforts, in order to create reliable and optimized applications that educate and inspire the next generation of builders. This individual will help bootstrap a new SRE team to strategize, design, implement, monitor, and troubleshoot our web infrastructure. They will lead and mentor other SREs and collaborate with the engineering team to manage and improve Linux-based web servers and to automate infrastructure, including CI/CD pipelines, configuration management, and application monitoring.

    What you will be doing:
    • Implementing new monitoring and scaling solutions for mission-critical services.
    • Creating and improving CI/CD pipelines and Ansible plays for our online classrooms, multiplayer math game, grading tools, and more.
    • Establishing cloud infrastructure and tools for new microservices, including email delivery, math parsing, LaTeX rendering, homework grading, etc.
    • Working closely with engineering leadership to strategize and advocate for the short- and long-term needs of our systems, and leading the design, implementation, and maintenance of web infrastructure and pipelines.
    • Providing hands-on technical expertise, using strong coding skills and SRE best practices to improve the security, reliability, and monitoring of web applications.
    • Automating and documenting administration and configuration processes for web servers and databases using Ansible.
    • Using technical knowledge to prevent, mitigate, and respond quickly to service failures.
    • Motivating, giving technical direction to, and fostering the growth of team members.

    Experience you will need:

    • Experience with creating and maintaining Linux-based / LAMP-stack systems.
    • Experience with Apache and/or Nginx.
    • Experience with JavaScript, TypeScript, or PHP.
    • Experience planning, designing, implementing, securing, and monitoring scalable infrastructure for web applications.
    • Familiarity with creating Ansible plays and CI/CD pipelines for web applications.

     
    Schedule:
    Many of our classes are held on weekends and weekday evenings. In the rare event of an unexpected site outage or service interruption, this full-time position may require occasional night or weekend work.