PALO ALTO NETWORKS® is the fastest-growing security company in history.
We offer the chance to be part of an important mission: ending breaches
and protecting our way of digital life. If you are a motivated,
intelligent, creative, and hardworking individual, then this job is for
you!
THE ROLE:
We’re looking for Site Reliability Engineers (SREs) with creative and
innovative problem-solving skills. As a member of Infra SRE, you will
work with other SREs to help us design, build, and maintain
mission-critical infrastructure and tools as a platform. You will own
development efforts in each sprint from planning to delivery and will
partner with other engineering teams to provide technical vision in
making their services more observable, scalable and reliable. In this
role as a Cloud Platform SRE, you'll take ownership of the reliability,
scalability, automation, uptime and availability of our Cloud App and
microservices platform. You will have the opportunity to gain technical
breadth while sharing your cloud platform expertise with other team
members.
You will not only identify problems but also develop and implement
automation solutions in AWS that operate at scale. The best person for
this role is someone who has a collaborative spirit and can pair
seamlessly with other engineering teams to build and manage a
reliable, secure, and scalable platform for microservices.
THE RESPONSIBILITIES:
- Design, build, and maintain infrastructure in AWS to enable reliable
  and rapid deployment of microservices with effective monitoring and
  resilient operations.
- Set up critical infrastructure and develop tools and frameworks to
  automate operational tasks and the deployment of machines, services,
  and applications
- Work closely with engineering teams to ensure microservices are
  designed for scale, operability, and performance
- Create meaningful dashboards, logging, alerting, and responses to
ensure that issues are captured and addressed proactively
- Define Service Level Objectives for product(s) to continuously measure
  their reliability in production. Maximize service uptime and
  availability while meeting functional and performance SLAs
- Develop custom code or scripts to automate infrastructure and
  monitoring services
- Work cross-functionally with engineering teams: contribute to
  architecture diagrams and other documentation for security reviews
- Initiate and lead scripting and automation to streamline system
  updates and upgrades
QUALIFICATIONS:
- BS or MS degree in Computer Science or Engineering, with 7-10 years
  of coding experience in a DevOps or SRE role.
- Deep understanding of at least one modern programming language:
  Java, C, C++, Python, or C#.
- Fluency in Linux, AWS services, and systems management tools
(Ansible, Puppet, Chef, etc.)
- Fundamental understanding of distributed systems, including the CAP
  theorem, microservices, and the Twelve-Factor App.
PREFERRED QUALIFICATIONS:
- Demonstrated ability to write programs using a high-level
  programming language such as C, Java, Python, or Ruby
- Hands-on operational experience in creating and managing
microservices
- Excellent communication skills and the ability to work well in a
team
- Strong automation skills for automating routine tasks using Python
  or Bash scripting
- Systematic problem-solving approach, strong customer focus,
ownership, urgency, and drive to complete a task
- Demonstrated capability to provide deep and broad technical
  leadership to agile teams
SKILLS AND EXPERIENCE:
- Expertise in configuration management with a framework such as
Ansible, Chef, or Puppet
- 5+ years of experience in site reliability or infrastructure
  engineering for commercial SaaS solutions
- 5+ years of expertise in AWS cloud infrastructure and its related
  services
- Strong troubleshooting skills across all layers of the stack
- Deep experience in monitoring distributed application architectures
- Experience monitoring cloud services with Datadog
- Strong experience with Linux and MySQL
- Proficiency in a programming language such as Python, Ruby, or Java,
  and in shell scripting, to automate tasks
- Experience in CI/CD automation and GitHub
- Experience writing custom code or scripts for 'destructive testing'
  to ensure adequate resiliency in production
- Excellent problem solving, critical thinking, communication, and
teamwork skills
- Excellent written and verbal communication skills, with the ability
  to collaborate and rally support
- BS or MS in Computer Science, related field, or equivalent
professional experience