Sr Site Reliability Engineer (DevOps)

Palo Alto Networks

Santa Clara, CA
Paid
  • Responsibilities

    We are reshaping the cybersecurity market through our cloud-delivered security services, and our cloud infrastructure is growing rapidly, with a global footprint. We’re looking for great SREs, as well as software engineers interested in production engineering, to help us scale the largest enterprise security cloud infrastructure in the world.

    Description:

    Palo Alto Networks reinvented the enterprise firewall, growing from a start-up to a multi-billion-dollar company. Our Application Framework, the latest offering in our cloud-delivered security services, ingests security events from hundreds of thousands of firewalls deployed across the globe to provide a massive data analytics platform for deep inspection, anomaly detection, and actionable security automation. Our cloud infrastructure is home to a series of massive and complicated distributed systems and virtualization software platforms which enable big data processing around security services, sandboxing and malware detection, URL categorization and malicious site/domain identification, and security research/response. 

    RESPONSIBILITIES:

    • You will be responsible for designing, building, maintaining, and scaling production services and server farms across multiple data centers for complex and data-intensive cloud services.
    • You will design and enhance software architecture to improve scalability, service reliability, capacity, and performance.
    • You will write automation code for provisioning and operating infrastructure at massive scale. You are not an operator; you are an experienced software engineer focused on operations.
    • You will work with development teams to make sure applications fit well within the infrastructure and that scalability and reliability are designed and implemented from the ground up. You will work with QA on building pipelines and automation for delivering and deploying applications to production.
    • You will participate in the occasional on-call rotation supporting the infrastructure.
    • You will roll up your sleeves to troubleshoot incidents, formulate theories and test your hypotheses, and narrow down possibilities to find the root cause.
    • You will write postmortem reviews and remediation recommendations.

    QUALIFICATIONS:

    • Hands-on experience building fault-tolerant and scalable systems.

    • Strong development/automation skills. Must be very comfortable with reading and writing Python code. Java is a plus.

    • 10+ years of Unix/Linux experience, with some experience in managing 100+ nodes.

    • Tools-first mindset. You build tools for yourself and others to increase efficiency and to make hard or repetitive tasks easy and quick.

    • Experience with AWS. Azure and/or GCP is a plus.

    • Experience with Configuration Management and CI/CD. Salt and Jenkins preferred.

    • Preferred experience: API Gateway, CloudFormation or Terraform, CloudWatch, EC2, IAM, Lambda, RDS, Route 53, S3, SNS, SQS, Step Functions, VPC.

    • Organized and focused on building, improving, resolving, and delivering. A good communicator within and across teams who takes the lead.

     

    Learn more about Palo Alto Networks here and check out our fast facts #LI-MB1