Sr Site Reliability Engineer (DevOps)

Palo Alto Networks

Santa Clara, CA

Paid

Responsibilities
We are reshaping the cybersecurity market through our cloud-delivered security services, and our cloud infrastructure is growing rapidly across a global footprint. We’re looking for great SREs, as well as software engineers interested in production engineering, to help us scale the largest enterprise security cloud infrastructure in the world.

Description:

Palo Alto Networks reinvented the enterprise firewall, growing from a start-up to a multi-billion-dollar company. Our Application Framework, the latest offering in our cloud-delivered security services, ingests security events from hundreds of thousands of firewalls deployed across the globe to provide a massive data analytics platform for deep inspection, anomaly detection, and actionable security automation. Our cloud infrastructure is home to a series of massive and complicated distributed systems and virtualization software platforms which enable big data processing around security services, sandboxing and malware detection, URL categorization and malicious site/domain identification, and security research/response.

RESPONSIBILITIES:
- You will be responsible for designing, building, maintaining, and scaling production services and server farms across multiple data centers for complex and data-intensive cloud services.
- You will design and enhance software architecture to improve scalability, service reliability, capacity, and performance.
- You will write automation code for provisioning and operating infrastructure at massive scale. You are not an operator; you’re an experienced software engineer focused on operations.
- You will work with development teams to make sure applications fit well within the infrastructure and that scalability and reliability are designed and implemented from the ground up. You will work with QA to build pipelines and automation for delivering and deploying applications to production.
- You will participate in the occasional on-call rotation supporting the infrastructure.
- You will roll up your sleeves to troubleshoot incidents, formulate theories, test your hypotheses, and narrow down possibilities to find the root cause.
- You will write postmortem reviews and remediation recommendations.
QUALIFICATIONS:
- Hands-on experience in building fault-tolerant and scalable systems.
- Strong development/automation skills. Must be very comfortable with reading and writing Python code. Java is a plus.
- 10+ years of Unix/Linux experience, with some experience in managing 100+ nodes.
- Tools-first mindset. You build tools for yourself and others to increase efficiency and to make hard or repetitive tasks easy and quick.
- Experience with AWS. Azure and/or GCP is a plus.
- Experience with Configuration Management and CI/CD. Salt and Jenkins preferred.
- Preferred experience: API Gateway, CloudFormation or Terraform, CloudWatch, EC2, IAM, Lambda, RDS, Route 53, S3, SNS, SQS, Step Functions, VPC.
- Organized and focused on building, improving, resolving, and delivering. A good communicator within and across teams who takes the lead.