We are reshaping the cybersecurity market through our cloud-delivered
security services, and our cloud infrastructure is growing rapidly
across a global footprint. We’re looking for great SREs, as well
as software engineers interested in production engineering, to help us
scale the largest enterprise security cloud infrastructure in the world.
Description
Palo Alto Networks reinvented the enterprise firewall, growing from a
start-up to a multi-billion-dollar company. Our Application Framework,
the latest offering in our cloud-delivered security services, ingests
security events from hundreds of thousands of firewalls deployed across
the globe to provide a massive data analytics platform for deep
inspection, anomaly detection, and actionable security automation. Our
cloud infrastructure is home to a series of massive and complicated
distributed systems and virtualization software platforms which enable
big data processing around security services, sandboxing and malware
detection, URL categorization and malicious site/domain identification,
and security research/response.
RESPONSIBILITIES:
- You will be responsible for maintaining and scaling production Kafka
clusters with very high ingestion rates, ZooKeeper clusters, and
other big data pipeline systems such as HDFS.
- You will improve scalability, service reliability, capacity,
and performance.
- You will write automation code for managing, monitoring, measuring,
expanding, and healing clusters.
- You are not an operator; you’re an experienced software engineer
focused on operations.
- You will do Kafka tuning, capacity planning, and deep-dive
troubleshooting.
- You will participate in the occasional on-call rotation supporting
the infrastructure.
- You will roll up your sleeves to troubleshoot incidents, formulate
theories and test your hypotheses, and narrow down possibilities to
find the root cause.
QUALIFICATIONS:
- Hands-on experience managing production Kafka clusters.
- Strong development/automation skills. Must be very comfortable with
reading and writing Python. Commits to Kafka source code would be a
big plus.
- In-depth understanding of the internals of Kafka cluster management,
ZooKeeper, partitioning, topic replication, and mirroring.
- Very good grasp of monitoring and metrics collection, performance
tuning, and troubleshooting complicated situations with
distributed systems.
- Tools-first mindset. You build tools for yourself and others to
increase efficiency and to make hard or repetitive tasks easy
and quick.
- Organized and focused on building, improving, resolving, and
delivering. Good communicator within and across teams, a strong team
player, and someone who takes ownership.