Manager, Enterprise Monitoring and Observability

Pilot Company

Roswell, GA +1 location
Full Time
Paid
  • Responsibilities

    Job Description

    The purpose of this position is to lead the Monitoring and Observability practice across the Pilot Company enterprise. The role will establish monitoring and observability, proactive solutions, alerting, automation, and site reliability for business-critical systems and platforms.

    1. Oversee, lead, and set priorities for the Monitoring and Observability team, which is specifically focused on monitoring and observability, proactive solutions, alerting, automation, and site reliability
    2. Coach, train, and develop direct reports (includes appraising job performance and conducting performance reviews)
    3. Lead a team of site reliability engineers (SREs) to develop enterprise logging, metrics, and traces for business-critical systems, as well as dashboards (visibility) for different levels of support
    4. Work with infrastructure, product, and support teams to define tools and strategy to ensure full observability, alerting, and proactive monitoring of business-critical systems
    5. Integrate the full observability and proactive monitoring practice with the ITSM Office to ensure tracking and timely communication of incidents, outages, and issues
    6. Collaborate with Business and IT stakeholders to define thresholds, SLAs, and runbooks, help proactively identify issues, and drive down recurring incidents
    7. Lead oversight of third-party vendors’ work to ensure vendors fulfill contractual commitments and statements of work
    8. Assist with monitoring events (e.g., warnings and exceptions) and identify routine activities and resolutions that can be automated to improve system efficiencies
    9. Serve as a subject matter expert and maintain knowledge of current industry trends and developing technologies
    10. Model behaviors that support the company’s common purpose; ensure guests and team members are supported at the highest level
    11. Ensure all activities are in compliance with rules, regulations, policies, and procedures
    12. Complete other duties as assigned
  • Qualifications

    • Bachelor’s degree or associate degree required; field of study in technology preferred
    • Minimum seven years’ experience in technology or related field required
    • Minimum one year’s experience managing people preferred
    • Intermediate knowledge of ITSM/ITIL
    • Intermediate knowledge of Splunk/ITSI, AWS CloudWatch, APM (AppDynamics), SolarWinds, Grafana, Prometheus, or similar tools
    • Working knowledge of service-oriented architecture (SOA), microservices, and/or API network design paradigms
    • Working knowledge of network protocols/technology, databases, and application servers and their roles in service delivery
    • Experience using cloud native technologies (Kubernetes, OpenTelemetry, GitHub) in a production environment
    • Ability to lead and motivate a team
    • Excellent analytical and problem-solving skills for diagnosing, evaluating, and resolving complex problems
    • Ability to work and make decisions independently to determine the appropriate course of action for issues and incidents
    • Ability to work in a fast-paced environment and manage multiple responsibilities simultaneously
    • Ability to provide excellent customer service
    • Ability to collaborate and build consensus within a team, fostering a positive atmosphere
    • Well organized with attention to detail
    • Travel required: less than 5%
    • Role will require work outside of normal business hours, including nights and weekends, to provide support as needed
    • General office work requiring sitting or standing for long periods of time, including on airplanes and in cars
  • Locations
    Roswell, GA • Knoxville, TN