This listing is no longer accepting applications.

Service Reliability Operator / Linux Administrator

Software Technology Inc

Redwood City, CA
Full Time
Paid
  • Responsibilities

    Job Description

    POSITION : SERVICE RELIABILITY OPERATOR / LINUX ADMINISTRATOR

    LOCATION : REDWOOD CITY, CA

    DURATION : 1 YEAR +

     

    WE NEED CANDIDATES WHO KNOW INCIDENT MANAGEMENT, LINUX OS, AT LEAST ONE MONITORING TOOL, AND AT LEAST ONE SCRIPTING LANGUAGE.

     

    INTERVIEW PROCESS :

     

    1.      TELEPHONIC INTERVIEW BY ITC

    2.      TELEPHONIC INTERVIEW WITH CUSTOMER

     

    Here is the breakdown of the tech stack and requirements:

    ·        Operating Systems: Linux

    ·        Ticketing Systems: Jira or ServiceNow

    ·        Scripting: Python is a must-have; Shell, PowerShell, Perl, or PHP is a plus

    ·        Cloud Infra: Oracle, Azure, AWS, OpenStack, or GCP (any one of these)

    ·        Knowledge Management: Confluence

    ·        Incident management tools:  Incident Commander, Scribe, Troubleshooter, etc.

    ·        Apps Monitoring: Kibana or Grafana

    ·        System Monitoring: Nginx or Zabbix

    ·        Databases: MySQL is mandatory; Oracle is a nice-to-have

    ·        Networking: Firewall management, TCP/IP, and other network protocols

    ·        Application experience: Apache, Memcached, Squid, MySQL, NFS, DHCP, NTP, SSH, DNS, and SNMP

    Key Job duties:

    ·        Perform proactive service checks and monitor/triage incoming system/application alerts, E-mails and phone calls to ensure appropriate priority and response.

    ·        Triage and troubleshoot service-impacting events from multiple signals, including phone, E-mail, service telemetry and alerting.

    ·        Identify opportunities for automation, alert noise reduction and fixes for recurring issues, and work with engineering to implement them, in order to reduce time to mitigate service-impacting events and increase the productivity of cloud operations.

    ·        Manage the coordination, documentation and tracking of critical incidents, ensuring rapid and complete issue resolution and closed-loop communication with customers and other key stakeholders.

    ·        Work upstream with Service Operations and Development to develop and maintain standard operating procedures and troubleshooting guides that improve time to mitigate.

    ·        Participate in project delivery aimed at increasing capabilities around monitoring, notification, configuration and deployment of servers and applications within the Oracle Cloud Platform.

    ·        Assist in the training and development of more junior team members.

    ·        Work as part of a shift in a 24x7x365 operations team (for now, this is regular hours).

    ·        Be able to work non-standard shifts (though the primary shift is daytime), including evenings, holidays and weekends as needed.

     

  • Qualifications

    Additional Information

    REGARDS

    MOHAMMED RAYEES

    609-447-3342