Service Infra

TuSimple

San Diego, CA
Full Time
Paid

    Job Description

    JOB OVERVIEW:

    Software Reliability Engineering (SRE) is a discipline that takes aspects of software engineering and applies them to infrastructure and operations problems. Its main goal is to create scalable and highly reliable software systems.

    The Software Reliability Engineer will be a part of a team working on a variety of software engineering tasks to create and maintain scalable solutions and reliable software systems for our autonomous truck platform. You will have an opportunity to impact backend services such as fleet monitoring, machine learning and continuous integration among others.

    WHAT YOU'LL DO:

    • Collaborate with others to define and execute on our SRE vision
    • Continuously improve our process for detecting, responding to, and learning from production incidents
    • Provide visibility into availability and performance metrics
    • Reclaim engineering time by automating traditionally laborious manual work in infrastructure, configuration, and delivery
    • Optimize tools and processes
    • Understand, deploy and provide technical support for infrastructure systems
    • Engage in system design and development from the perspective of SRE
    • Identify and mitigate real and potential system problems
    • Ensure and improve security, stability, and scalability by writing new code and scripts
    • Monitor and maintain enterprise-level data centers
    • Handle and debug complex server and service-related issues
    • Help service owners identify and instrument Service Level Objectives (SLOs) and design alerts that follow best practices
    • Build tools and mechanisms that enable engineers to deploy and test their services in production
    • Facilitate blameless postmortems and drive effective action items
    • Help teams automate tedious tasks and launch new services quickly
    • Work with service owners to proactively design tests, observe results, and create fixes for complex failure scenarios
    • Build observability tools, such as metrics, logging, and tracing systems, and alerting infrastructure

    WHAT YOU'LL BRING:

    • 2-5+ years of experience as an SRE, Production or Systems Engineer
    • A strong foundation in Linux systems engineering and automation
    • Fluency in one or more programming languages such as Go, Python, or Java
    • Deep understanding of cloud-based services and APIs (e.g., AWS, Azure)
    • 1-2 years of experience with container-based architecture, such as Docker and Kubernetes
    • Experience with streaming and database technologies such as Postgres, Kafka, Cassandra, and Elasticsearch
    • Able to debug complex problems across the whole stack
    • Proficiency with secure configuration management
    • M.S. or B.S. in Computer Engineering and/or Computer Science
    • Strong communication skills and the ability to work across technical teams
    • A passion and habit for measuring key data points
    • Deep knowledge of the OSI networking model

    PERKS

    • 100% employer-paid healthcare premiums for you and your family
    • Work visa sponsorship available
    • Relocation assistance available
    • Breakfast, lunch, and dinner served every day
    • Full kitchens on every floor with unlimited snacks, drinks, special treats, fruits, meals, and more
    • Stock options / equity
    • Gym membership reimbursement
    • Monthly team building budget
    • Learning/education budget
    • Employer-paid life insurance
    • Employer-paid long- and short-term disability insurance

    TuSimple is an Equal Opportunity Employer. This company does not discriminate in employment and personnel practices on the basis of race, sex, age, handicap, religion, national origin, or any other basis prohibited by applicable law. Hiring, transferring and promotion practices are performed without regard to the above-listed items.