Infrastructure Engineer - Software Engineer – Infrastructure & Hardware Optimization - Remote
Job Description
Hello,
We have the job opening below.
If you are interested and your experience matches the job description, please send your updated resume as soon as possible.
Software Engineer – Infrastructure & Hardware Optimization
Location: San Francisco, CA; Portland, OR; or Dallas, TX - Remote, but candidates must be local to one of these locations
Duration: 6+ month contract
Job Description: We are seeking a skilled low-level systems engineer to join the team. This individual will focus on infrastructure software that detects, configures, and optimizes AI inference pipelines across heterogeneous hardware accelerators (e.g., NVIDIA / AMD GPUs, TPUs, AWS Inferentia, FPGAs). You will work on hardware abstraction layers, containerized runtime environments, benchmarking, telemetry, and driver orchestration logic for multi-cloud agentic inference deployments.
Ideal Experience:
· 4–7 years of experience in systems software or infrastructure engineering, preferably with exposure to AI/ML workloads.
· Deep expertise in CUDA, NCCL, ROCm, or other accelerator programming frameworks.
· Familiarity with LLM inference runtimes (TensorRT-LLM, vLLM, ONNXRuntime).
· Experience with Kubernetes scheduling, device plugin development, and runtime patching for heterogeneous compute.
· Strong Python/C++ and Linux systems programming skills.
· Passion for building scalable, portable, and secure AI infrastructure.
Responsibilities:
· Design and implement cross-platform hardware detection systems for GPUs/TPUs/NPUs using CUDA, ROCm, and low-level runtime interfaces.
· Build and maintain plugin-based infrastructure for capability scoring, power efficiency tuning, and memory optimization.
· Develop hardware abstraction layers (HALs) and performance benchmarking tools to optimize AI agents for cloud-native inference.
· Extend container-based MLOps systems (Docker/Kubernetes) with support for hardware-specific runtime containers (e.g., TensorRT, vLLM, ROCm).
· Automate driver validation, container security hardening, and runtime health monitoring across deployments.
· Integrate telemetry systems (Prometheus, Grafana) to surface per-device inference performance metrics and health status.
· Collaborate with solutions and DevOps teams to ensure hardware-aware agent deployment across cloud providers.
Additional Information
All your information will be kept confidential according to EEO guidelines.