Sr. Software Engineer (AI Systems & Infrastructure)

IC Defense

Columbia, MD
Full Time
Paid

    Description: We are seeking a highly experienced and driven Sr. Software Engineer to join our full-stack LLM integration and delivery team. The ideal candidate will have a strong background in building scalable AI-powered applications and the infrastructure that supports them, with a focus on delivering exceptional user experiences. You will play a crucial role in architecting and developing the systems that power our LLM applications, ensuring they perform reliably at enterprise scale while enabling rapid iteration and deployment.

    Responsibilities:

    • Lead the design and development of scalable LLM-powered applications and services.
    • Architect infrastructure solutions that support rapid iteration and deployment of AI features.
    • Collaborate directly with product teams to translate user needs into technical solutions.
    • Build and maintain the platforms that enable your team to ship AI features quickly and reliably.
    • Develop and manage automation tools to improve system reliability and development efficiency.
    • Implement and maintain monitoring, alerting, and logging systems.
    • Conduct capacity planning and performance tuning for AI workloads.
    • Lead and participate in incident response and post-mortem analyses.
    • Mentor junior team members and contribute to the overall growth of the engineering team.
    • Continuously identify and implement improvements to our systems and development processes.

    Skill Requirements:

    • Software engineering experience with AI, including LLMs, RAG, and MCP.
    • Active and current TS/SCI clearance with FSP.
    • 12+ years of experience in software engineering with a focus on scalable systems.
    • Strong full-stack development experience with user-facing applications.
    • Strong programming skills in languages such as Python, Go, or Java.
    • Extensive experience with cloud platforms (e.g., AWS, GCP, Azure) and their services.
    • Proficiency in containerization technologies (Docker, Kubernetes).
    • Experience with infrastructure-as-code tools (e.g., Terraform, Ansible, Puppet).
    • Expertise in monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).
    • Familiarity with CI/CD pipelines and practices.
    • Strong problem-solving skills and ability to troubleshoot complex systems.
    • Excellent communication skills and ability to work in a collaborative environment.
    • Experience building products that prioritize user experience and product-market fit.

    Nice to Haves:

    • Experience working with Large Language Models (LLMs) and related infrastructure.
    • Experience with AI/ML model serving and optimization.
    • Background in product-focused engineering environments.
    • Familiarity with machine learning operations (MLOps) practices.
    • Experience with A/B testing and feature flagging for AI features.
    • Contributions to open-source projects.
    • Experience with distributed systems and microservices architectures.
    • Knowledge of security best practices and compliance requirements.
    • Experience with real-time data processing and streaming platforms (e.g., Apache Kafka, Apache Flink).
    • Familiarity with chaos engineering principles and tools.

    This position is 100% on-site. Applicants who do not meet the security clearance requirement will be automatically rejected.