Description: We are seeking a highly experienced and driven Sr. Software Engineer to join our full-stack LLM integration and delivery team. The ideal candidate will have a strong background in building scalable AI-powered applications and the infrastructure that supports them, with a focus on delivering exceptional user experiences. You will play a crucial role in architecting and developing the systems that power our cutting-edge LLM applications, ensuring they perform reliably at enterprise scale while enabling rapid iteration and deployment.
Responsibilities:
- Lead the design and development of scalable LLM-powered applications and services.
- Architect infrastructure solutions that support rapid iteration and deployment of AI features.
- Collaborate directly with product teams to translate user needs into technical solutions.
- Build and maintain the platforms that enable your team to ship AI features quickly and reliably.
- Develop and manage automation tools to improve system reliability and development efficiency.
- Implement and maintain monitoring, alerting, and logging systems.
- Conduct capacity planning and performance tuning for AI workloads.
- Lead and participate in incident response and post-mortem analyses.
- Mentor junior team members and contribute to the overall growth of the engineering team.
- Continuously identify and implement improvements to our systems and development processes.
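Required Skills:
- Software engineering experience with AI technologies, including large language models (LLMs), retrieval-augmented generation (RAG), and the Model Context Protocol (MCP).
- Active and current TS/SCI clearance with Full Scope Polygraph (FSP).
- 12+ years of experience in software engineering with a focus on scalable systems.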
- Strong full-stack development experience with user-facing applications.
- Strong programming skills in languages such as Python, Go, or Java.
- Extensive experience with cloud platforms (e.g., AWS, GCP, Azure) and their services.
- Proficiency in containerization technologies (Docker, Kubernetes).
- Experience with infrastructure-as-code tools (e.g., Terraform, Ansible, Puppet).
- Expertise in monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).
- Familiarity with CI/CD pipelines and practices.
- Strong problem-solving skills and ability to troubleshoot complex systems.
- Excellent communication skills and ability to work in a collaborative environment.
- Experience building products that prioritize user experience and product-market fit.
Nice to Haves:
- Experience working with Large Language Models (LLMs) and related infrastructure.
- Experience with AI/ML model serving and optimization.
- Background in product-focused engineering environments.
- Familiarity with machine learning operations (MLOps) practices.
- Experience with A/B testing and feature flagging for AI features.
- Contributions to open-source projects.
- Experience with distributed systems and microservices architectures.
- Knowledge of security best practices and compliance requirements.
- Experience with real-time data processing and streaming platforms (e.g., Apache Kafka, Apache Flink).
- Familiarity with chaos engineering principles and tools.
This position is 100% on-site. Because this position requires a security clearance, applicants who do not meet the security clearance requirement will be automatically rejected.