Sorry, this listing is no longer accepting applications.

Lead Platform Engineer

Fort Point

Baltimore, MD +1 location
Full Time
Paid
  • Responsibilities

    SENIOR PLATFORM ENGINEER

    We are seeking a hands-on Senior Platform Engineer to own the end-to-end lifecycle of an enterprise-grade Elasticsearch platform.

    • This role combines deep technical expertise in Elasticsearch with platform engineering rigor to design, deploy, operate, and optimize large-scale search, observability, and log analytics solutions.
    • You will serve as the subject matter expert (SME) for Elasticsearch, ensuring high availability, performance, security, and compliance across cloud and on-prem deployments.
    • Collaborating with architecture, development, security, and operations teams, you will drive scalability, automate workflows, and enable data-driven applications while maintaining operational excellence in an Agile delivery model.

    Key Responsibilities

    Platform Design & Development

    • Architect and deploy scalable Elasticsearch solutions for search, observability, and log/metrics analytics use cases.
    • Design and implement ELK/Elastic Stack (Elasticsearch, Logstash, Kibana) and complementary pipelines.
    • Create and manage multi-node clusters across availability zones in cloud and/or on-prem environments.
    • Translate business requirements into technical designs, including indexing strategies, shard allocation, and data modeling.

    Operations & Support

    • Monitor, maintain, and troubleshoot ELK/Elastic environments & Elasticsearch clusters for performance, stability, and data integrity.
    • Tune performance, namely query optimization, indexing pipelines, and shard rebalancing.
    • Conduct capacity planning, configuration management, and continuous improvement via metrics, alerts, and automation.
    • Execute version upgrades, patching, and backward-compatible migrations with minimal downtime.

    Automation & DevOps Integration

    • Build and maintain Infrastructure as Code (Terraform/Ansible), CI/CD pipelines, and automation scripts (Python, Bash, etc.).
    • Integrate Elasticsearch with cloud providers, cloud-native services, and other open-source observability tools.
    • Enable self-service capabilities for development teams via APIs, templates, and dashboards.

    Collaboration & Enablement

    • Partner with DevOps, Security, and AppDev teams to integrate Elasticsearch into microservices, CI/CD, and monitoring workflows.
    • Provide expert guidance, code/config reviews, and lightweight scripting to accelerate feature delivery.
    • Create and maintain runbooks, architecture diagrams, KPI dashboards (Kibana), and troubleshooting guides.

    Innovation & Continuous Improvement

    • Evaluate and adopt new Elastic features, plugins, and ecosystem tools.
    • Lead proofs of concept for advanced use cases (machine learning, cross-cluster replication, etc.).
    • Identify automation opportunities and drive platform resilience initiatives.

    Mandatory Skills

    • Hands-on Elasticsearch Administration (3-5+ years): Experience operating Elasticsearch clusters in enterprise environments, including configuration, maintenance, upgrades, patching, and monitoring.
    • Deep Expertise in Elasticsearch Architecture: Strong understanding of nodes, shards, replicas, indexing strategies, query optimization, aggregations, APIs, and cluster scaling.
    • Proven Experience Deploying and Managing the Elastic Stack (ELK): Ability to design, deploy, and administer Logstash, Kibana, APM, Beats, Fleet Server, ILM policies, and ingestion pipelines.
    • Scripting & Automation Proficiency: Skilled with Python, Bash, Terraform, Ansible, and similar tools to automate cluster operations, monitoring, and CI/CD workflows.
    • Understanding of Distributed Systems, Networking & Storage: Knowledge of core distributed systems principles, load balancers (including Nginx), networking, and storage backends.
    • Incident Management, Troubleshooting, and Root Cause Analysis: Ability to diagnose ingestion issues, log disruptions, cluster health problems, and performance bottlenecks across the Elastic ecosystem.
  • Locations
    Baltimore, MD • Irving, TX