Software Engineer (Observability & Monitoring) - West Des Moines, IA

AHU Technologies Inc

Washington, DC
Full Time
Paid
  • Responsibilities

    Job Description:

    Overview:

    · Seeking an experienced Observability and Monitoring Engineer to build and mature our enterprise-wide monitoring, logging, alerting, and observability capabilities across our AWS-based technology stack.

    · This role will define the strategy, architecture, implementation standards, and dashboards that enable proactive detection, faster troubleshooting, and data-driven insights across applications, infrastructure, operating systems, databases, file transfers, and batch processes.

    · The ideal candidate has hands-on engineering expertise, strong architecture skills, and the ability to unify multiple monitoring solutions into a cohesive observability framework.

    Responsibilities:

    · You will establish standards for logs, metrics, traces, event correlation, and alerting across multiple environments.

    · You will build centralized dashboards and alerting policies that provide unified visibility across: applications and services; operating systems; AWS services (EC2, RDS, Lambda, S3, CloudWatch, CloudTrail, etc.); databases (MS SQL Server, PostgreSQL, etc.); file transfer systems (SFTP, managed transfer tools); and batch jobs and scheduled processes.

    · You will create actionable and noise-free alerting thresholds, escalation policies, and runbooks.

    · You will integrate existing tools (Dynatrace, Graylog, Splunk, SolarWinds, Zabbix) into a cohesive ecosystem.

    · You will rationalize tool usage and recommend consolidation or modernization where appropriate.

    · You will manage the lifecycle, configuration, tuning, and health of monitoring and logging platforms, automate monitoring deployments using IaC (CloudFormation) and CI/CD pipelines, and develop reusable templates/standards so teams can onboard new applications quickly.

    · You will build self-service dashboards and reporting for technical/business stakeholders, and create documentation for monitoring standards, dashboard naming conventions, logging schemas, and alert configuration guidelines.

    · You will define SLOs/SLIs and reliability KPIs for critical services.

    · You will partner with scrum teams, infrastructure, and security teams to reduce MTTR and improve system reliability, and participate in incident resolution, root cause analysis, and problem management.

    · You will provide technical leadership/mentoring to team members and consult on architecture decisions and best practices.

    · You will develop/maintain system documentation and participate in project planning and technical strategy sessions.

    Qualifications:

    · Bachelor's degree in Computer Science or related field

    · 5+ years of experience implementing monitoring and observability using Dynatrace

    · Hands-on experience with monitoring/logging tools such as Zabbix, Graylog, Splunk, SolarWinds, or equivalents

    · 5+ years of hands-on experience with AWS services and architecture

    · Deep understanding of metrics, logs, traces, distributed tracing, and event correlation

    · Experience building dashboards and KPIs for application, infrastructure, and database layers

    · Strong scripting/automation skills (Python, Bash, PowerShell) and familiarity with Terraform or CloudFormation

    · Strong understanding of network monitoring, performance tuning, and systems architecture

    · Familiarity with ITIL incident/problem management processes

    · Proficiency with AI tools, and with using them responsibly to improve observability, preferred

    · Experience with container orchestration and microservices architecture preferred

    · Experience with AWS OpenTelemetry, Prometheus, Grafana, or similar tools preferred

    Required Technical Skills:

    • AWS Services (EC2, RDS, S3, Lambda, ECS/EKS, etc.)

    • Configuration Management (Ansible, Puppet, Chef)

    • Monitoring Tools (Dynatrace, CloudWatch, Zabbix, SolarWinds, Graylog, etc.)

    • CI/CD Tools (Jenkins, QuickBuild, Bitbucket)

    • Scripting Languages (Python, PowerShell, Bash)

    • Database Management (MS SQL Server, PostgreSQL)

    • Infrastructure as Code (Terraform, CloudFormation)

    • Container Technologies (Docker, Kubernetes)