Software Engineer (Observability & Monitoring) - West Des Moines, IA

AHU Technologies Inc

Washington, DC

Full Time

Paid

Responsibilities
Job Description:

Overview:

· Seeking an experienced Observability and Monitoring Engineer to build and mature our enterprise-wide monitoring, logging, alerting, and observability capabilities across our AWS-based technology stack.

· This role will define the strategy, architecture, implementation standards, and dashboards that enable proactive detection, faster troubleshooting, and data-driven insights across applications, infrastructure, operating systems, databases, file transfers, and batch processes.

· The ideal candidate has hands-on engineering expertise, strong architecture skills, and the ability to unify multiple monitoring solutions into a cohesive observability framework.

Responsibilities:

· You will establish standards for logs, metrics, traces, event correlation, and alerting across multiple environments.

· You will build centralized dashboards and alerting policies that provide unified visibility across applications and services, operating systems, AWS services (EC2, RDS, Lambda, S3, CloudWatch, CloudTrail, etc.), databases (MS SQL Server, PostgreSQL, etc.), file transfer systems (SFTP, managed transfer tools), and batch jobs and scheduled processes.

· You will create actionable and noise-free alerting thresholds, escalation policies, and runbooks.

· You will integrate existing tools (Dynatrace, Graylog, Splunk, SolarWinds, Zabbix) into a cohesive ecosystem.

· You will rationalize tool usage and recommend consolidation or modernization where appropriate.

· You will manage the lifecycle, configuration, tuning, and health of monitoring and logging platforms, automate monitoring deployments using IaC (CloudFormation) and CI/CD pipelines, and develop reusable templates/standards so teams can onboard new applications quickly.

· You will build self-service dashboards and reporting for technical/business stakeholders, and create documentation for monitoring standards, dashboard naming conventions, logging schemas, and alert configuration guidelines.

· You will define SLOs/SLIs and reliability KPIs for critical services.

· You will partner with scrum, infrastructure, and security teams to reduce MTTR and improve system reliability, and participate in incident resolution, root cause analysis, and problem management.

· You will provide technical leadership/mentoring to team members and consult on architecture decisions and best practices.

· You will develop/maintain system documentation and participate in project planning and technical strategy sessions.

Qualifications:

· Bachelor's degree in Computer Science or related field

· 5+ years of experience implementing monitoring and observability using Dynatrace

· Hands-on experience with monitoring/logging tools such as Zabbix, Graylog, Splunk, SolarWinds, or equivalents

· 5+ years of hands-on experience with AWS services and architecture

· Deep understanding of metrics, logs, traces, distributed tracing, and event correlation

· Experience building dashboards and KPIs for application, infrastructure, and database layers

· Strong scripting/automation skills (Python, Bash, PowerShell) and familiarity with Terraform or CloudFormation

· Strong understanding of network monitoring, performance tuning, and systems architecture

· Familiarity with ITIL incident/problem management processes

· Proficiency with AI tools, and with using them responsibly to improve observability, preferred

· Experience with container orchestration and microservices architecture preferred

· Experience with AWS OpenTelemetry, Prometheus, Grafana, or similar tools preferred

Required Technical Skills:

• AWS Services (EC2, RDS, S3, Lambda, ECS/EKS, etc.)

• Configuration Management (Ansible, Puppet, Chef)

• Monitoring Tools (Dynatrace, CloudWatch, Zabbix, SolarWinds, Graylog, etc.)

• CI/CD Tools (Jenkins, QuickBuild, Bitbucket)

• Scripting Languages (Python, PowerShell, Bash)

• Database Management (MS SQL Server, PostgreSQL)

• Infrastructure as Code (Terraform, CloudFormation)

• Container Technologies (Docker, Kubernetes)