- Designed, developed, and released a tool used by the automation labs team to
  - Monitor device/network health of 800+ devices across 3 automation labs
  - Identify and quantify lab design and infrastructure problems
  - Reduce the number of automated test failures caused by offline or faulty devices
- Used Python for the core logic and data collection mechanisms
- Used Docker to run the following microservice components
  - OpenTelemetry - Receives metrics data and forwards it to the database
  - Prometheus - Stores latency data in a time-series database
  - Grafana - Hosts dashboards for data visualization and problem notifications
- Used Jira for work tracking and Confluence for project documentation
- Delivered the project on time; it passed code review and was deployed to production
- Technologies: Python, OpenTelemetry, Prometheus, Grafana, Docker, GitLab, Jira, Confluence