Observability Health Check

Enhancing system reliability and performance for a leading financial institution.

Global Insurance Organisation

Overview

Our client is a leading financial institution looking to uplift their application stability. They needed a clear picture of their current observability maturity to ensure system reliability and performance. DevOps1 delivered a comprehensive health check assessment that transformed their monitoring landscape and set them up for future success.

With over 130 production hosts, 2,000 monthly alerts, and monitoring discrepancies in their non-production environments, the team needed a clear path forward to uplift their observability capabilities.

Challenges

The project aimed to address critical observability gaps in the client's infrastructure. Many production systems lacked proper monitoring, creating significant blind spots in operational awareness. This situation was compounded by alert fatigue from thousands of monthly notifications without proper prioritization, overwhelming teams and diluting responses to critical issues.

Tool fragmentation presented another major challenge, as multiple disconnected monitoring solutions created information silos and inconsistent alerting. The absence of a standardized observability framework across the organization meant implementation was ad-hoc and inconsistent. As a result, incident detection and resolution remained largely reactive and time-consuming, hampering the team's ability to maintain optimal system performance.

Observability Maturity

Solution

Based on the health check assessment, our client's observability maturity was identified as Stage 2 - Foundational, with clear indicators of emerging Stage 3 capabilities.

Stage 2 - Foundational. Key characteristics identified:

  • Active Dynatrace deployment in both Production and Non-Production environments
  • Basic infrastructure and APM coverage established
  • Initial observability features activated
  • Still transitioning from reactive to proactive observability practices

Stage 3 - Emerging signs of Proficiency:

  • Tool sprawl awareness demonstrated
  • Initial automation efforts present (e.g., carbon tracking workflows, FinOps host disabling)
  • Teams engaging with the platform, though still focused on technical rather than business-driven observability

To move forward, DevOps1 delivered a strategy focused on five main areas. First, improve how tools are used by creating consistent naming rules, setting up automatic tagging, and defining clear management zones. Second, develop a smart alerting system that helps teams quickly identify and respond to issues, and make sure these alerts integrate with the team's existing IT support processes.
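As a rough illustration of how consistent naming rules can drive automatic tagging and alert prioritisation, here is a minimal Python sketch. It is not the client's configuration and does not call any Dynatrace API. The naming rule, the tag names, the severity values, and the routing outcomes are all hypothetical and chosen only to show the idea.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical naming rule: <env>-<app>-<role>-<two-digit index>,
# for example "prod-payments-web-01".
HOST_NAME_RULE = re.compile(
    r"^(?P<env>prod|nonprod)-(?P<app>[a-z0-9]+)-(?P<role>[a-z]+)-(?P<index>\d{2})$"
)


def derive_tags(host_name: str) -> Optional[Dict[str, str]]:
    """Return tags derived from a compliant host name, or None if the name breaks the rule."""
    match = HOST_NAME_RULE.match(host_name)
    if match is None:
        return None
    return {
        "environment": match["env"],
        "application": match["app"],
        "role": match["role"],
    }


@dataclass
class Alert:
    host_name: str
    severity: str  # assumed values: "critical", "warning", "info"


def route_alert(alert: Alert) -> str:
    """Pick a (hypothetical) handling path so only critical production alerts page someone."""
    tags = derive_tags(alert.host_name)
    if tags is None:
        # The host cannot be tagged automatically, so flag it for a naming review.
        return "review-naming"
    if tags["environment"] == "prod" and alert.severity == "critical":
        return "page-on-call"
    if tags["environment"] == "prod":
        return "create-ticket"
    return "log-only"
```

For example, route_alert(Alert("prod-payments-web-01", "critical")) returns "page-on-call", while the same alert on a non-production host is only logged. Once names are predictable, tagging and routing can be derived from them rather than maintained by hand.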

Third, build a strong culture of tracking and understanding system performance. This means finding team members who can lead the way, training experts, and creating a central approach that helps different teams work together. Fourth, improve digital monitoring by expanding coverage, fixing performance problems, and connecting technical data to real user experiences. Finally, make sure these technical efforts match business goals by setting clear performance targets, creating meaningful metrics, and building dashboards that show how technical work impacts overall business success.
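One common way to turn a "clear performance target" into a metric a dashboard can show is a service level objective (SLO) with an error budget. The case study does not say which approach was used, so the Python sketch below is only an assumption-based illustration. The function name, its parameters, and the 99% target in the example are hypothetical.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means no budget used, 0.0 means the budget is exhausted,
    and a negative value means the target has been missed.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.99")
    if total_requests <= 0:
        # No traffic in the window, so no budget has been spent.
        return 1.0
    # Number of failures the target tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

With a 99% target over 10,000 requests, 100 failures are tolerated, so 50 failures leave roughly half of the budget. A single number like this is easier for business stakeholders to read than raw error counts.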

Benefits

The health check assessment provided the client with a clear understanding of their observability maturity and a roadmap for improvement.

  • Maturity baseline: clear pathway to Stage 3 (Proficient)
  • Defined progression milestones and performance insights
  • Application bottleneck identification
  • Critical service prioritization
  • Cost optimisation opportunities
  • Standardised monitoring approach and governance framework
  • Ongoing assessment dashboards

Our structured approach ensures any organisation understands its observability maturity and has a clear path for improvement, all within a 5-week period.
