Performance Testing & AI SRE

NSW Government Agency partners with DevOps1 to secure and scale critical transaction platform

NSW Government Project

Overview

A NSW Government agency, responsible for a critical application managing up to 3 million transactions annually from multiple providers, partnered with DevOps1 to ensure the robustness and scalability of their platform.

The project focused on implementing a rigorous performance testing framework and a secure Site Reliability Engineering (SRE) platform. By integrating advanced tooling and AI-driven analysis, the agency aimed to proactively identify risks to its performance and quality requirements and ensure seamless service delivery.

Challenges

The agency faced several critical challenges in ensuring the reliability of their high-volume transaction system:

  • High Transaction Volume: The application processes up to 3 million transactions per year, requiring absolute stability and performance under load.
  • Complex Integration Ecosystem: Data is ingested from multiple external providers, creating complex integration points that are prone to bottlenecks.
  • NFR Assurance: The agency needed to identify and mitigate risks to its non-functional requirements (NFRs), such as latency and scalability limits, before the system went live.
  • Analysis Bottlenecks: Traditional performance analysis was time-consuming, relying heavily on specialised SRE resources to manually review vast amounts of data to identify trends.
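
To put the transaction volume in perspective, the annual figure can be converted into a per-second rate that a load test might target. The sketch below is illustrative only: the operating calendar and the peak multiplier are assumptions, not figures from the project, and the function names are hypothetical.

```python
# Rough conversion of an annual transaction volume into load-test target rates.
# Only ANNUAL_TRANSACTIONS comes from the case study; every other value is an
# assumption chosen for illustration.

ANNUAL_TRANSACTIONS = 3_000_000  # "up to 3 million transactions annually"
BUSINESS_DAYS_PER_YEAR = 250     # assumption
BUSINESS_HOURS_PER_DAY = 8       # assumption
PEAK_FACTOR = 5.0                # assumption: peak is 5x the business-hours average


def average_tps(annual: int, days: int, hours_per_day: int) -> float:
    """Average transactions per second across the assumed operating window."""
    return annual / (days * hours_per_day * 3600)


def peak_tps(annual: int, days: int, hours_per_day: int, peak_factor: float) -> float:
    """Assumed peak rate: the window average scaled by a peak multiplier."""
    return average_tps(annual, days, hours_per_day) * peak_factor


if __name__ == "__main__":
    avg = average_tps(ANNUAL_TRANSACTIONS, BUSINESS_DAYS_PER_YEAR, BUSINESS_HOURS_PER_DAY)
    peak = peak_tps(
        ANNUAL_TRANSACTIONS, BUSINESS_DAYS_PER_YEAR, BUSINESS_HOURS_PER_DAY, PEAK_FACTOR
    )
    print(f"average: {avg:.2f} TPS, assumed peak: {peak:.2f} TPS")
```

Under these assumptions the average works out to roughly 0.42 transactions per second and the assumed peak to roughly 2.1. The real peak depends on how provider submissions cluster in time, which is the kind of question performance test scenarios are meant to answer.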

Solution

DevOps1 designed and implemented a bespoke secure SRE and Performance Testing platform tailored to the agency's specific needs.

Primary activities

  • Comprehensive Performance Strategy: Defined and executed performance test scenarios that accurately simulated peak loads and complex transaction flows.
  • Automated Execution: Automated test execution so that performance was validated consistently, without relying on manual runs.
  • AI-Driven Analysis: Deployed AI agents to automatically analyse test results, identifying anomalies and trends that might escape human review.
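
The kind of checks that automated validation and trend analysis involve can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the p95 limit, the z-score rule, and all function names are assumptions, and the platform described here used AI agents rather than a fixed statistical rule.

```python
import math
import statistics


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_nfr(latencies_ms: list[float], p95_limit_ms: float) -> bool:
    """True when the 95th percentile latency is within the (assumed) limit."""
    return percentile(latencies_ms, 95) <= p95_limit_ms


def flag_anomalies(run_p95s: list[float], z_threshold: float = 2.0) -> list[int]:
    """Indices of test runs whose p95 lies more than z_threshold population
    standard deviations from the mean across all runs."""
    if len(run_p95s) < 2:
        return []
    mean = statistics.fmean(run_p95s)
    stdev = statistics.pstdev(run_p95s)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(run_p95s) if abs(v - mean) / stdev > z_threshold]


if __name__ == "__main__":
    samples = [float(ms) for ms in range(1, 101)]  # synthetic latencies, 1..100 ms
    print("p95 within 100 ms:", meets_latency_nfr(samples, 100.0))
    history = [200.0, 205.0, 198.0, 202.0, 201.0, 199.0, 203.0, 400.0]  # synthetic p95s
    print("anomalous runs:", flag_anomalies(history))
```

A simple rule like this catches a single outlying run, but it says nothing about why the run regressed. That explanatory step is where the case study positions the AI agents.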

Tooling & integration

  • Grafana k6: Used for its developer-friendly, scalable load testing capabilities, allowing precise simulation of user behaviour.
  • InfluxDB: Implemented as the high-performance time-series database to store and query massive volumes of performance metrics.
  • AWS Bedrock Agent: Harnessed Generative AI to act as an intelligent SRE assistant, analysing performance data to provide detailed insights and trend analysis.

[Figure: k6 Smart SRE diagram]
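
To make the data flow between the load tester and the time-series store more concrete, the sketch below formats a single latency sample in InfluxDB line protocol. k6 provides its own InfluxDB output, so this is not code the project would have needed. It only shows the shape of a stored data point. The tag names, the `value` field, and the helper names are assumptions, and the measurement name is assumed to contain no commas or spaces.

```python
def escape_tag(text: str) -> str:
    """Escape the characters that line protocol treats as delimiters in tag
    keys and tag values: commas, equals signs, and spaces."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(
    measurement: str, tags: dict[str, str], value: float, timestamp_ns: int
) -> str:
    """Build one line-protocol record: measurement,tags fields timestamp."""
    tag_part = ",".join(
        f"{escape_tag(key)}={escape_tag(val)}" for key, val in sorted(tags.items())
    )
    prefix = f"{measurement},{tag_part}" if tag_part else measurement
    # A bare numeric field value is stored as a float.
    return f"{prefix} value={float(value)} {timestamp_ns}"


if __name__ == "__main__":
    line = to_line_protocol(
        "http_req_duration",                              # k6's request duration metric
        {"scenario": "peak load", "status": "200"},       # hypothetical tags
        123.4,                                            # latency in milliseconds
        1_700_000_000_000_000_000,                        # timestamp in nanoseconds
    )
    print(line)
```

Storing each sample with tags such as scenario and status lets later queries slice results by test scenario, which is what makes trend analysis across runs practical.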

Benefits

The implementation of the AI-enhanced performance platform delivered significant operational and strategic benefits:

  • 80% Faster Analysis: With AI-assisted analysis, the team produced detailed findings and understood complex trends up to 80% faster than with traditional reviews by specialised SREs.
  • Proactive Risk Mitigation: Successfully identified and resolved potential issues against performance and quality requirements before go-live, preventing production incidents.
  • Scalability Assurance: Validated the platform's capacity to handle the projected 3 million annual transactions with confidence.
  • Enhanced Engineering Efficiency: Freed up specialised SRE resources from manual data crunching, allowing them to focus on strategic improvements and architecture.