MoneySuperMarket.com is the UK’s leading price comparison website. They provide free online tools that help people manage, save and grow their money by comparing and switching Insurance, Money and Home Services products from more than 900 providers. MoneySuperMarket.com are part of MoneySuperMarket Group PLC, an established member of the FTSE 250 index. In 2017, they helped nearly eight million families save an estimated £2.0bn on their household bills, including over two million people who got a better deal on their finances, over half a million households that switched energy supplier, and five million people who saved on their insurance. The company has over 550 employees, and its tech and digital teams have built a highly scalable, microservices-based platform that supports the latest technologies, languages and testing environments. They use Hosted Graphite for all their monitoring.
The technical problem
“We wanted to store all our operational time-series data in one unified place.”
Jim heads up the eighteen-member DevOps group, which provides change, infrastructure and automation expertise directly within the product delivery teams. “If it moves, we monitor it,” says Jim. “We record system and application metrics from every instance that are gathered through a range of custom checks. We also record business and user experience metrics alongside these so we can help draw out optimisation opportunities. Recently, we started recording and storing incident and change summary figures and even delivery team cycle times to help measure our overall efficacy.” The team needed a reliable cloud monitoring solution that could handle these seemingly disparate metrics and scale easily as the platform grew. As Jim puts it: “we needed a solid, scalable but affordable metric storage backend.” Other challenges included:
- Unknown metric volume: the team didn’t know how many metrics they’d need to monitor, so they wanted a solution that could scale easily, and affordably, as they migrated more and more services to it.
- Flexible schema: the same flexibility that standard Graphite provides, but as a hosted service. Control of their own data was also a key consideration, so the ability to easily extract all their data from Hosted Graphite was a big plus.
- Access control and team sharing: the ability to share operational data dashboards quickly and easily with colleagues in other departments across the company.
- Flexible and predictable payment: to help with budgeting forecasts.
Before moving to Hosted Graphite, the team had been self-hosting an Icinga/Graphite installation, which was resource-heavy and difficult to scale. Migrating from this to Hosted Graphite was reassuringly straightforward: “we received a lot of guidance from Hosted Graphite,” says Jim, “having previously used Graphite metrics, we could use existing glue code and plugins to send data.”
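Jim doesn’t describe the glue code itself. Purely as an illustration, this is roughly what sending data in the standard Graphite plaintext protocol (one `<path> <value> <timestamp>` line per data point) can look like. The host, port and the convention of prefixing each metric path with an account API key are assumptions to check against Hosted Graphite’s current documentation, and `format_line` and `send_metrics` are hypothetical helpers, not MoneySuperMarket.com’s code.

```python
from __future__ import annotations

import socket
import time

# Assumed values (not from the case study): check them against the provider's docs.
HOST = "carbon.hostedgraphite.com"
PORT = 2003
API_KEY = "YOUR-API-KEY"  # hypothetical placeholder, never commit a real key


def format_line(path: str, value: float, timestamp: int | None = None,
                api_key: str = API_KEY) -> str:
    """Build one plaintext-protocol line: '<api_key>.<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())  # Graphite expects Unix epoch seconds
    return f"{api_key}.{path} {value} {timestamp}\n"


def send_metrics(lines: list[str], host: str = HOST, port: int = PORT) -> None:
    """Send pre-formatted lines over a single TCP connection, then close it."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))
```

A custom check would call something like `send_metrics([format_line("prod.web.web-01.cpu.user", 12.5)])`. Because the wire format is the same as self-hosted Graphite’s, existing collectors and plugins typically only need the destination host and the key prefix changed, which is consistent with Jim’s point about reusing existing glue code.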
Once set up, the team were pleasantly surprised by how quick, technically knowledgeable and comprehensive the support was: “the times when we have had self-created platform issues and Graphite query questions, the Hosted Graphite team have been quick to reply and with a suitable level of understanding.” According to Jim, Hosted Graphite’s flexibility was an unexpected bonus: “when we have accidentally sent too many metrics, we only got a polite email from Hosted Graphite asking if they could help but with no loss of service which is pretty impressive. We’ve also been able to ask for new features and the flexible payment plans also helped our budgeting forecasts and still does.”
Now, the team use Hosted Graphite for all their monitoring. According to Jim, “advanced Graphite functions make it possible to view and compare metrics over the full stack. We have designed our namespacing so we can aggregate series using wildcards. We then compare metrics at all levels (instance, server role, environment, architecture area or by company).” The advanced annotations and events feature has proven particularly useful: “our delivery team uses Hosted Graphite annotations to record releases that include version and environment which is invaluable.”
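The case study doesn’t give MoneySuperMarket.com’s actual namespace, so this sketch assumes a hypothetical `<company>.<area>.<environment>.<role>.<instance>.<metric>` layout with made-up names such as `msm` and `quotes`. It emulates locally how a Graphite `*` wildcard matches exactly one dot-separated node, so one pattern can total a metric at any level of the hierarchy, roughly what a render target such as `sumSeries(msm.quotes.prod.web.*.requests)` does on the server.

```python
from fnmatch import fnmatchcase


def build_path(company, area, env, role, instance, metric):
    """Join the (hypothetical) hierarchy levels into one dotted metric path."""
    return ".".join([company, area, env, role, instance, metric])


def matches(pattern: str, path: str) -> bool:
    """Graphite-style glob: each wildcard matches within a single dot-separated node."""
    p_nodes, m_nodes = pattern.split("."), path.split(".")
    return len(p_nodes) == len(m_nodes) and all(
        fnmatchcase(node, pat) for pat, node in zip(p_nodes, m_nodes)
    )


def sum_series(pattern: str, latest: dict) -> float:
    """Sum the latest value of every series whose path matches the pattern."""
    return sum(v for path, v in latest.items() if matches(pattern, path))


# Made-up sample readings, one per instance.
latest = {
    build_path("msm", "quotes", "prod", "web", "web-01", "requests"): 120,
    build_path("msm", "quotes", "prod", "web", "web-02", "requests"): 80,
    build_path("msm", "quotes", "prod", "api", "api-01", "requests"): 50,
    build_path("msm", "energy", "prod", "web", "web-01", "requests"): 30,
    build_path("msm", "quotes", "staging", "web", "web-01", "requests"): 5,
}

print(sum_series("msm.quotes.prod.web.web-01.requests", latest))  # instance: 120
print(sum_series("msm.quotes.prod.web.*.requests", latest))       # server role: 200
print(sum_series("msm.quotes.prod.*.*.requests", latest))         # area in prod: 250
print(sum_series("msm.*.prod.*.*.requests", latest))              # environment: 280
print(sum_series("msm.*.*.*.*.requests", latest))                 # company: 285
```

Because a wildcard never crosses a dot, each dimension is identified purely by its position. That makes aggregation at any level a one-line pattern, but it also means the layout needs to be settled early, since inserting a new level later changes every pattern that spans it.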
For Jim, the ability to scale is another major advantage: “As Hosted Graphite scales effortlessly, we can push and store more metrics than we really need today but might need tomorrow. This increases our depth of understanding of the systems that we run and heads off any future problems.” This understanding extends beyond the operations engineers and the DevOps team. As Jim puts it: “everyone from the business through the developers and the QAs can see exactly what they want, when they need it.” Because Hosted Graphite dashboards can be viewed from whitelisted IP addresses, sharing them with colleagues across the company is quick and easy.
“As MoneySuperMarket.com grows, Hosted Graphite scales with us and absorbs any spikes.”
Hosted Graphite plays an important role in helping MoneySuperMarket.com reach their targets. “We’ve adopted a 100% infrastructure-as-code approach to help us to reach our cost and disaster recovery targets and enable true blue-green deployment methods across the estate,” says Jim. “Our monitoring systems are core to allowing this to happen. For example, when the 800+ instances in non-production environments start up every weekday, each instance runs a number of health checks before putting itself into service.” Many of these checks rely on the persisted, historical metric data stored in Hosted Graphite. “Alongside other platform designs, using Hosted Graphite as the common interface for metric storage and analysis has also enabled us to deploy a wide array of open-source and commercial software quickly and without fuss,” says Jim.
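The health checks themselves aren’t described, so what follows is only a minimal sketch of one that consults historical data. It assumes a Graphite-compatible `/render` endpoint queried with `format=json`, which returns a list of `{"target": ..., "datapoints": [[value, timestamp], ...]}` objects with `null` for gaps. The instance compares a current reading with the median of the past week and reports ready only if the two are within a tolerance. The endpoint URL, the 50% tolerance and the names `fetch_series`, `baseline` and `ready_for_service` are all hypothetical.

```python
from __future__ import annotations

import json
import statistics
import urllib.parse
import urllib.request


def fetch_series(render_url: str, target: str, since: str = "-7d") -> list:
    """Query a Graphite-compatible /render endpoint and return its parsed JSON."""
    query = urllib.parse.urlencode({"target": target, "from": since, "format": "json"})
    with urllib.request.urlopen(f"{render_url}?{query}", timeout=10) as resp:
        return json.load(resp)


def baseline(render_json: list) -> float | None:
    """Median of all non-null datapoints across every returned series."""
    values = [
        value
        for series in render_json
        for value, _ts in series["datapoints"]
        if value is not None
    ]
    return statistics.median(values) if values else None


def ready_for_service(current: float, render_json: list, tolerance: float = 0.5) -> bool:
    """Hypothetical readiness rule: current value within +/- tolerance of the median."""
    base = baseline(render_json)
    if base is None:
        return False  # no history at all: stay out of service rather than guess
    return abs(current - base) <= tolerance * abs(base)
```

Using the median rather than the mean keeps a single spike in the history from skewing the baseline. Refusing to go into service when no history exists is a deliberately conservative choice: in a blue-green setup the previous environment can keep serving traffic until the new instances pass.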
Moving to Hosted Graphite has also freed up valuable engineer time. Self-hosting a large Graphite system can seem the simplest and safest option. Keeping that system upgraded, secure and scalable, while retaining a large enough team of skilled engineers behind it, is not, and the opportunity cost soon becomes clear. Jim poses the question: “what should these engineers really be working on to maximise business impact? It is unlikely that managing an unwieldy metric storage and display system would be on this list.” The DevOps team put their engineers’ time to better use elsewhere: “building and managing an on-premise installation at this scale would require a lot of engineer time, especially in the first year,” Jim says. “We use this engineering time to work on initiatives closer to our core business, particularly to continually harden our security and compliance posture and improve our system recovery targets.”
MoneySuperMarket.com Limited is an appointed representative of MoneySuperMarket.com Financial Group Limited, which is authorised and regulated by the Financial Conduct Authority (FCA FRN 303190) for the insurance, mortgage and consumer credit products it offers.