Monitoring Load Balancers with Grafana

Table of Contents

  1. Introduction
  2. What are load balancers?
    1. Why should we monitor load balancers?
    2. What are the important metrics for load balancers?
      1. Latency
      2. Refused connections
      3. 5xx and 4xx statuses
      4. Target host status
      5. Connection (or request) count
  3. What is Grafana?
  4. What is Grafana as a Service?
  5. Wrapping up

Introduction

Load balancers play an important role in distributed computing. With load balancers, you can distribute heavy workloads across multiple resources, which allows you to scale horizontally. Since they sit in front of your computing resources, they need to endure heavy traffic and route it to the right resources quickly. For this to happen, monitoring the health and performance of load balancers is key, and visualization helps users take in various metrics at a glance.

Grafana is a web-based dashboard application capable of visualizing diverse insights in intuitive dashboards. In this article, we will learn how Grafana can be used to monitor load balancers. Before we dive into the subject, check out MetricFire. MetricFire offers comprehensive monitoring solutions, including Grafana as a Service.

With a hosted service, you can set up Grafana dashboards with minimal effort. If you would like to learn more, please book a demo with us or sign up for the free trial today.

What are load balancers?

To design good dashboards, we need to understand the target resources we want to monitor. Load balancers are used not only in network infrastructure but also in compute-intensive architectures. Despite the different environments, the main goal is the same: distributing workloads so that all requests are processed as fast as possible while all your resources are utilized in a balanced way.

In addition to the performance aspect, load balancers can contribute to maintaining service reliability. When a server or a resource is down, a load balancer can detect the failure and route requests to the remaining destinations so that users don't experience errors. Once the broken server is restored, the balancer can direct new requests to it again. When a load balancer decides which server a job goes to, it follows a specific algorithm. Depending on which algorithm you adopt, your monitoring metrics and dashboard designs may also change.

  • Round robin: you distribute requests sequentially across all your resources.
  • Least connections: you send requests to the one with the least connections while also considering computing capacity.
  • Least time: you allocate requests to the resource considering the fastest response time and fewest active jobs.
  • Hash: you compute hash codes and map them to specific resources so that requests can be routed to the same servers consistently. The hash can be derived from the request URL, the caller's IP address, location, etc.
  • Combinations: you can mix two algorithms together. For example, you pick a few servers at random and then apply the least-connections algorithm among them.
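
The first three strategies above can be sketched in a few lines of Python. The server pool, names, and data structures here are purely illustrative assumptions, not any particular load balancer's implementation:

```python
import itertools

# Hypothetical server pool; the names are illustrative only.
servers = ["app-1", "app-2", "app-3"]

# Round robin: cycle through the servers in order.
_rr = itertools.cycle(servers)

def round_robin():
    return next(_rr)

# Least connections: track open connections and pick the minimum.
connections = {s: 0 for s in servers}

def least_connections():
    target = min(connections, key=connections.get)
    connections[target] += 1  # the new request opens a connection
    return target

# Hash: map a client key (e.g. an IP address) to a stable server,
# so the same caller is always routed to the same resource.
def hash_route(client_ip):
    return servers[hash(client_ip) % len(servers)]
```

Real load balancers also account for health checks and server capacity; the point here is only that each algorithm is a different selection function over the same pool.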

You can use your creativity to design your own allocation logic, but the five examples above should give you a good starting point.

Why should we monitor load balancers?

When you monitor load balancers with metrics, you can track how your system is performing. By looking at metrics, you can see the number of clients accessing your services and the average time it takes to process each request. You can also track performance per server, so that when a server shows abnormal behavior, you can take measures to restore it. Since load balancers play such a critical role, monitoring them will help you ensure continuity of service for your clients.

What are the important metrics for load balancers?

Among the many available metrics, there are a few in particular you may want to focus on. Let's look at the five important metrics below.

Latency

Latency refers to the time it takes for your service to finish a request. The longer the latency, the worse the user experience. However, slowness and fastness are subjective and depend on your service: a 10-second latency can be a huge problem on an e-commerce website, while it may be acceptable for a compute-intensive system. Regardless of your environment, you will want to reduce latency as much as possible.
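
Average latency alone can hide the slow tail, so percentile views (p95, p99) are common on latency dashboards. A minimal sketch, with illustrative numbers in milliseconds:

```python
# Sample request durations in milliseconds (illustrative values).
latencies_ms = [12, 15, 11, 210, 14, 13, 16, 950, 12, 15]

def percentile(values, pct):
    """Nearest-rank percentile: the sample below which pct% of values fall."""
    ordered = sorted(values)
    k = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[k]

avg = sum(latencies_ms) / len(latencies_ms)   # 126.8 ms
p95 = percentile(latencies_ms, 95)            # 950 ms
```

The average (about 127 ms) looks tolerable, while the p95 exposes the near-one-second outlier that some users actually experience.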

Refused connections

Your load balancers may reject requests from callers. This can happen when some of your servers go down and can no longer process requests. When it does, the issue surfaces to your customers, which leads to a bad customer experience. Before refused connections pile up, you will want to set up an alert so that you can take swift action.
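
As a sketch of the idea, an alert can watch refused-connection counts over a rolling window and fire once a threshold is crossed. The window size and threshold below are assumptions for illustration:

```python
from collections import deque

WINDOW_MINUTES = 5   # keep the last 5 one-minute buckets
THRESHOLD = 5        # alert when refusals in the window exceed this

recent = deque(maxlen=WINDOW_MINUTES)

def record_minute(refused_count):
    """Record one minute's refused connections; return True if we should alert."""
    recent.append(refused_count)
    return sum(recent) > THRESHOLD
```

In Grafana itself you would express the same idea as an alert rule on the refused-connections metric rather than in application code.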

5xx and 4xx statuses

You may be familiar with 5xx codes if you are a web developer. A 5xx code means that your resource or your load balancer failed, often because of errors or exceptions in the code running on your servers. In addition to 5xx codes, you will also want to monitor 4xx codes, which indicate client-side problems such as a URL that cannot be found (404) or a forbidden request (403). In each code group, there are many specific codes you need to learn and monitor.
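
Grouping raw status codes by their class is a common first step for a dashboard panel. A small sketch with a simplified, made-up access-log format:

```python
from collections import Counter

# Simplified log entries: (method, path, status). Purely illustrative.
log = [
    ("GET", "/", 200),
    ("GET", "/missing", 404),
    ("POST", "/admin", 403),
    ("GET", "/api", 500),
    ("GET", "/", 200),
]

# 404 // 100 == 4, so f"{status // 100}xx" yields "4xx", and so on.
by_class = Counter(f"{status // 100}xx" for _, _, status in log)
# by_class → Counter({"2xx": 2, "4xx": 2, "5xx": 1})
```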

Target host status

When you monitor load balancers, you need to factor in target host health. You will want to know the status (healthy or unhealthy) of each resource. By counting your resources in each category, you gain a high-level view. To go a level deeper, you can measure latency, refused connections, and 5xx and 4xx counts per server. If you sort by, for instance, refused connections in descending order, you can easily identify the servers you need to act on.
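
Sorting per-server stats by the metric you care about, worst first, surfaces the hosts to act on. The server names and numbers below are illustrative:

```python
# Per-server metrics (illustrative data).
stats = {
    "app-1": {"refused": 2, "latency_ms": 15},
    "app-2": {"refused": 40, "latency_ms": 220},
    "app-3": {"refused": 0, "latency_ms": 12},
}

# Rank servers by refused connections, descending.
worst_first = sorted(stats, key=lambda s: stats[s]["refused"], reverse=True)
# worst_first → ["app-2", "app-1", "app-3"]
```

A table panel in Grafana with a sort on the refused-connections column gives you the same view directly from your data source.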

Connection (or request) count

By tracking connection counts hourly, daily, and monthly, you can detect unexpected spikes or find patterns so that you can plan for similar events in the future. When you expect request spikes in a particular season, you can plan to increase the capacity of your servers and load balancers. After that season, when you expect traffic to go down, you can shrink your resources to save costs.
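
A naive way to flag a spike is to compare each bucket against the average of the preceding ones; dashboards and alert rules do something similar with moving averages. The counts and the threshold factor here are illustrative assumptions:

```python
# Hourly connection counts (illustrative); hour 4 contains a spike.
hourly_counts = [120, 130, 125, 118, 900, 122]

def spikes(counts, factor=3):
    """Return indices of buckets more than `factor` times the prior average."""
    flagged = []
    for i in range(1, len(counts)):
        baseline = sum(counts[:i]) / i
        if counts[i] > factor * baseline:
            flagged.append(i)
    return flagged
```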

What is Grafana?

We have learned the importance of load balancers and which metrics to focus on. The next task is picking the right tool for monitoring those metrics. That's where Grafana comes in. Grafana is an open-source analytics and interactive visualization web application. You can design charts, graphs, and alerts. Grafana, however, is not purely visualization software: you can customize both the front end and the back end, which lets you create tailored dashboards and run queries that meet your needs. Its major features include:

  • Dashboard templates: you can codify your dashboard design in templates, just as you write code. These templates are useful when you have multiple environments such as dev, uat, stage, and production: by deploying the same template files, you can replicate the same dashboards everywhere.
  • Annotations: when you use graphs, you sometimes want to leave marks and comments to explain particular patterns. In Grafana, you can not only create manual annotations but also set up automatic ones, so that when a certain event happens, a configured annotation is added to mark it.
  • Custom plugins: when you want additional features, you can search for plugins. With plugins, you can add more graph types and control dashboard users in finer detail. Grafana has good community support, so plugins are updated regularly.
  • Alerting: it may sound like a simple feature, but without alerting you can miss important events. You can configure the alerting criteria and to whom the alert should be sent.
  • SQL support: like other dashboard tools, Grafana can use databases as data sources. You can define queries to create the proper dimensions for visualization.
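
The dashboard-template idea above can be sketched as code that generates one dashboard definition per environment. This is an illustrative fragment only: it mirrors the spirit of Grafana's JSON dashboard model but is not a complete or authoritative schema.

```python
import json

def make_dashboard(env):
    """Build a simplified dashboard definition for one environment."""
    return {
        "title": f"Load Balancer Overview ({env})",
        "tags": ["load-balancer", env],
        "panels": [
            {"title": "Latency (p95)", "type": "timeseries"},
            {"title": "Refused connections", "type": "timeseries"},
            {"title": "5xx / 4xx counts", "type": "timeseries"},
        ],
    }

# The same template is rendered for every environment.
dashboards = {env: make_dashboard(env)
              for env in ("dev", "uat", "stage", "production")}
as_json = json.dumps(dashboards["production"], indent=2)
```

Because the definition is plain data, it can live in version control and be deployed alongside the rest of your infrastructure code.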

These five are just some of the many features. Check our blog to find out more about Grafana.

What is Grafana as a Service?

When you use open-source software, you often face steep learning curves and the inconvenience of maintaining the software yourself. Doing all of this on your own, you may end up spending a large portion of your time maintaining the tool and less time developing for your users. This applies to Grafana too. Due to the nature of open-source software, it can be demanding to set up initially and to manage regularly.

To remove these inconveniences, MetricFire provides Grafana as a Service. Using this service, you don't have to worry about maintenance, and you can focus on what matters most to your team. The hosted Grafana comes with all the good features of the open-source version and, on top of that, you can benefit from:

  • owning your own data
  • saving cost with MetricFire’s flexible pricing plans
  • getting expert engineering support

With this additional support, you can start creating metrics to monitor the important aspects of load balancers, including latency, connections, 4xx and 5xx status codes, and target resource status.

Wrapping up

This article explained load balancers and the important metrics for monitoring them. We also learned about open-source Grafana and Grafana as a Service.

With Grafana as a Service, you can pay more attention to monitoring itself rather than to maintaining the Grafana infrastructure. It gives you more control over your data and saves you the time and effort of managing the tool. With our expert support, you can rest assured that help is available quickly.

To enjoy all of these benefits, visit MetricFire today and check out our Graphite as a Service.
