Kafka Monitoring Using Prometheus

April 30, 2020

Introduction

In this article, we discuss how to set up Kafka monitoring using Prometheus. Kafka is one of the most widely used streaming platforms, and Prometheus is a popular way to monitor it. We will use Prometheus to pull metrics from Kafka and then visualize the important metrics on a Grafana dashboard. We will also look at some of the challenges of running self-hosted Prometheus and Grafana instances versus the Hosted Grafana and Hosted Prometheus offered by MetricFire.

To get started, sign up for the MetricFire free trial. You can use Prometheus and Grafana directly on the MetricFire platform and try out what you learn in this article.

Introduction to Kafka

Kafka is one of the most widely used streaming technologies. It was originally developed at LinkedIn and is now an open source Apache project; Confluent, a company founded by Kafka's original creators, offers a commercial distribution. What makes Kafka attractive is its high availability and scalability. However, the out-of-the-box installation of Kafka comes with only very basic command line monitoring tools, so it is important to set up proper health monitoring for production deployments.

Kafka Architecture

Kafka is a distributed streaming platform. Primarily, the Kafka architecture consists of:

  • Topics - A topic is a logical entity to which records are published. A topic consists of a number of partitions. Each record within a topic is assigned to a partition and given an incrementing offset within that partition.
  • Producers - Producers publish data to a topic. A producer can either specify the partition to which a record is published, or provide a key, which Kafka hashes consistently to distribute records across the topic's partitions.
  • Consumers - Consumers read data from a topic. Consumers can be distributed across multiple machines. Each consumer belongs to a consumer group.
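The key-based distribution described for producers can be sketched in a few lines of Python. This is only an illustration of the idea, not Kafka's implementation: the function name choose_partition is hypothetical, and CRC32 is used as a stand-in hash (Kafka's Java client uses murmur2 on the serialized key).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is a stand-in hash for illustration only;
    # Kafka's Java client hashes the serialized key with murmur2.
    return zlib.crc32(key) % num_partitions
```

Because the hash depends only on the key, every record with the same key lands on the same partition as long as the partition count does not change, which is what preserves per-key ordering.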

Kafka uses Zookeeper to store its configuration and metadata. To find out more details about Kafka, refer to the official documentation.

Introduction to Prometheus

Prometheus is an open source alerting and monitoring tool developed by SoundCloud in 2012. We are not going to explain the basics of Prometheus in this article in detail. For introductions to Prometheus, please refer to our articles below:

Prometheus Monitoring 101

Monitoring a Python web app with Prometheus

Monitoring Kubernetes tutorial: using Grafana and Prometheus

 

Setting up Kafka monitoring using Prometheus

We will use Docker to set up a test environment of Kafka, Zookeeper, Prometheus and Grafana. We will use the Docker images available at:

https://hub.docker.com/r/grafana/grafana/

https://hub.docker.com/r/prom/prometheus/

https://hub.docker.com/r/wurstmeister/kafka/


For tutorials on how to set up Prometheus and Grafana with Docker, check out our articles How to Deploy Prometheus on Kubernetes and Connecting Prometheus and Grafana. Each article shows a different method of setting up a test environment with Docker.

Using JMX exporter to expose JMX metrics

Java Management Extensions (JMX) is a technology that provides tools for managing and monitoring applications running on the JVM.

Kafka runs on the JVM (it is written in Scala and Java) and makes extensive use of JMX to expose its internal metrics.

JMX Exporter is a collector that can run as part of an existing Java application (such as Kafka) and expose its JMX metrics over an HTTP endpoint, which Prometheus or any other compatible system can then scrape. For more information about Prometheus exporters, here is our article that deep dives into how Prometheus exporters work.
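To make the HTTP endpoint less abstract: an exporter serves plain text in the Prometheus exposition format, one sample per line. The Python sketch below parses a small subset of that format. The metric names and values in SAMPLE_TEXT are illustrative assumptions (the actual names depend on the rules in the exporter configuration), and parse_metrics is a hypothetical helper, not part of any Prometheus library.

```python
import re
from typing import Dict, List, Tuple

# Illustrative exporter output; real metric names depend on the exporter rules.
SAMPLE_TEXT = """\
# HELP kafka_server_brokertopicmetrics_messagesin_total Illustrative help text
# TYPE kafka_server_brokertopicmetrics_messagesin_total counter
kafka_server_brokertopicmetrics_messagesin_total{topic="orders"} 1523.0
kafka_server_brokertopicmetrics_messagesin_total{topic="payments"} 87.0
jvm_memory_bytes_used{area="heap"} 2.5E8
"""

# Matches lines such as: metric_name{label="value"} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Return (name, labels, value) for every sample line in the text."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Prometheus does this parsing itself on every scrape; the sketch is only meant to show what the exporter's output looks like once it is turned into labelled samples.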


Setting up the Dockerfile, configuring prometheus.yml, and running the instances

As a first step, we will build a Kafka Docker image that includes the JMX exporter running as part of the Kafka instance.

The configuration file prom-jmx-agent-config.yml is available here:

<p>CODE:https://gist.github.com/denshirenji/18045f47fe3cdf2f321619b174f482e6.js</p>



<p>CODE:https://gist.github.com/denshirenji/9fbae4ce628884401df76e1aa503c39b.js</p>

Once we have saved the above file as Dockerfile, we can create our docker-compose.yml, which contains the configuration for each of our services: Prometheus, Grafana, Zookeeper and Kafka.

<p>CODE:https://gist.github.com/denshirenji/1dd7093032544cb48ba9eb60022e68c8.js</p>


We will also create a default prometheus.yml file along with the docker-compose.yml. This configuration file contains all the configuration related to Prometheus. The config below is the default configuration which comes with Prometheus.


<p>CODE:https://gist.github.com/denshirenji/4f380f53f89fd66f7e8d04b7950a688d.js</p>

Finally, we can run “docker-compose up -d” to start our Prometheus, Grafana, Zookeeper and Kafka instances.
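Once the containers are up, one way to check that Prometheus is actually scraping its targets is to ask its HTTP API for the built-in up metric, which is 1 for every target that was scraped successfully. The Python sketch below uses only the standard library. It assumes Prometheus is reachable at http://localhost:9090 (its default port; adjust to whatever your docker-compose.yml maps), and the helper names are hypothetical.

```python
import json
from typing import Dict, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def extract_values(response: dict) -> Dict[Tuple[str, str], float]:
    """Map (job, instance) to the sample value of an instant-vector result."""
    if response.get("status") != "success":
        raise ValueError("query failed: " + str(response.get("error")))
    values = {}
    for series in response["data"]["result"]:
        job = series["metric"].get("job", "")
        instance = series["metric"].get("instance", "")
        # Each instant-vector sample is a [timestamp, "value"] pair.
        _timestamp, value = series["value"]
        values[(job, instance)] = float(value)
    return values


def scrape_status(base_url: str = "http://localhost:9090") -> Dict[Tuple[str, str], float]:
    """Query the 'up' metric; the base URL is an assumption, not from the article."""
    with urlopen(build_query_url(base_url, "up"), timeout=5) as resp:
        return extract_values(json.load(resp))
```

Calling scrape_status() against a running stack should return one entry per scrape target; a value of 0.0 for the Kafka job would suggest that the JMX exporter endpoint is not reachable from Prometheus.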

Plotting the monitoring visualization on Grafana

Now that we have configured Kafka JMX metrics to pipe into Prometheus, it's time to visualize them in Grafana. Browse to http://localhost:3000, log in using admin/admin and add the data source for Prometheus as shown below. Make sure you name the data source “Prometheus”, since the Grafana dashboards we use refer to the data source by this name.



One way to create a dashboard in Grafana is to manually configure the panels one by one. Alternatively, to kickstart the process, we can download a pre-configured dashboard from the Grafana dashboard site and import it into our Grafana.

Click on the Download JSON link, download the JSON file, and import it into Grafana as shown below:



Make sure to choose the correct data source, which is “Prometheus” in our case, and click on the Import button.

You should immediately see the dashboard reporting the following metrics from the Kafka instance:

  • CPU Usage
  • JVM Memory Used
  • Time spent in GC
  • Messages In Per Topic
  • Bytes In Per Topic
  • Bytes Out Per Topic
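Per-topic panels like these are typically driven by counters that only ever increase, so a dashboard shows their per-second rate rather than the raw value. The Python sketch below shows a simplified version of that calculation, including handling of counter resets. It is an assumption-laden illustration, not what Prometheus does internally: the real rate() function also extrapolates to the edges of the query window, and per_second_rate is a hypothetical name.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, float]]) -> float:
    """Average per-second increase of a counter.

    samples: (timestamp_seconds, counter_value) pairs ordered by time.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # A drop means the counter was reset (for example after a
            # broker restart), so everything counted since then is new.
            increase += curr
    return increase / elapsed
```

For example, a bytes-in counter that reads 100, 250 and 400 at 15 second intervals has grown by 300 over 30 seconds, which the function reports as 10 bytes per second.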



Once you have some records and messages flowing through those topics, you will be able to see the traffic details per Kafka topic as shown below:



You might want to set up alerts that fire when the values in these dashboards exceed a critical threshold. Check out our article Grafana Dashboards from Basic to Advanced to build dashboards that better suit your needs.

You can also create other types of visualizations based on the metrics exposed by Prometheus. Have a look at the article Our Favorite Grafana Dashboards to create some of the more advanced dashboards.

Let’s take a look at one dashboard that we created below. This example shows the following information:

  • Total number of messages
  • Kafka Broker Uptime
  • Topic BytesIn to BytesOut Rate
  • Network Processor Idle Percentage
  • JVM Heap Memory Usage
  • JVM System CPU Load
  • Request Handler Idle Percentage
  • Kafka Network Request Metrics
  • Total Partition Count
  • Total Under Replicated




You can download the pre-configured dashboard above from this GitHub link and import it into your Grafana.


Setting up the Monitoring through MetricFire

The setup we have built above works for a very basic Kafka infrastructure with just a few topics and a moderate amount of traffic. To handle a production-level load, which could mean a few hundred topics and upwards of a few Mbps of network traffic, you would need to scale out Prometheus to handle the increasing load.

That’s where Hosted Prometheus comes into the picture. It allows you to configure your Prometheus server to use Hosted Prometheus for long-term storage, and it automatically provides redundant storage of data.

Hosted Prometheus through MetricFire gives you many benefits, such as scalability with increasing load, long-term storage of data and continuous active deployment of new features.

Take a look at our tutorial on how to set up Prometheus and Grafana through MetricFire.

Conclusion

We have seen how to set up Kafka monitoring using Prometheus. We have also seen some advanced visualizations to monitor Kafka metrics using Grafana, with example dashboards. 

Sign up here for a free trial of our Hosted Prometheus offering. Also, if you have any questions about our products, or about how MetricFire can help your company, talk to us directly by booking a demo.

