How to Monitor Apache Zookeeper Using the OpenTelemetry Collector

Introduction 

Apache Zookeeper is a distributed coordination tool that helps keep large-scale systems in sync. It’s the backbone for managing leader elections, service discovery, and metadata storage in projects like Kafka, Hadoop, and Solr. Think of it as a highly available traffic controller for distributed apps, ensuring everything runs smoothly.

Since so many critical services can depend on Zookeeper, monitoring it is a must. If request latency spikes, connections pile up, or the data tree grows out of control, it can ripple through your entire infrastructure. Keeping an eye on key metrics like latency, active connections, and leader status helps spot issues early, optimize performance, and prevent downtime. In short, if Zookeeper is happy, your distributed apps are happy - and that’s why monitoring matters!

In this guide, we'll detail how to install Zookeeper and the OpenTelemetry Collector Contrib distribution (otelcol-contrib) to collect performance metrics and forward them to a data source.

Install and Configure Zookeeper

If you're already running an instance of Zookeeper in your production or test environment, you can jump straight to the OTel configuration section. Otherwise, the following section provides a quick setup guide to install and run Zookeeper in a Linux environment.

Install and Run the Zookeeper System Package (Linux)

sudo apt install -y zookeeper zookeeperd

sudo systemctl restart zookeeper
sudo systemctl status zookeeper

Confirm that it's listening on the default port (2181) and check its status:

netstat -tulnp | grep :2181

echo "stat" | nc localhost 2181
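The Zookeeper Receiver configured later in this guide gathers its data through the mntr four-letter-word command, so it is worth confirming that mntr responds as well. On Zookeeper 3.5.3 and later, four-letter-word commands other than srvr are disabled unless they are listed in 4lw.commands.whitelist in zoo.cfg. The sketch below is a small helper for reading a single value out of mntr output; the function name mntr_value is hypothetical, and it assumes awk is installed.

```shell
# mntr_value KEY: read `mntr` output on stdin and print the value for KEY.
# mntr emits one "key<TAB>value" pair per line, e.g. "zk_znode_count<TAB>5".
mntr_value() {
  awk -F '\t' -v key="$1" '$1 == key { print $2 }'
}
```

To use it against the local node: echo "mntr" | nc localhost 2181 | mntr_value zk_znode_count. If nothing is printed, mntr is most likely not whitelisted on your Zookeeper version.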

NOTE: The default configuration is for a standalone node. To configure a cluster, update your config file, typically located at: /etc/zookeeper/conf/zoo.cfg

Now add the Zookeeper bin directory to your system PATH, and grant permissions so the CLI tools can write log files:

export PATH=$PATH:/usr/share/zookeeper/bin
# World-writable log permissions are only acceptable on a disposable test machine
sudo chmod -R 777 /var/log/zookeeper

Access the Interactive Zookeeper CLI

The Zookeeper interactive CLI (zkCli.sh) is a handy tool for exploring and managing Zookeeper in real time. You can use it to create and modify znodes, check server status, and troubleshoot issues without needing extra scripts or clients.

zkCli.sh -server 127.0.0.1:2181

Expected Output:

Connecting to 127.0.0.1:2181
Welcome to ZooKeeper!
JLine support is enabled
[zk: 127.0.0.1:2181(CONNECTED) 0]

Create a test node named mytest:

create /mytest "HelloZookeeper"

Verify the /mytest node and retrieve its value:

ls /

get /mytest

Install and Configure OpenTelemetry

The OpenTelemetry Collector Contrib distribution ships Receiver plugins for HAProxy, NGINX, PostgreSQL, Redis, MongoDB, Kafka, Elasticsearch, RabbitMQ, and many more. These Receivers pull key performance metrics directly from your services, with no manual parsing or custom scripts needed. This article assumes you're already using OpenTelemetry in your technology stack, but the examples below show how to install and configure otelcol-contrib to collect Zookeeper metrics and export them to a storage endpoint.

Install OpenTelemetry Collector Contrib (Linux)

Download otelcol-contrib and unpack it (see the OpenTelemetry Collector installation docs for up-to-date versions and installation commands for different operating systems).
Packages and files are generally installed at: /etc/otelcol-contrib/

Ubuntu/Debian (AMD64)

wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.119.0/otelcol-contrib_0.119.0_linux_amd64.deb

sudo dpkg -i otelcol-contrib_0.119.0_linux_amd64.deb

RedHat/CentOS (AMD64)

wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.119.0/otelcol-contrib_0.119.0_linux_amd64.rpm

sudo rpm -ivh otelcol-contrib_0.119.0_linux_amd64.rpm

Confirm Installation

otelcol-contrib --version

Configure the Zookeeper Receiver and Carbon Exporter

The Zookeeper Receiver in OpenTelemetry collects metrics from a Zookeeper instance using the mntr four-letter-word command. The Collector pipeline then processes those metrics and forwards them to your exporter of choice.

The Carbon Exporter takes those metrics and ships them straight to Graphite-compatible backends, making it easy to integrate OTel into existing monitoring stacks with minimal setup.

If you're not currently hosting your own Graphite data source, start a 14-day free trial with Hosted Graphite by MetricFire to continue following along with this example!

A MetricFire account will provide the Graphite data source and include Hosted Grafana as a visualization tool - along with robust Alerting, Integrations, and Team features.

  1. First, find the configuration file generally located at: /etc/otelcol-contrib/config.yaml and open it in your preferred text editor.
  2. Then you can either replace the file, or add the following sections to your current config.
  3. Be sure to include your HG-API-KEY and server HOSTNAME.

receivers:
  zookeeper:
    endpoint: "localhost:2181"
    collection_interval: 20s
    initial_delay: 1s

processors:
  batch: {}

  metricstransform:
    transforms:
      - include: ".*"
        match_type: regexp
        action: update
        new_name: "<HG-API-KEY>.opentel.$$0"
        operations:
          - action: add_label
            new_label: host
            new_value: <HOSTNAME>

exporters:
  carbon:
    endpoint: "carbon.hostedgraphite.com:2003"
    timeout: 10s

service:
  pipelines:
    metrics:
      receivers:
        - zookeeper
      processors:
        - batch
        - metricstransform
      exporters:
        - carbon
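A common slip with this config is restarting the Collector while the <HG-API-KEY> or <HOSTNAME> placeholders are still in the file. The sketch below is one way to check for that before restarting; the function name config_has_placeholders is hypothetical, and it only looks for those two literal strings.

```shell
# config_has_placeholders FILE: succeed (exit 0) if FILE still contains the
# literal <HG-API-KEY> or <HOSTNAME> placeholders from the example config.
config_has_placeholders() {
  grep -q -e '<HG-API-KEY>' -e '<HOSTNAME>' "$1"
}

# Example usage (path taken from the steps above):
#   if config_has_placeholders /etc/otelcol-contrib/config.yaml; then
#     echo "Replace <HG-API-KEY> and <HOSTNAME> before restarting"
#   fi
```

This catches only leftover placeholders, not YAML or pipeline errors. Recent Collector releases also document a validate subcommand (otelcol-contrib validate --config <file>) for that purpose; run otelcol-contrib --help to confirm your build includes it.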

Now, just save the file and restart the otelcol-contrib service:

sudo systemctl restart otelcol-contrib
sudo systemctl status otelcol-contrib

Or you can run the Collector manually in the foreground to troubleshoot potential configuration errors:

otelcol-contrib --config /etc/otelcol-contrib/config.yaml

Ship Tagged Graphite Metrics

Within two minutes of restarting the otelcol-contrib service, a set of Tagged Graphite metrics will be forwarded to your Hosted Graphite account (with the opentel.zookeeper prefix).

Why does OpenTelemetry export Tagged metrics to Carbon?

  • "Because its core model aligns more with Prometheus-style 'labels' than Graphite's 'dot notation' formatting. Instead of flattening data into long metric names, it preserves the labels by using Graphite tags, which also enables richer filtering in your Graphite backend."

Now you can locate these Tagged Metrics in the Hosted Graphite Tag Search UI - under the host tag:

[Image 1: Hosted Graphite Tag Search UI, filtered by the host tag]

Otherwise, you can configure a different exporter to forward metrics to another data source.

Run Some Tests in Zookeeper

The following tests will simulate some real-world Zookeeper activity, to observe how different operations affect your metric values.

Create and Delete Nodes to Measure Data Growth

This creates 100 znodes, and will impact your zookeeper.znode.count and zookeeper.data_tree.size metrics:

for i in {1..100}; do
  zkCli.sh -server 127.0.0.1:2181 create /test$i "LoadTest"
done

Delete the nodes to reset the data tree and free up memory:

for i in {1..100}; do
  zkCli.sh -server 127.0.0.1:2181 delete /test$i
done

Simulate a High Client Load

Trigger multiple client connections to impact your zookeeper.connection.active metric, which reports the number of active client connections. Watching it under load helps you gauge how many concurrent clients Zookeeper can handle before performance degrades:

for i in {1..20}; do
  zkCli.sh -server 127.0.0.1:2181 ls / &
done

Stress Testing File Descriptors

This will impact your zookeeper.file_descriptor.open metric, which is useful for diagnosing resource exhaustion issues:

for i in {1..50}; do
  zkCli.sh -server 127.0.0.1:2181 ls / &
done
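To put the open descriptor count in context, it helps to compare it with the process limit. The sketch below reads mntr output and prints open descriptors as a percentage of the maximum. The function name fd_usage_pct is hypothetical, and it assumes the zk_open_file_descriptor_count and zk_max_file_descriptor_count keys are present, which is generally the case only on Unix-like platforms.

```shell
# fd_usage_pct: read `mntr` output on stdin and print open file descriptors
# as a whole-number percentage of the maximum (fraction truncated).
fd_usage_pct() {
  awk -F '\t' '
    $1 == "zk_open_file_descriptor_count" { nopen = $2 }
    $1 == "zk_max_file_descriptor_count"  { nmax = $2 }
    END { if (nmax > 0) printf "%d\n", (nopen * 100) / nmax }
  '
}
```

Run it while the loop above is still active, for example: echo "mntr" | nc localhost 2181 | fd_usage_pct. It prints nothing if the maximum is missing from the output.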

Simulate a Heavy Disk Load

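Your zookeeper.fsync.exceeded_threshold.count metric will be affected if disk writes take too long. The command below writes roughly 10 GB to /tmp/testfile, so check free disk space first and delete the file afterwards: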

dd if=/dev/zero of=/tmp/testfile bs=1M count=10000

Use Hosted Graphite by MetricFire to Create Custom Dashboards and Alerts

Hosted Graphite by MetricFire is a monitoring platform that enables you to gather, visualize and analyze metrics and data from servers, databases, networks, processes, devices, and applications. Using MetricFire, you can effortlessly identify problems and optimize resources within your infrastructure.

Metrics sent to the public carbon endpoint will be timestamped, and aggregated into the Hosted Graphite backend.

  1. OpenTelemetry metrics will be sent and stored in the Graphite Tag format of: my.series;tag1=value1;tag2=value2 metric_value (timestamp)

  2. The tags provide a filtering option, making the metrics efficient to query.

  3. Metrics are stored in your Hosted Graphite account for two years, and you can use them to create custom Dashboards and Alerts.
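To make the tag format in item 1 concrete, the sketch below assembles one such line from its parts. The function name graphite_tagged_line and the sample values (host=zk01, the metric value, the timestamp) are hypothetical, and the timestamp is written as a plain Unix epoch without parentheses.

```shell
# graphite_tagged_line NAME VALUE TIMESTAMP [TAG=VALUE ...]
# Print one line in the Graphite tagged plaintext format:
#   name;tag1=value1;tag2=value2 value timestamp
graphite_tagged_line() {
  name=$1 value=$2 ts=$3
  shift 3
  series=$name
  # Append each TAG=VALUE argument to the series name, separated by ";"
  for tag in "$@"; do
    series="$series;$tag"
  done
  printf '%s %s %s\n' "$series" "$value" "$ts"
}
```

For instance, graphite_tagged_line opentel.zookeeper.znode.count 105 1700000000 host=zk01 produces opentel.zookeeper.znode.count;host=zk01 105 1700000000. To send a hand-built test line to Hosted Graphite, the name would also need your HG-API-KEY prefix, as in the new_name setting of the Collector config.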

Build Custom Dashboards in MetricFire's Hosted Grafana

In the Hosted Graphite UI, navigate to Dashboards and select + New Dashboard to create a new visualization.

Then go into Edit mode and use the Query UI to select a Graphite metric path (the default data source will be HostedGraphite if you are accessing Grafana via your HG account).

NOTE: To query Tagged Graphite metrics, you must apply the seriesByTag Graphite function.
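As an illustration of what such a query looks like, the sketch below builds a seriesByTag expression from a metric name and optional tag filters. The function name series_by_tag is hypothetical, and the example metric name and host value are assumptions based on the opentel.zookeeper prefix and host tag set in the Collector config.

```shell
# series_by_tag NAME [TAG=VALUE ...]
# Print a Graphite seriesByTag() expression that matches the metric NAME
# and any additional tag filters.
series_by_tag() {
  expr_args="'name=$1'"
  shift
  # Each extra argument becomes its own quoted tag expression
  for tag in "$@"; do
    expr_args="$expr_args, '$tag'"
  done
  printf 'seriesByTag(%s)\n' "$expr_args"
}
```

For instance, series_by_tag opentel.zookeeper.znode.count host=zk01 produces seriesByTag('name=opentel.zookeeper.znode.count', 'host=zk01'), which can be pasted into the Grafana query editor in text mode.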

[Image 2: Grafana query editor using the seriesByTag function]

Grafana has many additional options to apply different visualizations, modify the display, set units of measurement, and some more advanced features like configuring dashboard variables and event annotations. Here is an example of a production-level dashboard that uses metrics collected by the OTel Zookeeper Receiver:

[Image 3: Example production-level dashboard built from OTel Zookeeper Receiver metrics]

See the Hosted Graphite Dashboard docs for more details around building custom Grafana visualizations.

Creating Graphite Alerts

In the Hosted Graphite UI, navigate to Alerts => Graphite Alerts to create a new alert. Name the alert and add a Tagged Zookeeper Metric to the alerting metric field:

[Image 4: New Graphite Alert with a Tagged Zookeeper metric in the alerting metric field]

Then, select the Alert Criteria tab to set a threshold and select a notification channel. The default notification channel is the email address you used to sign up for the Hosted Graphite account, but you can also configure channels for Slack, PagerDuty, Microsoft Teams, OpsGenie, custom webhooks, and more. See the Hosted Graphite Notification Channels docs for more details:

[Image 5: Alert Criteria tab with threshold and notification channel settings]

Conclusion

Monitoring your Zookeeper environment is crucial because it sits at the core of many distributed systems, managing metadata, leader elections, and service coordination. If Zookeeper slows down, runs out of resources, or fails entirely, it can cause cascading failures across your entire infrastructure. By keeping an eye on key metrics like latency, connection counts, znodes, and file descriptors, you can catch performance issues early, optimize resource usage, and ensure high availability. A well-monitored Zookeeper setup means fewer surprises, better stability, and a smoother-running distributed system.

Sign up for the free trial and begin monitoring your infrastructure today. You can also book a demo and talk to the MetricFire team directly about your monitoring needs.
