
From Logs to Metrics Part 1: Building an Open-Source Logs-to-Graphite Pipeline


Introduction

Monitoring doesn't always need to be complex. In this guide, we'll show you how to turn raw logs into usable metrics using a lightweight open-source setup: no ELK stack and no heavy lifting. We'll use Loki, Python, and Telegraf to convert logs into Graphite metrics you can easily monitor or alert on. This is perfect for system admins, DevOps beginners, or anyone curious about building smarter monitoring pipelines from scratch. If you don't already have a Hosted Graphite account with MetricFire, sign up for a free 14-day trial here.

  • Loki: A log database from Grafana Labs that's super lightweight compared to Elasticsearch.

  • Python: We'll write a small script to parse logs into metrics.

  • Telegraf: A metrics agent that will run our script and forward metrics to a Hosted Graphite account.

Follow along with the Linux examples below to create Graphite metrics from your system logs. We're aware this setup has several moving parts, so stay tuned for Part 2 of this series, where we detail how to accomplish this with a more minimal setup using grok.

Install and Configure Loki

Loki is a log-structured database built by Grafana Labs. Think of it like Prometheus for logs, as it indexes labels instead of raw log content, making it fast and efficient. In this setup, we’ll run Loki locally, store logs on disk, and query them over HTTP using a simple Python script. No Promtail, no Elasticsearch, and no cloud buckets needed!

sudo wget https://github.com/grafana/loki/releases/download/v2.9.4/loki-linux-amd64.zip -O /usr/local/bin/loki.zip

cd /usr/local/bin
sudo unzip loki.zip
sudo mv loki-linux-amd64 loki
sudo chmod +x loki
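Optionally, confirm the binary installed correctly before moving on:

loki -version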

Create a Loki config file at /etc/loki-config.yaml with basic settings:

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1

schema_config:
  configs:
    - from: 2024-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/index
    cache_location: /tmp/loki/cache
    cache_ttl: 24h
  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  max_entries_limit_per_query: 5000

table_manager:
  retention_deletes_enabled: true
  retention_period: 24h

Run Loki manually so it listens on localhost:3100:

sudo /usr/local/bin/loki -config.file=/etc/loki-config.yaml
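Because this setup skips Promtail, nothing ships logs into Loki automatically. You can confirm Loki is healthy, and push and query a test line over its HTTP API, with a few curl commands. This is a minimal sketch assuming the default localhost:3100 listener; the job="test" label is just an example:

# Check that Loki is ready to accept traffic
curl http://localhost:3100/ready

# Push a test log line (Loki expects timestamps in nanoseconds)
curl -s -X POST http://localhost:3100/loki/api/v1/push \
  -H "Content-Type: application/json" \
  -d "{\"streams\": [{\"stream\": {\"job\": \"test\"}, \"values\": [[\"$(date +%s%N)\", \"hello from curl\"]]}]}"

# Query the line back
curl -s -G http://localhost:3100/loki/api/v1/query_range --data-urlencode 'query={job="test"}'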

Create a Simple Python Parser

This Python script reads the last 500 lines of /var/log/syslog and counts how many times common system events occur, like successful or failed SSH logins, sudo command usage, and cron job executions. It outputs these counts as Graphite-formatted metrics, which Telegraf can forward to your Hosted Graphite account. This gives you a lightweight way to track key system activity (like login attempts or job schedules) without needing a full logging stack. Just create a new Python file at: /etc/telegraf/parse_loki_metrics.py 

#!/usr/bin/env python3

import re
import sys
import time
from collections import deque

LOG_PATH = "/var/log/syslog"

# Patterns and counters for most common log events
patterns = {
    "logs.sshd.success": r"sshd.*Accepted password",
    "logs.sshd.failure": r"sshd.*Failed password",
    "logs.sudo.command": r"sudo: .*COMMAND=",
    "logs.cron.job": r"CRON\[.*\]:"
}

metrics = {key: 0 for key in patterns}

try:
    with open(LOG_PATH, "r") as f:
        recent_lines = deque(f, maxlen=500)
        for line in recent_lines:
            for metric, pattern in patterns.items():
                if re.search(pattern, line):
                    metrics[metric] += 1

    ts = int(time.time())
    for key, val in metrics.items():
        print(f"{key} {val} {ts}")

except Exception as e:
    # Graphite's line format can't carry inline comments, so emit a clean
    # error metric on stdout and send the exception detail to stderr
    print(f"logs.script_error 1 {int(time.time())}")
    print(f"parse error: {e}", file=sys.stderr)

Make the script executable:

sudo chmod +x /etc/telegraf/parse_loki_metrics.py
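It's worth running the script by hand before wiring it into Telegraf. You should see one Graphite-formatted line per pattern; the counts below are just an example and will vary with your syslog contents:

sudo /etc/telegraf/parse_loki_metrics.py

logs.sshd.success 2 1714066023
logs.sshd.failure 0 1714066023
logs.sudo.command 5 1714066023
logs.cron.job 12 1714066023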

Configure Telegraf to Run the Script

If you don't already have an instance of Telegraf running on your server, install our HG-CLI tool to quickly and easily get Telegraf up and running:

curl -s "https://www.hostedgraphite.com/scripts/hg-cli/installer/" | sudo sh

Now, just open your Telegraf configuration file at: /etc/telegraf/telegraf.conf and add the following section:

[[inputs.exec]]
  commands = ["/etc/telegraf/parse_loki_metrics.py"]
  timeout = "5s"
  data_format = "graphite"
  name_prefix = "syslog-metrics."

If your syslog can only be read with sudo permissions, you may need to update the Telegraf 'commands' line to something like this:

  commands = ["/bin/bash -c 'sudo /usr/bin/python3 /etc/telegraf/parse_loki_metrics.py'"]
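For that to run non-interactively, the user Telegraf runs as needs passwordless sudo for that exact command. Here's a minimal sketch of a sudoers entry, assuming Telegraf runs as the telegraf user (add it with sudo visudo -f /etc/sudoers.d/telegraf):

telegraf ALL=(ALL) NOPASSWD: /usr/bin/python3 /etc/telegraf/parse_loki_metrics.py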

Restart the Telegraf service (or run it manually in the foreground, as below) and the Exec Input Plugin will execute your Python script, read the output, and forward the data to your Hosted Graphite account. It's that easy!

telegraf --config /etc/telegraf/telegraf.conf
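You can also dry-run just the exec input to confirm your metrics parse correctly before anything is sent:

telegraf --config /etc/telegraf/telegraf.conf --input-filter exec --test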

Visualize Your Metrics

Once Loki and Telegraf are both running on your server, metrics will be forwarded to your Hosted Graphite account and can be located in the Metrics Search UI with the telegraf.syslog-metrics.* prefix (e.g., telegraf.syslog-metrics.logs.sshd.success).

See our Dashboard docs to learn how to use these metrics to create visualizations in our Hosted Grafana. Here's an example of syslog and Loki performance logs as metrics:

(Dashboard screenshot: syslog and Loki logs visualized as metrics in Hosted Grafana)

Conclusion

By completing this setup, you've built a powerful pipeline that transforms raw system logs into structured, real-time metrics, all using lightweight, open-source tools. Instead of sifting through endless log lines manually, you can now monitor key activities like SSH logins, cron jobs, and system events directly from your Graphite dashboards. This gives you instant visibility into system health without the complexity (or cost) of a full ELK stack.

In a DevOps role, having log observability isn't just nice to have; it's crucial. Monitoring logs as metrics helps you spot failures faster, catch suspicious activity early, and automate your incident response. It empowers you to move from reactive troubleshooting to proactive system management. And best of all, this approach is lightweight enough to scale from a single server to an entire fleet without breaking your infrastructure budget.


Want to learn more? Reach out to us today and start a conversation. Happy monitoring!


