Logging for Kubernetes: fluentd and ElasticSearch

December 6, 2019

Table of Contents

1. Introduction and Useful Terminology

2. Kubernetes Logging Structure

3. Logs Collector: fluentd

4. Logging Endpoint: ElasticSearch

5. Full Stack Example

               5.1. Python

               5.2. Containers

               5.3. Kubernetes

               5.4. EFK: ES

               5.5. EFK: fluentd

               5.6. Kibana

6. Conclusion


1. Introduction

This article focuses on using fluentd and ElasticSearch (ES) for logging in Kubernetes (k8s). It covers the relevant background on microservices architecture, containers, and logging. We have also shared code with concise explanations of how to implement it, so that you can use it when you start logging in your own apps.

Useful Terminology 

Microservices architecture splits an application into autonomous services, each of which can be owned by a separate subteam, which makes for a more productive development process. Subteams only care about their specific tasks, so the resulting software (or part of it) is ‘containerized’. The most popular containerization tool, Docker, lets you build independent images with all necessary prerequisites and commands. From these images, you can run many different containers, each with an environment built for a particular microservice.

Kubernetes (k8s) is one of the most widely used tools for controlling, monitoring, rolling out, maintaining, and upgrading microservices. K8s is an open-source tool that helps manage multiple containers, applications, versions, etc. If you’re not familiar with k8s, you can read more about it here.

Logging lets you track a node’s lifecycle and a pod’s communication; it’s like a journal of everything inside the app. You can analyze the log data to gain insights into problems, gauge the efficiency of code snippets, and even compare the productivity of different software versions. Both fluentd and ElasticSearch are excellent tools that facilitate the logging process and help you keep your app running smoothly.

2. Kubernetes Logging Structure

There are three different levels for logging in Kubernetes: basic I/O logging, node-level logging, and cluster-level logging. 

First, we have basic I/O logging. Every high-level programming language has functions that print and write user data, so the developer can record whatever information they want from the snippet. k8s supports these basic data streams and retrieves them as logs. Two of these streams are stdout (everything the program prints) and stderr (the text of each error). The results of such logging are accessed via the kubectl logs CLI command, which collects the data so that you can easily explore it.
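As a minimal sketch of these two streams in Python (the report helper is a hypothetical name used only for illustration), ordinary messages go to stdout and error messages go to stderr. Both would be picked up by kubectl logs when the script runs inside a pod.

```python
import sys


def report(message: str, error: bool = False) -> None:
    """Write a message to stdout, or to stderr when it describes an error."""
    stream = sys.stderr if error else sys.stdout
    # flush=True so the line reaches the container runtime without buffering delay
    print(message, file=stream, flush=True)


if __name__ == "__main__":
    report("request handled")                      # goes to stdout
    report("could not parse input", error=True)    # goes to stderr
```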

But what if you need to build one single log structure for a couple of apps, or even a dozen? And, the most terrifying question: what if a pod with an application crashes? Unfortunately, basic logging doesn’t answer these questions. Data is temporary; any crash or rerun will erase all the previous records.

To address these problems, you need to move to node-level logging. A node is an environment for multiple pods, containers, etc. We can try to unify logging in a node via one special pod. Since we can interact with every single object within the node, we can try to build a single logging system that spans the whole node.

Data from each pod is written to log files on the node (JSON files, with Docker's default logging driver). This lets us work with reruns and sudden deaths, but the files tend to reach max capacity quickly. Node-level logging only supports stdout/stderr types, and you still need to have separate logging settings for each node.

Finally, we have cluster-level logging. The concepts are the same as for the other two levels, but k8s doesn’t have any built-in support for cluster-level logging, so you have to create and/or integrate the additional components yourself.
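To make the JSON format concrete, here is a small sketch that parses one log line. It assumes the nodes use Docker's json-file logging driver, which stores each line as an object with log, stream, and time fields. The parse_log_line helper and the sample line are illustrative only and are not taken from a real cluster.

```python
import json


def parse_log_line(line: str) -> dict:
    """Parse one line of a Docker json-file container log."""
    record = json.loads(line)
    return {
        "stream": record["stream"],               # "stdout" or "stderr"
        "time": record["time"],                   # timestamp string written by Docker
        "message": record["log"].rstrip("\n"),    # the text the app printed
    }


# Illustrative line, not captured from a real node.
sample = '{"log":"Simon says hello\\n","stream":"stderr","time":"2019-12-06T10:00:00.000000000Z"}'
print(parse_log_line(sample))
```

A logging agent such as fluentd does this kind of parsing for every container log file on the node before forwarding the records.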

Example of a cluster-level custom solution, Image source: Kubernetes docs

Although k8s doesn’t provide an instant solution, it supports tools for logging at the cluster-level. We will take a look at them in the following sections. For now, it is important to understand the most common approach: You can implement cluster-level logging by including a node-level logging agent on each node. A logging agent is a dedicated tool that exposes or pushes logs to a backend. Usually the logging agent is a container that has access to a directory with log files from all of the application containers on that node.

Because the logging agent must run on every node, it’s common to implement it as either a DaemonSet replica, a manifest pod, or a dedicated native process on the node.

Now that we have covered the basics of logging, let’s explore fluentd and ElasticSearch, the two key products that can help with a logging task.


3. Logs Collector: fluentd

Fluentd is well suited to the role of a unified logging layer. You just have to pick and download the variant of the logger you need for the project. We will use the DaemonSet variant for Kubernetes, which collects the data from all nodes in the cluster.

When you use fluentd, ready-made configuration snippets are available, the Docker image is regularly updated, and you even have predefined ElasticSearch (ES) support. Moreover, fluentd can send data to various endpoints: ES, MongoDB, Hadoop, Amazon Web Services, Google Cloud Platform, etc.

Fluentd collects logs both from user applications and from cluster components such as kube-apiserver and kube-scheduler, two control-plane components of the k8s cluster. The main advantage of this approach is that the data no longer lives only in JSON files on the node: it is shipped to an external store, so it survives pod and node failures. Where exactly it is saved depends on the project needs.

Fluentd is not only useful for k8s: mobile and web app logs, HTTP, TCP, nginx and Apache, and even IoT devices can all be logged with fluentd. 


4. Logging Endpoint: ElasticSearch

As with fluentd, ElasticSearch (ES) can perform many tasks, all of them centered around searching. ES, developed by Elastic, is a fast search and analytics engine with impressive data processing and transfer capabilities. ES is one part of the EK stack for logging and data presentation. The K stands for Kibana, a visualization tool that helps with presenting the data.

There are two common stacks for logging: ELK and EFK. The first uses Elastic’s own product, Logstash. However, this tool, which integrates tightly with ES, doesn’t support k8s directly. This is where the open-source fluentd project comes in handy: it can write to ES using Logstash-style index names, so it takes over the role of Logstash in the stack for the orchestration use case.

So why should we choose ElasticSearch over other output resources? Well, many reasons. 

  • ES stores data as JSON documents, without the rigid schemas and constraints of a standard relational DBMS.
  • It has simple and powerful native add-ons like Kibana and Logstash.
  • It has a RESTful API, which many developers find easier to work with than plain SQL.
  • It has a multi-threaded architecture and strong analytical capabilities.

Many leading companies, including GitHub, Netflix, and Amazon, use ES.

5. Full Stack Example

In this example, we will connect fluentd, ES, and Kibana (as the visualization tool) in a dedicated namespace with a few services and pods. We will also test the namespace on a simple Python Flask project.

We will make a Docker image with Python 3.7 (and all the required modules). We will use the image as the source for our pods later on.

To implement this tutorial successfully, you need to have the following stack on your PC:

  • kubectl: Kubernetes CLI interface
  • Minikube: local k8s cluster that emulates all workloads of the real enterprise clusters
  • Docker: tool for the containerization
  • Python: programming language; you also need to install virtualenv for making independent virtual environments.

Furthermore, for Minikube, your PC should have a pre-installed hypervisor, a tool for creating virtual machines. Or, if you have a Linux-based OS, you can also use the bare-metal option (Linux also has its own hypervisor, KVM).


5.1. Python

Let’s start with Python. First, create a folder for the entire project to store all the required files. Next, create a subfolder for a Python app where you will develop a script, store the virtual environment, and make everything for Docker.

Our application will be a very simple ‘Simon says’ game: 

  • On the first page (or endpoint), the app will return ‘Simon says’ followed by the current time. 
  • If we go to {host}/echo, the app will repeat the message we send it.
  • Note that the message to repeat is passed in the 'm' argument of the GET request. If there aren’t any arguments (specifically 'm'), the app will return the ‘Simon says’ phrase.
  • To get started, create a virtual environment, install the Flask library (via pip install flask), and create a .py file.

<p>CODE: https://gist.github.com/denshirenji/4ef6dc079eac7559f65ce6deef19e09d.js</p>

The code is pretty simple, right? Take a look at the echo method. Here, we write our log message to stderr. It will be useful at the end, when we check that the logs arrive correctly.
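For readers who cannot open the gist, the behavior described above can be sketched with two plain functions. This is only an approximation under our own assumptions and not the original code: the function names, the time format, and the wording of the log line are hypothetical.

```python
import sys
from datetime import datetime
from typing import Optional


def index_reply() -> str:
    """Body for the first endpoint: the phrase followed by the current time."""
    return f"Simon says it is {datetime.now():%H:%M:%S}"


def echo_reply(message: Optional[str]) -> str:
    """Body for /echo: repeat the 'm' argument, or fall back to the phrase."""
    if not message:
        message = "Simon says"
    # Log to stderr so the line appears in `kubectl logs` and, later, in Kibana.
    print(f"echo called with: {message}", file=sys.stderr, flush=True)
    return message
```

In a Flask view, the argument would typically come from request.args.get("m"), which returns None when 'm' is absent, so the fallback branch covers the "no arguments" case.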

Now we can test it. Run this file (via python <file-name>.py) and explore the result. The app and the requests should work.

Since we also want to make an independent image, we need to take care of all the requirements. Because we installed the third-party module flask, the image needs to know which packages to install. For this purpose, run the following line in the terminal: pip freeze > requirements.txt. The line will create a file with all the required modules for our project. Now it’s time for the containers!


5.2. Containers

If you are using Docker, create a Dockerfile, which defines how the image is built, and a .dockerignore file, which tells Docker which files and folders to ignore.

Let’s start with the Dockerfile (code follows below):

  • First, tell Docker that our image will be based on Python 3.7.
  • Copy the entire current folder into the /usr/src/app path of the future image.
  • Next, set the working directory (the same path as in the previous line) and install the modules listed in requirements.txt.
  • Finally, expose port 3001 and specify that the app starts via the python <file-name>.py command.

<p>CODE:https://gist.github.com/denshirenji/63668d120b89b865e9d08878fee4eeeb.js</p>

Now, create a .dockerignore file like the one below:

  • The first line is about the text editor (we used Visual Studio Code) 
  • Then, we say to ignore the virtual environment and Dockerfile, which we don’t need in the image

<p>CODE:https://gist.github.com/denshirenji/d9cf2189ff2f424632bf8109d90e031b.js</p>

Finally, open the terminal and build the Docker image.

<p>CODE:https://gist.github.com/denshirenji/6fd753b848e975e68be9c36a822bf3e5.js</p>

Once the build finishes, you will have a fully independent Docker image. You should be able to see the result just by running it. Type the following command: docker run -p 3001:3001 log-simon:latest. After that, you can check the app in the browser. The output should be exactly the same as what you got when you ran the script directly.

5.3. Kubernetes

Run a Minikube cluster (via minikube start command) and move to the root folder of our project. Create one more folder for the k8s files and move to it.

First, let’s separate all our experiments from the basic and important k8s objects. Type this line: kubectl create namespace logging. A namespace is a way to isolate groups of k8s objects from one another.

Now we will make a few deployments for all the required resources: the Docker image with Python, the fluentd DaemonSet (it will collect logs from all the nodes in the cluster), ES, and Kibana. You can read more about .yaml files, k8s objects, and architecture here.

Create a deployment.yaml file and paste the following lines into it:

<p>CODE:https://gist.github.com/denshirenji/acc7f9df97a7a69c969628dcc55c6897.js</p>

Confused by the above lines? Don’t be. We just made two k8s entities. 

  • The first one is the service that is responsible for access to the app. It is tied to the app 'simon'. The second (after the --- line) is the simon app deployment and its settings.
  • We connected the deployment with the Docker image log-simon:latest and created two replicas.
  • Finally, we started the app with the following line in the terminal: kubectl create -f deployment.yaml -n logging
  • This command creates the deployment described in deployment.yaml and places it in the logging namespace.

You can check the results by getting all pods and services.

<p>CODE:https://gist.github.com/denshirenji/40f8d5859b030c1c626d11d1cfb54557.js</p>

The output will look like this:

NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE
simon-69dc8c64c4-6qx5h   1/1     Running   3          4h36m   172.17.0.10   minikube
simon-69dc8c64c4-xzmwx   1/1     Running   3          4h36m   172.17.0.4    minikube

Both replicas (or pods) of the 'simon' app work, and we can check them by using a service.

<p>CODE:https://gist.github.com/denshirenji/0769651f3bdea567970554bc124dc61e.js</p>

The output will look like this:

NAME            TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE     SELECTOR
simon-service   LoadBalancer   10.96.145.153   <pending>     3001:32023/TCP   4h40m   app=simon

You can follow the same procedure whenever there are new pods or services. 

Note: the 'PORT(S)' column shows where your service is exposed. To reach it, run the minikube ip command to get the exact IP address of the cluster. Then, add the node port (in our case 32023) and visit this address in the browser. The result should be the same as the last two times you checked: the app works!
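The address can be assembled by hand, but a short sketch makes the rule explicit: the number after the colon in PORT(S) is the port opened on the node. The helper names below are hypothetical, the IP address is a placeholder for whatever minikube ip prints, and the sketch assumes a single port mapping in the column.

```python
def node_port(ports_column: str) -> int:
    """Extract the node port from a PORT(S) value such as '3001:32023/TCP'."""
    mapping, _, _protocol = ports_column.partition("/")
    _service_port, _, port = mapping.partition(":")
    return int(port)


def service_url(cluster_ip: str, ports_column: str) -> str:
    """Build the browser URL from the `minikube ip` output and the PORT(S) value."""
    return f"http://{cluster_ip}:{node_port(ports_column)}"


# 192.168.99.100 is a placeholder; use the address printed by `minikube ip`.
print(service_url("192.168.99.100", "3001:32023/TCP"))
```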


5.4. EFK: ES

Now, let’s create our EFK stack. We will start with the backend storage engine, ElasticSearch. The scheme is similar to the previous code snippets: we will create one service and one pod. Create a file named elastic_search.yaml.

<p>CODE:https://gist.github.com/denshirenji/4e27081a45d1c6825041fb3b0d68b02d.js</p>

Here we also created the deployment, but instead of our own Docker image, we used: docker.elastic.co/elasticsearch/elasticsearch:7.4.1 

For ES, we created only one pod, set explicit resource limits, and made a service for it. The command kubectl create -f elastic_search.yaml -n logging created one more pod and service with a local ES cluster installed.

You can check that ES responds using: curl $(minikube ip):<your-ES-port>, where <your-ES-port> is the port of the ES service (in our case 32541). The response should be close to this:

<p>CODE:https://gist.github.com/denshirenji/a884d31dfd999e3ac19e01384a9853f6.js</p>
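The same check can be sketched in Python instead of curl. The root endpoint of ES returns a JSON document that includes cluster_name and version.number. The function names are hypothetical, and check_es only works when the cluster is reachable from your machine.

```python
import json
from urllib.request import urlopen


def summarize(body: str) -> str:
    """Pull the cluster name and version out of the ES root response."""
    info = json.loads(body)
    return f'{info["cluster_name"]} running Elasticsearch {info["version"]["number"]}'


def check_es(host: str, port: int) -> str:
    """Fetch http://host:port/ and summarize it (needs a reachable cluster)."""
    with urlopen(f"http://{host}:{port}/", timeout=5) as response:
        return summarize(response.read().decode("utf-8"))
```

Here host would be the output of minikube ip and port the node port of the ES service.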

ES is ready! 


5.5. EFK: fluentd

Following the EFK abbreviation, fluentd is next. We need to create and apply two files. As fluentd needs to collect logs from the whole cluster, it has to be installed in a different namespace, kube-system. Furthermore, we need to grant it a few permissions through RBAC. Create fluentd-rbac.yaml and fill it with this content:

<p>CODE:https://gist.github.com/denshirenji/2da3b6a9749278914d8c4bd9e7fe78d1.js</p>

In the above code, we created a ServiceAccount and granted it a few permissions through a ClusterRole binding.

Next, we need to deploy fluentd from the original Docker image. Create fluentd.yaml file and paste the next lines:


<p>CODE:https://gist.github.com/denshirenji/7132e56df0e54f7fae9a387876ff5e94.js</p>

In the above lines, we created the DaemonSet, set up the hostPath volumes that give fluentd access to the node’s log files, and configured how fluentd connects to ES.

Now we can apply the two files. Run the following two commands in order: kubectl create -f fluentd-rbac.yaml and kubectl create -f fluentd.yaml. We can check the results in the pods of the kube-system namespace.
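To show the overall shape of such a DaemonSet without reproducing the gist, here is a hedged sketch that builds a manifest as a Python dictionary and prints it as JSON, a format kubectl also accepts. The labels, the volume name, and the image tag placeholder are our assumptions. The host, port, and scheme values mirror the connection line that fluentd prints later in this section.

```python
import json

# Assumed values: names and labels are illustrative; the image tag is a
# placeholder and must be replaced with a real Elasticsearch-enabled tag.
fluentd_daemonset = {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "metadata": {"name": "fluentd", "namespace": "kube-system"},
    "spec": {
        "selector": {"matchLabels": {"app": "fluentd"}},
        "template": {
            "metadata": {"labels": {"app": "fluentd"}},
            "spec": {
                "serviceAccountName": "fluentd",
                "containers": [{
                    "name": "fluentd",
                    "image": "fluent/fluentd-kubernetes-daemonset:<elasticsearch-tag>",
                    "env": [
                        {"name": "FLUENT_ELASTICSEARCH_HOST", "value": "elasticsearch.logging"},
                        {"name": "FLUENT_ELASTICSEARCH_PORT", "value": "9200"},
                        {"name": "FLUENT_ELASTICSEARCH_SCHEME", "value": "http"},
                    ],
                    # Mount the node's log directory into the agent container.
                    "volumeMounts": [{"name": "varlog", "mountPath": "/var/log"}],
                }],
                # hostPath exposes a directory of the node itself to the pod.
                "volumes": [{"name": "varlog", "hostPath": {"path": "/var/log"}}],
            },
        },
    },
}

print(json.dumps(fluentd_daemonset, indent=2))
```

On Docker-based nodes, the container log directory usually has to be mounted as well, because the files under /var/log are often symlinks into it.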

<p>CODE:https://gist.github.com/denshirenji/3f3985cfab6d92cf66b4d5d4593a17fe.js</p>

The output will look like this:

NAME                               READY   STATUS    RESTARTS   AGE
coredns-5644d7b6d9-74sgc           1/1     Running   8          5d8h
coredns-5644d7b6d9-fsm8x           1/1     Running   8          5d8h
etcd-minikube                      1/1     Running   2          4h31m
fluentd-npcwf                      1/1     Running   5          14h
kube-addon-manager-minikube        1/1     Running   8          5d8h
kube-apiserver-minikube            1/1     Running   2          4h31m
kube-controller-manager-minikube   1/1     Running   24         2d3h
kube-proxy-nm269                   1/1     Running   8          5d8h
kube-scheduler-minikube            1/1     Running   24         5d8h
storage-provisioner                1/1     Running   9          5d8h

Fluentd is running! To check its connection to ES, use the following command:

kubectl logs fluentd-npcwf -n kube-system

If the output starts with the line Connection opened to Elasticsearch cluster => {:host=>"elasticsearch.logging", :port=>9200, :scheme=>"http"} then all is fine!


5.6. Kibana

Finally, we will use Kibana to make a visual representation of the logs. Create a kibana.yaml file with the following lines:

<p>CODE:https://gist.github.com/denshirenji/596774e437548049ba10fbd2ce2ff912.js</p>

In this code, we built a deployment/service solution like we did previously, with one key modification: the env section of the container spec. In this section, we point at the ES server with the previously defined port so that Kibana can connect to it.

Next, run kubectl create -f kibana.yaml -n logging, and we can finally check the whole stack!

Find the port of the Kibana service (via the kubectl get services -n logging command) and open it in the browser.

Basic Kibana window 

Here is our visualization heaven. Find Management in the left dropdown menu. 

Management window

Click Index Patterns under the Kibana block.

Page of index patterns

You will see some possible indexes, and you can create a new one. Type logstash in the textbox and click Next step.

Step 2 in creating

Now choose @timestamp in the dropdown box and click Create index pattern.

Results of creating index pattern
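Why does typing logstash match anything? Assuming the fluentd output uses Logstash-style naming (a common setting for its ES output, but an assumption here), it writes one index per day named logstash-YYYY.MM.DD, and the index pattern matches them with a wildcard. A minimal sketch of that naming rule, with a hypothetical helper name:

```python
from datetime import date
from fnmatch import fnmatch


def daily_index(day: date, prefix: str = "logstash") -> str:
    """Name of the daily index in Logstash-style naming: prefix-YYYY.MM.DD."""
    return f"{prefix}-{day:%Y.%m.%d}"


index = daily_index(date(2019, 12, 6))
print(index)                          # logstash-2019.12.06
print(fnmatch(index, "logstash*"))    # True
```

Because a new index appears every day, a wildcard pattern is what keeps Kibana showing fresh logs without any further setup.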

 

Our logstash index pattern is ready, so now logs are collected by fluentd, sent to ES, and read by Kibana. Let’s check to see if everything is working: click Discover on the left bar.

All logs window

And here it is! Smart log storage with strong visualization and tracking capabilities. Now, go to the service of the simon app and try to visit the /echo endpoint with no args. After that, type “simon” in the search box of Kibana and click Update.

Search by simon keyword

This is our log of the simon application. If you expand this object, you’ll find the text that we wrote inside the code at the very beginning of this article.

Log from our python application

The logging structure with the EFK stack is now ready, and logs from across the cluster are collected and processed, so we can track and maintain our application much more easily.

6. Conclusion

Logging is a very powerful and indispensable instrument in programming, especially if you need to keep track of many factors, including application health. That’s why it is hard to imagine a k8s cluster without a logging architecture.

Of course, k8s provides basic features for this purpose, but when your nodes crash or rerun, you risk losing invaluable information. Stacks that handle and store logs independently become as much a part of the cluster as its base components, and this is where fluentd and ES can help.

These tools not only allow users to work with pod, node, and namespace logs, but also with the logs of the whole cluster. Additionally, ES and fluentd pair with strong visualization tools like Kibana, which make the data understandable and offer dynamic filters and updates. You can set your stack up once and not worry about losing your data.

Check out more about other options at MetricFire's hosted Graphite pages.


