2. Deployment Architecture
2.1 Deployment and Headless Service for Master Nodes
2.2 Data Nodes Deployment
2.3 Client Nodes Deployment
2.4 Scaling Considerations
3. Deploying Kibana and ES-HQ
3.1 Kibana Deployment
3.2 ES-HQ Deployment
This is the first post of a two-part series in which we will set up production-grade Kubernetes logging for applications deployed in the cluster and for the cluster itself. We will use Elasticsearch as the logging backend. The Elasticsearch setup will be extremely scalable and fault tolerant.
Important things to keep in mind:
Let’s jump right into deploying these services to our GKE cluster.
Deploy the following manifest to create master nodes and the headless service:
If you follow the logs of any master-node pod, you will see the master election take place: this is when the master-node pods choose which of them leads the group. The same logs also show when new data and client nodes join the cluster.
It can be seen above that the es-master pod named es-master-594b58b86c-bj7g7 was elected as the leader and the other 2 pods were added to the cluster.
The headless service named elasticsearch-discovery is set by default as an env variable in the docker image and is used for discovery among the nodes. This can of course be overridden.
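As a rough illustration of what such a discovery service looks like, the shell function below prints a headless Service manifest. This is a minimal sketch, not the manifest used in this post: the `role: master` selector label, the transport port 9300, and the `elasticsearch` namespace default are assumptions, and `discovery_service_manifest` is a hypothetical helper name.

```shell
#!/bin/sh
# Print a headless Service manifest for master-node discovery.
# Assumptions (not taken from the post's manifest): the selector label,
# the transport port 9300, and the default namespace.
discovery_service_manifest() {
  namespace="${1:-elasticsearch}"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-discovery
  namespace: ${namespace}
spec:
  clusterIP: None        # "None" is what makes the Service headless
  selector:
    role: master         # hypothetical label on the master pods
  ports:
    - name: transport
      port: 9300         # Elasticsearch's default transport port
      protocol: TCP
EOF
}

# Usage (requires cluster access):
#   discovery_service_manifest elasticsearch | kubectl apply -f -
```

Because the Service has no cluster IP, a DNS lookup of its name returns the IPs of the matching pods directly, which is what lets the nodes find each other.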
We will use the following manifest to deploy the StatefulSet and headless Service for the data nodes:
The headless service in the case of data nodes provides stable network identities to the nodes and also helps with the data transfer among them.
It is important to format the persistent volume before attaching it to the pod. This can be done by specifying the volume type when creating the storage class. We can also set a flag to allow volume expansion on the fly. More can be read about that here.
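A storage class along these lines might be generated as shown below. This is a sketch under stated assumptions rather than the storage class used here: it assumes the legacy in-tree `kubernetes.io/gce-pd` provisioner, reads "volume type" as the `fstype` parameter, and uses a hypothetical class name `es-data-ssd`. Clusters that use the GCE PD CSI driver take different provisioner and parameter names.

```shell
#!/bin/sh
# Print a StorageClass manifest that sets the filesystem type and allows
# volume expansion. Assumptions: in-tree GCE PD provisioner, SSD disks,
# and a hypothetical class name.
storage_class_manifest() {
  fstype="${1:-ext4}"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-data-ssd              # hypothetical name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd                   # disk type
  fstype: ${fstype}              # filesystem the volume is formatted with
allowVolumeExpansion: true       # lets a bound PVC be resized later
EOF
}

# Usage (requires cluster access):
#   storage_class_manifest xfs | kubectl apply -f -
```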
We will use the following manifest to create the Deployment and External Service for the Client Nodes:
The purpose of the service deployed here is to access the ES Cluster from outside the Kubernetes cluster but still internal to our subnet. The annotation “cloud.google.com/load-balancer-type: Internal” ensures this.
However, if the application reading from or writing to our ES cluster is deployed within the same Kubernetes cluster, the Elasticsearch service can be reached at http://elasticsearch.elasticsearch:9200.
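To make the role of the annotation concrete, here is a minimal sketch of a Service that would be exposed through an internal load balancer. The Service name and namespace are inferred from the in-cluster URL above; the `role: client` selector label and the helper name `internal_lb_service_manifest` are assumptions for illustration only.

```shell
#!/bin/sh
# Print a LoadBalancer Service manifest carrying the GKE internal
# load-balancer annotation. The selector label is hypothetical.
internal_lb_service_manifest() {
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: elasticsearch
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    role: client          # hypothetical label on the client pods
  ports:
    - name: http
      port: 9200
      targetPort: 9200
EOF
}

# Usage (requires cluster access):
#   internal_lb_service_manifest | kubectl apply -f -
```

Without the annotation, a Service of type LoadBalancer on GKE is given an external IP instead of one from the subnet.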
Once all components are deployed we should verify the following:
1. Elasticsearch deployment from inside the Kubernetes cluster using an Ubuntu container.
2. Elasticsearch deployment from outside the cluster using the GCP Internal Load balancer IP (in this case 10.9.120.8). When we check the health using curl http://10.9.120.8:9200/_cluster/health?pretty the output should be the same as above.
3. Anti-Affinity Rules for our ES-Pods:
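The checks in this list can be scripted roughly as follows. This is a sketch, not the commands used to produce the original output: it assumes the anti-affinity rules are meant to keep pods of the same role on different nodes, and the `es-data` prefix in the usage comment is a hypothetical pod name prefix. Both helpers read from standard input, so they can be tried without a cluster.

```shell
#!/bin/sh
# Extract the "status" field (green, yellow or red) from the JSON that
# the _cluster/health endpoint returns, read on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Read "NODE POD" lines on stdin and print every node that hosts more
# than one pod whose name starts with the given prefix. No output means
# pods with that prefix are spread across distinct nodes.
crowded_nodes() {
  awk -v prefix="$1" 'index($2, prefix) == 1 { count[$1]++ }
       END { for (node in count) if (count[node] > 1) print node }'
}

# Usage (requires network and cluster access):
#   curl -s http://10.9.120.8:9200/_cluster/health | health_status
#   kubectl get pods -n elasticsearch --no-headers \
#     -o custom-columns=NODE:.spec.nodeName,NAME:.metadata.name \
#     | crowded_nodes es-data
```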
We can deploy autoscalers for our client nodes based on CPU thresholds. A sample HPA for the client nodes might look something like this:
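A CPU-based autoscaler of this kind can be sketched as below. The values are illustrative assumptions, not the ones used in this setup: the Deployment name `es-client`, the replica bounds, and the 80% CPU target are all hypothetical, and the sketch uses the `autoscaling/v2` API.

```shell
#!/bin/sh
# Print a HorizontalPodAutoscaler manifest for the client Deployment.
# Arguments: min replicas, max replicas, target CPU utilization (percent).
# All defaults and the Deployment name are hypothetical.
client_hpa_manifest() {
  min="${1:-2}"
  max="${2:-5}"
  cpu="${3:-80}"
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: es-client
  namespace: elasticsearch
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: es-client
  minReplicas: ${min}
  maxReplicas: ${max}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: ${cpu}
EOF
}

# Usage (requires cluster access):
#   client_hpa_manifest 2 6 75 | kubectl apply -f -
```

Utilization targets are measured against the CPU requests of the pods, so the client Deployment needs CPU requests set for this to work.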
Whenever the autoscaler kicks in, we can watch the new client-node pods being added to the cluster by observing the logs of any of the master-node pods.
For the data-node pods, all we have to do is increase the number of replicas using the Kubernetes Dashboard or the GKE console. The newly created data node will be added to the cluster automatically and will start replicating data from the other nodes.
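The same change can also be made from the command line with `kubectl scale`. The wrapper below is a small sketch; the StatefulSet name `es-data` and the `elasticsearch` namespace are assumptions, so substitute the names from your own manifests.

```shell
#!/bin/sh
# Scale the data-node StatefulSet to the requested replica count.
# The StatefulSet name and namespace are hypothetical.
scale_data_nodes() {
  replicas="$1"
  # Reject empty or non-numeric input before touching the cluster.
  case "$replicas" in
    ''|*[!0-9]*)
      echo "usage: scale_data_nodes <replica-count>" >&2
      return 1
      ;;
  esac
  kubectl scale statefulset es-data --namespace elasticsearch --replicas="$replicas"
}

# Usage (requires cluster access):
#   scale_data_nodes 4
```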
Master-node pods do not require autoscaling, as they only store cluster-state information. If you want to add more master nodes, make sure the cluster does not end up with an even number of them. Also, make sure the environment variable NUMBER_OF_MASTERS is updated accordingly.
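The arithmetic behind the odd-number rule is short enough to sketch. This assumes that NUMBER_OF_MASTERS holds the quorum, meaning the minimum number of master-eligible nodes that must agree, computed as a strict majority; `master_quorum` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the majority quorum for a given number of master-eligible nodes:
# floor(n / 2) + 1.
master_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Usage:
#   master_quorum 3
#   master_quorum 4
#   master_quorum 5
```

Three masters need a quorum of 2 and survive one failure. Four masters need a quorum of 3 and still survive only one failure, so the fourth node adds cost without adding fault tolerance. Five masters need a quorum of 3 and survive two failures.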
The logs of the leading master pod clearly show when each node is added to the cluster, which is extremely useful when debugging issues.
Kibana is a simple tool to visualize ES-data and ES-HQ helps in the administration and monitoring of Elasticsearch clusters. For our Kibana and ES-HQ deployment we keep the following things in mind:
We will use the following manifest to create the Kibana Deployment and Service:
We will use the following manifest to create the ES-HQ Deployment and Service:
We can access both these services using the newly created Internal LoadBalancers.
Go to http://<External-Ip-Kibana-Service>/app/kibana#/home?_g=()
ElasticHQ Dashboard for Cluster Monitoring and Management
This concludes deploying the ES backend for logging. The Elasticsearch cluster we deployed can be used by other applications as well. The client nodes should scale automatically under high load, and data nodes can be added by incrementing the replica count in the StatefulSet. We will also have to tweak a few environment variables, but that is fairly straightforward. In the next post we will learn about deploying a Filebeat DaemonSet in order to send logs to the Elasticsearch backend.
MetricFire specializes in monitoring time-series, while Elasticsearch dominates the logs monitoring space. Try out the MetricFire product with our free trial to monitor your time-series data, or book a demo and talk to us directly about the monitoring solution that works for you. Stay tuned for part two :)
This article was written by our guest blogger Vaibhav Thakur. If you liked this article, check out his LinkedIn for more.