AWS Elastic Container Service for Kubernetes (EKS) is a managed Kubernetes service ideal for large clusters of nodes running heavy and variable workloads. Because of the way account permissions work in AWS, EKS's architecture is unusual and creates some small differences in your monitoring strategy. Overall though, it's still the same Kubernetes you know and love.
Kubernetes consists of a control plane and a data plane. The control plane includes the services Kubernetes requires to manage the nodes, primarily kube-apiserver (used by kubectl, among others), kube-controller-manager and kube-scheduler. These usually run on one or more master nodes within the cluster. The master node can be replicated for fault tolerance, but the control plane components can also be deployed as pods within Kubernetes itself, which is what you’ll see when developing on minikube, for example.
The data plane is made up of server or VM nodes, all running the kubelet service and kube-proxy, which allow them to respond to changes in the Kubernetes configuration. Kubelet also manages the Pod and Node APIs which drive container execution and, like all APIs in Kubernetes, are visible and available to serve as the foundation for tools and extensions.
The cluster state is stored in etcd, a distributed key-value store that runs alongside the control plane and is usually replicated across several master nodes so the state survives the loss of one of them. When a request is made to kube-apiserver that changes the state of the cluster, kube-apiserver updates the object in etcd. The kubelet on each node watches kube-apiserver for changes that affect it and then implements those changes on its own node.
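The idea of a stored desired state that each node converges towards can be illustrated with a toy model. This is not Kubernetes code: the `reconcile` function and the pod names are invented for the illustration, and a real kubelet handles far more than starting and stopping pods.

```python
from __future__ import annotations


def reconcile(desired: set[str], running: set[str]) -> dict[str, list[str]]:
    """Return the actions needed to move `running` towards `desired`."""
    return {
        "start": sorted(desired - running),  # wanted but not yet running
        "stop": sorted(running - desired),   # running but no longer wanted
    }


# Desired state as recorded by the control plane (hypothetical pod names).
desired_pods = {"web-1", "web-2", "worker-1"}
# What a single node currently has running.
running_pods = {"web-1", "old-job"}

actions = reconcile(desired_pods, running_pods)
print(actions)  # {'start': ['web-2', 'worker-1'], 'stop': ['old-job']}
```

Running the same function again once the actions have been applied returns two empty lists, which is the property that lets a node catch up after an interruption.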
If the master node goes down, the group of nodes effectively ceases to be a cluster, but applications will usually continue to function even without a master. Restarting nodes may cause issues with DNS routing until the master is back up, but a short outage usually has no effect.
EKS is a managed Kubernetes service, which splits the responsibility for the two planes between your account and AWS. It consists of all the same moving parts as Kubernetes, so you have full control of the cluster via kubectl and the api-server, but the master nodes and control plane are managed and maintained by AWS on their own account. In addition, a cloud-controller-manager daemon is run to handle interactions with AWS resources. Configuration details are provided for your local kubectl to connect to the kube-apiserver, and the AWS account is given permission to access the instances on your account which act as nodes in your cluster.
In order to set this up there are a few requirements:
When you set up EKS you’ll be asked to provide an IAM role, which is used to grant access to the nodes from the AWS account; a security group, which grants access to the Kubernetes-specific ports; and a VPC (virtual private cloud), which provides the static (internal) addresses that the pods and nodes use to find each other.
The size of the VPC address space limits the number of pods you can run, so make sure to choose a large enough range. For example, a /24 subnet contains 256 addresses, of which AWS reserves five, leaving 251, and each node in the cluster needs an address of its own, which reduces the total available to pods. There’s also a limit to the number of IPs that can be assigned to a single node, based on the number of network interfaces the instance type supports and how many IPs each interface can hold.
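A rough way to check these limits before creating a cluster is sketched below. It assumes that AWS reserves five addresses in every subnet and uses the commonly cited per-node formula for the default VPC networking plugin, (interfaces × (IPs per interface − 1)) + 2. The function names are made up for this example, and the interface figures passed in (3 interfaces with 10 addresses each) are example inputs, so check the published limits for your own instance type.

```python
import ipaddress

# Assumption: AWS keeps five addresses in every subnet for its own use.
AWS_RESERVED_PER_SUBNET = 5


def usable_subnet_addresses(cidr: str) -> int:
    """Addresses in the subnet that can be handed to nodes and pods."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def addresses_left_for_pods(cidr: str, node_count: int) -> int:
    """Simplified estimate: every node takes one address for itself.

    This ignores the extra primary address that each additional
    network interface consumes, so treat it as an upper bound.
    """
    return max(usable_subnet_addresses(cidr) - node_count, 0)


def max_pods_per_node(interfaces: int, ips_per_interface: int) -> int:
    """Commonly cited pod limit per node for the default VPC networking."""
    return interfaces * (ips_per_interface - 1) + 2


print(usable_subnet_addresses("10.0.0.0/24"))      # 251
print(addresses_left_for_pods("10.0.0.0/24", 10))  # 241
print(max_pods_per_node(3, 10))                    # 29
```

Comparing the two results shows which limit bites first: ten nodes at 29 pods each would want 290 pod addresses, more than a single /24 subnet can supply.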
To simplify the process of spinning up and managing your EKS clusters, AWS provides a small command line utility called eksctl, which can use credentials stored by the AWS CLI to create cluster nodes and roles on your behalf. It will automatically export the credentials you need to control your cluster to kubectl.
If you launch a Service, Kubernetes’ cloud-controller-manager will choose an appropriate AWS resource to launch depending on the service type. In EKS, NodePort and ClusterIP services function as in a normal Kubernetes setup, but a LoadBalancer service or an Ingress triggers the creation of AWS load balancer resources.
This page provides more information on how traffic is routed by AWS’s different load balancers.
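As an illustration of a Service that causes an AWS load balancer to be created, the sketch below builds a LoadBalancer Service manifest and prints it as JSON, which kubectl accepts as well as YAML. The service name `web`, the `app` label and the port numbers are hypothetical. The annotation is the one commonly used to request a Network Load Balancer rather than the default type; verify it against the load balancer controller your cluster runs.

```python
import json


def load_balancer_service(name, app_label, port, target_port, use_nlb=False):
    """Build a Service manifest of type LoadBalancer as a plain dict."""
    metadata = {"name": name}
    if use_nlb:
        # Ask for a Network Load Balancer instead of the default type.
        metadata["annotations"] = {
            "service.beta.kubernetes.io/aws-load-balancer-type": "nlb"
        }
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": metadata,
        "spec": {
            "type": "LoadBalancer",
            # Traffic is sent to pods carrying this label.
            "selector": {"app": app_label},
            "ports": [
                {"port": port, "targetPort": target_port, "protocol": "TCP"}
            ],
        },
    }


manifest = load_balancer_service("web", "web", 80, 8080, use_nlb=True)
print(json.dumps(manifest, indent=2))
```

Saved as `service.py` (an arbitrary file name), the output could be piped straight into the cluster with `python service.py | kubectl apply -f -`.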
Kubernetes supports a large number of storage types, the simplest of which is an emptyDir, a volume that persists for the lifetime of the Pod it’s attached to, regardless of container restarts within the Pod. By default, emptyDir volumes are created in the disk space of the node. In EKS, since the node is an EC2 instance, the storage behind an emptyDir is usually an Elastic Block Store (EBS) volume. EC2 also offers ephemeral local storage known as Instance Store, but that is no longer the default because of its transient nature.
Kubernetes also supports using an EBS volume as storage directly, making it possible to create a volume that persists between pods. Optionally, you can even pre-load the EBS volume with data the pods need access to. An EBS volume can only be attached to one EC2 instance at a time, and it must be attached to the instance running the pod before the pod can access it.
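The two storage options can be seen side by side in a single Pod manifest. The sketch below generates one with an emptyDir volume for scratch space and an existing EBS volume mounted directly. The pod name, image, mount paths and volume ID are placeholders. It uses the `awsElasticBlockStore` volume type, which newer Kubernetes versions have deprecated in favour of the EBS CSI driver and PersistentVolumeClaims, so treat it as an illustration of the concept rather than a recommended setup.

```python
import json


def pod_with_volumes(name, image, ebs_volume_id):
    """Build a Pod manifest with one emptyDir and one EBS-backed volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "volumeMounts": [
                        {"name": "scratch", "mountPath": "/scratch"},
                        {"name": "data", "mountPath": "/data"},
                    ],
                }
            ],
            "volumes": [
                # Lives on the node's disk and disappears with the Pod.
                {"name": "scratch", "emptyDir": {}},
                # An existing EBS volume; it outlives the Pod.
                {
                    "name": "data",
                    "awsElasticBlockStore": {
                        "volumeID": ebs_volume_id,
                        "fsType": "ext4",
                    },
                },
            ],
        },
    }


# Placeholder values: replace with a real image and volume ID.
pod = pod_with_volumes("worker", "example/worker:latest", "vol-0123456789abcdef0")
print(json.dumps(pod, indent=2))
```

Because the EBS volume is attached to a single instance, a Pod defined this way can only be scheduled on one node at a time.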
In a normal Kubernetes setup, one of the most important things to monitor is the master node(s), since without them the Kubernetes cluster ceases to be a cluster. In EKS the master nodes and most of the control plane are managed by AWS, which means they’re out of your reach. It may be possible to get metrics about the cluster as a whole, but not about the actual master nodes.
Since one of Kubernetes’ jobs is to deal with outages of nodes and containers, it has built-in monitoring which can be accessed as part of your own monitoring setup. Kubelet is responsible for collecting data on the states of nodes, pods, and containers (cAdvisor is built in), so that it knows when a container is having issues and needs to be restarted, and the metrics are available via the Metrics Server. metrics-server is an extremely lightweight service with short-term storage, so it’s used for capturing the current state of Kubernetes resources on request. Another tool is needed to capture the metrics as a time series and display them as graphs.
AWS already provides monitoring for EC2 instances, which is stored in CloudWatch. They also provide a service called Container Insights, which is an agent that can be run on your EKS cluster as a DaemonSet, to get metrics about nodes, pods and containers and send them back to CloudWatch as custom metrics.
Custom metrics have a relatively steep cost in CloudWatch though: $0.30 per metric per month, before alerting is added in. A more common approach is to use Prometheus to monitor the cluster. It’s easy to get up and running since Kubernetes supports Prometheus-format metrics natively, and optionally you can use Helm, a Kubernetes package manager, to install and configure not just Prometheus but all the supporting services, including Alertmanager, Pushgateway and Node Exporter.
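To give a feel for the Prometheus text format that Kubernetes components expose, here is a minimal parser sketch. It handles the common case of `name{label="value"} number` lines and skips comments; it is not a full implementation of the format (escaped characters in label values are left as-is, and timestamps are ignored). The sample values in `SAMPLE` are made up.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<timestamp>-?\d+))?$'
)
# One label pair inside the braces, e.g. pod="web-1".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this sketch does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


SAMPLE = """\
# HELP container_memory_working_set_bytes Current working set in bytes.
# TYPE container_memory_working_set_bytes gauge
container_memory_working_set_bytes{namespace="default",pod="web-1"} 52428800
container_memory_working_set_bytes{namespace="default",pod="web-2"} 73400320
node_load1 0.42
"""

for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

In practice Prometheus does this scraping and parsing for you; the point is only that the format is plain text and easy to inspect by hand when debugging a missing metric.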
Since AWS CloudWatch provides an API to access metrics, there’s a middle ground in the form of the Prometheus CloudWatch exporter, which retrieves metrics from CloudWatch and makes them available to Prometheus. You can install it using Helm or directly from GitHub.
With Prometheus, there is a choice to make about where to keep the metric data being retrieved. An emptyDir-type storage volume (described above), which lives as long as the Prometheus pod, is the simplest option, but it doesn’t provide resilience. A separate EBS volume is a safe option since it exists separately from the node and containers, and it can even be increased in size if needed. There is a minimum size though, so if you don’t need all that storage it could be wasted. Remote storage (MetricFire’s Hosted Prometheus service, for example) provides resiliency and may allow alerts and dashboards to be created outside of the cluster as well.
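When deciding how large a volume to provision, a back-of-the-envelope estimate helps. The sketch below uses the rule of thumb from the Prometheus documentation (retention time × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample). The function names and the example workload of 100,000 active series scraped every 30 seconds are assumptions for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_second(active_series, scrape_interval_seconds):
    """Each active series produces one sample per scrape."""
    return active_series / scrape_interval_seconds


def prometheus_disk_bytes(retention_days, ingest_rate, bytes_per_sample=2.0):
    """Rule-of-thumb disk need: retention * ingestion rate * sample size."""
    return retention_days * SECONDS_PER_DAY * ingest_rate * bytes_per_sample


# Hypothetical workload: 100,000 series, 30 s scrape interval, 15 days kept.
rate = samples_per_second(active_series=100_000, scrape_interval_seconds=30)
needed = prometheus_disk_bytes(retention_days=15, ingest_rate=rate)
print(f"{needed / 1e9:.1f} GB")  # about 8.6 GB
```

An estimate like this makes it easier to judge whether a dedicated EBS volume would sit mostly empty or whether retention should be shortened instead.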
EKS is a service best suited to large cross-AZ deployments, and that’s what Kubernetes does best. Monitoring is still essential, though, because it is very easy to spend money on costly EC2 instances that you don’t need. Making sure the capacity of the cluster matches your resource consumption is vital, and good monitoring can help ensure you achieve that.