Kubernetes Incident Response: 5 Metrics to Watch


Aug 23, 2024

MetricFire Blogger

Table of Contents

What Is Incident Response?
Incident Response for Kubernetes and the Kubernetes Audit Log
Detecting Common Kubernetes Attacks
Conclusion

Kubernetes is a central part of modern IT infrastructure, and like any critical system, it is becoming a valuable target for attackers. To identify and respond to security threats, teams need metrics that reveal anomalous activity and point investigators in the right direction.

In this article, I'll review the basics of incident response, show the mechanisms that provide security-related data in Kubernetes, and cover four common attacks and how you can detect and prevent them using Kubernetes metrics.

If you're looking for Kubernetes monitoring, check out MetricFire's Kubernetes monitoring resources. You can completely offload your Kubernetes monitoring to MetricFire's Hosted Graphite and easily keep metrics in long-term storage. A free trial is available.

Then, to get started quickly with monitoring Kubernetes clusters, check out our tutorial on running the Telegraf agent as a DaemonSet to forward node and pod metrics to a data source, and on using that data to build custom dashboards and alerts.

What Is Incident Response?

Incident response is an organized set of actions taken to address security incidents and manage their aftermath. The objective is to respond to a breach quickly and effectively in order to minimize the damage inflicted on the affected system and reduce the time and cost of recovering operations.

Incident response tasks should ideally be performed by the organization's computer security incident response team (CSIRT). This dedicated team includes both general IT staff and specialized security staff, and may also include legal advisors, human resources personnel, and representatives from the public relations department.

The team follows an incident response plan (IRP), in which the organization documents how to respond to various events, including security incidents and confirmed breaches.

The purpose of incident response is to have an emergency plan in place before you need it. It should be treated as a collaborative effort rather than a purely IT-driven process, so that the organization can make quick decisions based on reliable information. That is why representatives from across the business join the technical IT and security staff.

Incident Response for Kubernetes and the Kubernetes Audit Log

In order to detect and investigate attacks on production Kubernetes infrastructure, you need logs. You can use the built-in Kubernetes Audit Log feature to obtain most of the data you’ll need to identify malicious activity in your cluster.

Kubernetes audit logging is a cluster-level capability that must be turned on for production clusters: the API server logs no audit events until you pass it an audit policy file (the --audit-policy-file flag) and configure a backend, such as a log file set with --audit-log-path. The audit log records calls to the Kubernetes API server in chronological order. Incident response teams can use audit log entries to investigate suspicious API requests, gather statistics, or generate monitoring alerts for unwanted API calls.

Audit policies define which events should be logged and what data they should include. The structure of the audit policy object is defined in the audit.k8s.io API group. Each event is compared against the list of rules in order, and the first matching rule sets the event's audit level (None, Metadata, Request, or RequestResponse). You can define audit policies to log activities in your cluster that have security significance, including:

Creation of pods
Autoscaling
Login and authentication events
Ingress and egress traffic
Changes to the cluster configuration
Unusual loads that could indicate denial of service (DoS)
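To make the first-match behavior concrete, here is a minimal Python sketch. It holds an example policy as a dictionary using the audit.k8s.io/v1 Policy field names (rules, level, users, verbs, resources, omitStages) and evaluates a simplified version of the matching logic against audit events. The specific rules are illustrative assumptions rather than a recommended policy, and the matcher ignores some fields the real API server supports, such as nonResourceURLs and resourceNames.

```python
import json

# Hypothetical example policy. Field names follow the audit.k8s.io/v1 Policy
# object; the rules themselves are illustrative, not a recommendation.
POLICY = {
    "apiVersion": "audit.k8s.io/v1",
    "kind": "Policy",
    "omitStages": ["RequestReceived"],
    "rules": [
        # Drop kube-proxy's high-volume watch traffic.
        {"level": "None", "users": ["system:kube-proxy"], "verbs": ["watch"]},
        # Keep full bodies for pod creation, so the pod spec
        # (including any hostPath volumes) ends up in the log.
        {"level": "RequestResponse", "verbs": ["create"],
         "resources": [{"group": "", "resources": ["pods"]}]},
        # Secrets: metadata only, so secret values never reach the log.
        {"level": "Metadata", "resources": [{"group": "", "resources": ["secrets"]}]},
        # Catch-all: log metadata for every other request.
        {"level": "Metadata"},
    ],
}


def rule_matches(rule, event):
    """Simplified matcher: a field the rule omits matches everything."""
    user = event.get("user", {})
    ref = event.get("objectRef") or {}
    if "users" in rule and user.get("username") not in rule["users"]:
        return False
    if "userGroups" in rule and not set(user.get("groups", [])) & set(rule["userGroups"]):
        return False
    if "verbs" in rule and event.get("verb") not in rule["verbs"]:
        return False
    if "namespaces" in rule and ref.get("namespace", "") not in rule["namespaces"]:
        return False
    if "resources" in rule:
        group = ref.get("apiGroup", "")  # the core API group is ""
        # Non-resource requests (no objectRef) never match a resources rule.
        if not ref or not any(
            gr.get("group", "") == group
            and (not gr.get("resources") or ref.get("resource") in gr["resources"])
            for gr in rule["resources"]
        ):
            return False
    return True


def audit_level(policy, event):
    """Check rules in order; the first matching rule sets the level."""
    for rule in policy["rules"]:
        if rule_matches(rule, event):
            return rule["level"]
    return "None"  # no rule matched: the event is not logged


if __name__ == "__main__":
    create_pod = {"verb": "create", "user": {"username": "alice"},
                  "objectRef": {"resource": "pods", "namespace": "default"}}
    print(audit_level(POLICY, create_pod))  # RequestResponse
    print(json.dumps(POLICY, indent=2))
```

Rule order matters here: kube-proxy's watch requests hit the None rule before they reach the catch-all, while a pod creation by any other user is recorded at RequestResponse. Because JSON is a subset of YAML, the printed policy could in principle be saved as the file passed to --audit-policy-file.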

Detecting Common Kubernetes Attacks

Anonymous Access

In most Kubernetes versions (except 1.5.1-1.5.x), anonymous requests to the API server are enabled by default. Clients normally authenticate with the API server using a client certificate or a bearer token, but a request that presents no credentials at all is not rejected outright: it is assigned the username "system:anonymous" and the group "system:unauthenticated", and then passed on to authorization like any other request. The same applies to requests sent to the kubelet's API, where anonymous access is also enabled by default.

This is a severe misconfiguration, because it lets malicious actors send commands directly to Kubernetes components through the API, limited only by what the authorization layer grants to anonymous users. You should disable anonymous access (the --anonymous-auth=false flag on kube-apiserver and on the kubelet) and enable role-based access control (RBAC) so that every request must be authorized. At the same time, monitor for this type of attack so you can stop it in its tracks.

Metrics to watch: check audit logs for a user object with username system:anonymous and group system:unauthenticated.
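This check can be scripted against the log backend's output. The sketch below assumes the API server writes audit events as JSON lines (the log backend's default format) and reads the audit.k8s.io/v1 Event fields auditID, user, and sourceIPs; the sample entries are made up for illustration.

```python
import json
from collections import Counter


def anonymous_events(lines):
    """Yield audit events sent by the anonymous user or unauthenticated group.

    `lines` is any iterable of JSON-lines audit log entries, such as an open
    log file. A request can be logged once per stage, so events are
    de-duplicated by auditID.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        audit_id = event.get("auditID")
        if audit_id in seen:
            continue
        seen.add(audit_id)
        user = event.get("user", {})
        if (user.get("username") == "system:anonymous"
                or "system:unauthenticated" in user.get("groups", [])):
            yield event


def count_by_source_ip(events):
    """Count anonymous requests per client IP to show where they come from."""
    counts = Counter()
    for event in events:
        for ip in event.get("sourceIPs") or ["unknown"]:
            counts[ip] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample entries; real ones carry many more fields.
    sample = [
        '{"auditID": "a1", "verb": "list", "sourceIPs": ["203.0.113.7"], '
        '"user": {"username": "system:anonymous", "groups": ["system:unauthenticated"]}}',
        '{"auditID": "a2", "verb": "get", "sourceIPs": ["10.0.0.5"], '
        '"user": {"username": "alice", "groups": ["system:authenticated"]}}',
    ]
    for ip, count in count_by_source_ip(anonymous_events(sample)).most_common():
        print(ip, count)  # 203.0.113.7 1
```

Against a real cluster you would pass an open file instead of the sample list, for example `with open(path) as f: print(count_by_source_ip(anonymous_events(f)))`, where path is whatever you configured with --audit-log-path.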

Service Accounts Compromise

There are two types of accounts in Kubernetes: regular users, which are managed externally, and service accounts, which are managed by Kubernetes. Kubernetes automatically creates a default service account in every namespace, and you can also add custom service accounts. Every pod runs as a service account (the namespace's default account unless the pod specifies another), and many cluster components, such as controllers, also run under service accounts.

Attackers can easily compromise a service account once they have breached a pod, because the account's token is mounted into the pod by default, typically at /var/run/secrets/kubernetes.io/serviceaccount/token. With that token they can authenticate to the API server and use the account's permissions. There is no universal misuse signature, but you can detect anomalous behavior: service accounts serve specific purposes, so their activity should be well defined and consistent. A compromised account will likely behave differently, which lets you identify malicious activity.

Metrics to watch: the audit log of API requests made by service accounts from pods, whose usernames take the form system:serviceaccount:<namespace>:<name>. Anomalies are difficult to spot manually, especially in large clusters, so you can apply tools such as User and Entity Behavior Analytics (UEBA) to analyze log streams for anomalous behavior.
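UEBA products build far richer behavioral models, but the core idea can be sketched with a much simpler stand-in: record which (verb, resource) pairs each service account used during a period you trust, then flag pairs that appear later and were never seen before. The sketch assumes audit events have already been parsed into dictionaries with audit.k8s.io/v1 field names; the function names and the example account are hypothetical.

```python
from collections import defaultdict

SA_PREFIX = "system:serviceaccount:"


def sa_actions(events):
    """Yield (namespace:name, verb, resource) for each service account request."""
    for event in events:
        username = event.get("user", {}).get("username", "")
        if not username.startswith(SA_PREFIX):
            continue
        ref = event.get("objectRef") or {}
        # Non-resource requests (e.g. /healthz) have no objectRef; use the URI.
        resource = ref.get("resource") or event.get("requestURI", "")
        yield username[len(SA_PREFIX):], event.get("verb", ""), resource


def build_baseline(events):
    """Record the (verb, resource) pairs each service account used in a trusted period."""
    baseline = defaultdict(set)
    for account, verb, resource in sa_actions(events):
        baseline[account].add((verb, resource))
    return baseline


def find_anomalies(baseline, events):
    """Return (account, verb, resource) tuples never seen during the baseline period."""
    anomalies = []
    for account, verb, resource in sa_actions(events):
        if (verb, resource) not in baseline.get(account, set()):
            anomalies.append((account, verb, resource))
    return list(dict.fromkeys(anomalies))  # de-duplicate, keep first-seen order


if __name__ == "__main__":
    def ev(user, verb, resource):
        return {"user": {"username": user}, "verb": verb, "objectRef": {"resource": resource}}

    sa = "system:serviceaccount:web:frontend"
    baseline = build_baseline([ev(sa, "get", "configmaps"), ev(sa, "list", "endpoints")])
    recent = [ev(sa, "get", "configmaps"), ev(sa, "list", "secrets"), ev(sa, "create", "pods")]
    print(find_anomalies(baseline, recent))
    # [('web:frontend', 'list', 'secrets'), ('web:frontend', 'create', 'pods')]
```

This set-based baseline is deliberately crude: it flags any legitimate new behavior, such as a release that adds a feature, and it cannot tell a rare but normal action from a malicious one. It is a starting point for triage rather than a replacement for a behavioral analytics tool.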

Node Compromise

Pods are not a stable vector for persistent attacks, as they are ephemeral and often deleted and recreated according to the cluster's changing needs. However, attackers can target the underlying node hosting the pods, which is more stable.

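If attackers have permission to create pods, they can, for instance, define a pod with a hostPath volume that mounts the node's root directory. When pod creation is audited only at the Metadata level, the API server logs that a new pod was created but not its configuration, so there will be no indication that an attacker has mounted the node's file system. Auditing pod creation at the Request level or higher records the pod spec, including any hostPath volumes.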

Without the pod spec in the log, you can detect this type of attack only by monitoring the behavior around the pod's creation and identifying anomalies. For instance, a pod that creates another pod should be flagged as suspicious, since pods are usually created by controllers such as the ReplicaSet, Job, or DaemonSet controllers. Likewise, the API call would likely come from an account that doesn't usually create pods.

Metrics to watch: check audit logs for pod creation requests and raise an alert when pods are created by a service account that does not normally create them.
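A minimal Python sketch of this alert follows. It assumes parsed audit events with audit.k8s.io/v1 field names and a hypothetical allowlist of the controller service accounts that normally create pods. Those names apply when kube-controller-manager runs with --use-service-account-credentials; check your own audit log for the identities your cluster actually uses. When the policy records pod creation at the Request level or higher, the sketch also inspects requestObject for hostPath volumes.

```python
# Hypothetical allowlist of identities that normally create pods.
# Verify these names against your own cluster's audit log.
EXPECTED_POD_CREATORS = {
    "system:serviceaccount:kube-system:replicaset-controller",
    "system:serviceaccount:kube-system:job-controller",
    "system:serviceaccount:kube-system:daemon-set-controller",
    "system:serviceaccount:kube-system:statefulset-controller",
}


def suspicious_pod_creations(events, expected=EXPECTED_POD_CREATORS):
    """Return (user, namespace, reasons) for pod creations worth investigating."""
    alerts = []
    for event in events:
        ref = event.get("objectRef") or {}
        # Only top-level pod creation; skip subresources such as pods/binding.
        if (event.get("verb") != "create" or ref.get("resource") != "pods"
                or ref.get("subresource")):
            continue
        # Count each request once, at the ResponseComplete stage.
        if event.get("stage", "ResponseComplete") != "ResponseComplete":
            continue
        user = event.get("user", {}).get("username", "")
        reasons = []
        if user not in expected:
            reasons.append("unexpected creator")
        # requestObject is only present at the Request level or higher.
        spec = (event.get("requestObject") or {}).get("spec") or {}
        for volume in spec.get("volumes", []):
            if "hostPath" in volume:
                reasons.append("hostPath volume " + volume["hostPath"].get("path", "?"))
        if reasons:
            alerts.append((user, ref.get("namespace", ""), reasons))
    return alerts


if __name__ == "__main__":
    event = {
        "stage": "ResponseComplete", "verb": "create",
        "user": {"username": "system:serviceaccount:web:frontend"},
        "objectRef": {"resource": "pods", "namespace": "web"},
        "requestObject": {"spec": {"volumes": [{"name": "host", "hostPath": {"path": "/"}}]}},
    }
    print(suspicious_pod_creations([event]))
    # [('system:serviceaccount:web:frontend', 'web', ['unexpected creator', 'hostPath volume /'])]
```

Pods created directly by users, mirror pods that the kubelet creates for static pods, and pods from operators or CI systems will also show up as unexpected creators until you add their identities to the allowlist.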

API Authentication Compromise

When a request reaches the Kubernetes API server, the server first authenticates the client and then checks whether the action is permitted for that user, using the configured authorization module (e.g., RBAC). The following responses are significant for incident response.

403 Forbidden

If a client tries to perform actions on a cluster without the right permissions, the API server returns a 403 Forbidden response. It is crucial to identify where these requests originate. An increase in 403 responses can indicate a security issue, or it may point to misconfigured RBAC policies.

401 Unauthorized

If a user or service account can't be authenticated, the API server returns a 401 Unauthorized response. Monitor audit events with a 401 status to detect authentication issues, such as expired certificates and malformed tokens.
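The difference between the two responses, and how it relates to anonymous access, can be shown with a simplified model of the API server's request handling. This is a hypothetical sketch, not the API server's code: tokens and permissions are hard-coded dictionaries standing in for real authenticators and RBAC bindings. It does reflect one real detail: when anonymous access is enabled, only a request with no credentials becomes system:anonymous, while a request with bad credentials still receives 401.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """A hypothetical, stripped-down API request."""
    credential: Optional[str]  # None means the client presented no credentials
    verb: str
    resource: str


# Hypothetical stand-ins for the cluster's authenticators and RBAC bindings.
VALID_TOKENS = {"token-alice": ("alice", ["system:authenticated", "dev"])}
PERMISSIONS = {"alice": {("get", "pods"), ("list", "pods")}}


def authenticate(req, anonymous_auth=True):
    """Return (username, groups), or None if authentication fails."""
    if req.credential is None:
        if anonymous_auth:
            return "system:anonymous", ["system:unauthenticated"]
        return None
    # A bad credential fails outright; it is never downgraded to anonymous.
    return VALID_TOKENS.get(req.credential)


def authorize(username, groups, req):
    """Allow the request if the user or any of its groups holds the permission."""
    return any((req.verb, req.resource) in PERMISSIONS.get(subject, set())
               for subject in [username, *groups])


def handle(req, anonymous_auth=True):
    """Authenticate first (failure -> 401), then authorize (failure -> 403)."""
    identity = authenticate(req, anonymous_auth)
    if identity is None:
        return 401
    username, groups = identity
    if not authorize(username, groups, req):
        return 403
    return 200


if __name__ == "__main__":
    print(handle(Request(None, "get", "pods")))                        # 403
    print(handle(Request(None, "get", "pods"), anonymous_auth=False))  # 401
    print(handle(Request("expired-token", "get", "pods")))             # 401
    print(handle(Request("token-alice", "get", "pods")))               # 200
```

Anonymous requests end in 403 here only because nothing grants permissions to system:anonymous or system:unauthenticated; adding such a grant to PERMISSIONS would let those requests through, which is the misconfiguration described in the anonymous access section.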

Metrics to watch:

Monitor the average rate of 403 responses and watch for increases. A rise may indicate an attack, or legitimate users who are unable to access the resources they need.
Group 401 responses by host to see where requests originate, and to identify problems with certificate renewal or signs of an attack on a particular host.
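Both checks can be sketched in Python over parsed audit events. The sketch assumes the responseStatus.code, requestReceivedTimestamp, and sourceIPs fields of audit.k8s.io/v1 events; the one-minute buckets, five-minute trailing window, and 3x threshold are arbitrary example values that you would tune to your cluster's normal 403 rate.

```python
from collections import Counter
from datetime import datetime, timedelta


def _status(event):
    """HTTP status code of the response, or None if the event has none."""
    return (event.get("responseStatus") or {}).get("code")


def _minute(timestamp):
    """Truncate a timestamp such as 2024-08-23T10:15:30.123456Z to the minute."""
    dt = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return dt.replace(second=0, microsecond=0)


def forbidden_per_minute(events):
    """Count 403 Forbidden responses per minute."""
    counts = Counter()
    for event in events:
        if _status(event) == 403:
            counts[_minute(event["requestReceivedTimestamp"])] += 1
    return counts


def forbidden_spikes(counts, window=5, factor=3.0):
    """Flag minutes whose 403 count exceeds `factor` times the trailing average.

    Minutes without any 403s count as zero, and the first `window` minutes
    only serve as history. The floor of 1.0 on the average avoids alerting
    on a jump from zero to a single 403.
    """
    if not counts:
        return []
    series, current, last = [], min(counts), max(counts)
    while current <= last:
        series.append((current, counts.get(current, 0)))
        current += timedelta(minutes=1)
    flagged = []
    for i in range(window, len(series)):
        average = sum(n for _, n in series[i - window:i]) / window
        minute, n = series[i]
        if n > factor * max(average, 1.0):
            flagged.append((minute, n))
    return flagged


def unauthorized_by_source(events):
    """Group 401 Unauthorized responses by client IP."""
    counts = Counter()
    for event in events:
        if _status(event) == 401:
            for ip in event.get("sourceIPs") or ["unknown"]:
                counts[ip] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample: one 403 per minute, then a burst of ten.
    sample = [{"requestReceivedTimestamp": f"2024-08-23T10:0{m}:30.000000Z",
               "responseStatus": {"code": 403}} for m in range(5)]
    sample += [{"requestReceivedTimestamp": "2024-08-23T10:05:10.000000Z",
                "responseStatus": {"code": 403}}] * 10
    for minute, count in forbidden_spikes(forbidden_per_minute(sample)):
        print(minute.isoformat(), count)  # 2024-08-23T10:05:00+00:00 10
```

A trailing average adapts to each cluster's normal level of 403s, which usually makes it a better alert trigger than a fixed count; the 401 grouping, by contrast, is most useful as a table to review during an investigation.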

Conclusion

In this article, I covered the basics of Kubernetes incident response and showed how to detect four common attacks using readily available metrics:

Anonymous access - this should be disabled for the API server, because leaving it enabled creates severe security issues. You can monitor for anonymous access by alerting on audit log entries whose user object has the username system:anonymous.
Service accounts compromise - service accounts have privileged access to cluster capabilities and are valuable to attackers. To identify compromised service accounts, alert on anomalous activity in the audit log of API requests from pods.
Node compromise - attackers who can create pods can mount a node's file system and use the node as a more persistent foothold than an ephemeral pod. To detect this attack, raise an alert when pods are created by a service account that does not normally create them.
API authentication compromise - attackers may try to gain unauthorized access to user accounts in the API server. To detect this attack, monitor the rate of 403 responses and group 401 responses by host to see where unauthorized requests originate.