Kubernetesの監視 - Telegrafを使用するベストな監視方法

インフラ

Sep 04, 2024 ∙ 14 min read

Elliot Langston

Great systems are not just built. They are monitored.

MetricFire runs Graphite and Grafana as a fully managed service for growing engineering teams, taking care of storage, scaling, and version updates so your team doesn't have to. Plans start at $19/month, billed per metric namespace rather than per host, and include engineer-staffed support. Integrations work natively with Heroku, AWS, Azure, and GCP, and data is stored with 3× redundancy in SOC2- and ISO:27001-certified data centres.

はじめに
前提条件
Telegraf デーモンセットファイル構造の作成
Telegraf デーモンセットをデプロイ
メトリックの検索と可視化
Graphite アラートの設定
最後に

はじめに

このチュートリアルでは、Telegrafエージェントを使用して、Kubernetesクラスタを簡単に監視できるようにする方法を解説いたします。

方法としては、ノード/ポッドのメトリクスをデータソースに転送し、カスタムダッシュボードとアラートを作成するためにそのデータを使用するDaemonsetとしてTelegrafエージェントを使用することとなります。ほんの数分で実現できるようになりますので、以下の方法をぜひお試しください。

Kubernetesの監視 - Telegrafを使用するベストな監視方法 - 1

Kubernetesは、コンテナ化されたアプリケーションのデプロイ、スケーリング、管理を自動化し、分散システム全体で高可用性と一貫したパフォーマンスを確保するために、本番レベルのアプリケーションやソフトウェアサービスにおいて使用されています。

Kubernetesは、ロードバランシング、セルフヒーリング、ローリングアップデートによって信頼性を高め、クラウドネイティブ環境やハイブリッド環境における効率的なリソース利用とオーケストレーションを可能にします。ここ数年で人気が高まっており、世界中のソフトウェアインフラを管理するための貴重なプラットフォームとなっています。

Kubernetesクラスターを監視することは、問題を早期に検出し、迅速な解決を可能にすることで、リソースの使用を最適化し、アプリケーションのパフォーマンスを確保するために極めて重要です。また、異常なアクティビティを可視化することでセキュリティを強化し、規制基準の遵守にも役立ちます。さらに、モニタリングはコスト管理とキャパシティプランニングを支援し、ダイナミックな環境におけるサービスの信頼性と可用性を担保することになります。

前提条件

この記事では、アクティブなKubernetesクラスタにアクセスできることを前提としています。そうでない場合は、Docker Desktop経由でminikubeのようなツールを使用するか、AWS EKSやLinodeのようなマネージドサービスを使用してテストクラスタをスピンアップすることができます。この記事では、Linodeでホストされているクラスタからメトリクスを収集しています。

次に、メトリクスを転送するデータソースも必要です。無料で簡単にデータソースを設定するには、Hosted GraphiteとGrafanaサービスを提供するMetricFireのトライアルにサインアップしてみてください。トライアルアカウントにサインアップすると、APIキーが発行されます。

最後に、TelegrafをDaemonsetとしてクラスタにデプロイします。つまり、コマンドラインにkubectlツールをインストールし、~/.kube/configファイルにクラスタコンテキストと認証情報（certificate-authority-data、token）を適切に設定しておく必要があります。

k8クラスタが実行され、~/.kube/configファイルがCLIアクセスを許可するように設定されたら、コンテキストを設定できます：

kubectl config get-contexts
kubectl config use-context <context-name>

Telegraf デーモンセットファイル構造の作成

デーモンセットのデプロイは一般的にYAMLファイルの構成によって管理され、Helmチャートを使用することで、これらのファイルを自動的に簡単に作成できるため、人気があります - フレームワークのボイラープレートのようなものです。しかし、Helmチャートの複雑さが必要ない場合、Kustomizeツールはkubectlにすでに組み込まれているので、デプロイメントを管理するための優れた選択肢になります。以下では、Kustomizeコマンドラインツールを使ってTelegrafエージェントをデーモンセットとしてデプロイするための基本的なファイル構造を詳しく説明します。ローカルマシンに各ディレクトリ/ファイルを手動で作成するか、MetricFire GitHubからパブリックリポジトリをクローンするだけです。

プロジェクトディレクトリ:

telegraf-daemonset/

├── kustomization.yaml

└── resources/

├── config.yaml

├── daemonset.yaml

├── namespace.yaml

├── role.yaml

├── role-binding.yaml

└── service_account.yaml

kustomization.yaml： このファイルはオーケストレーターとして機能し、他のすべてのYAMLファイルを結びつけ、追加の設定やパッチを適用します。このファイルはデプロイが一貫しており、環境間で再現可能であることを保証します。

apiVersion: kustomize.config.k8s.io/v1beta1

kind: Kustomization

namespace: monitoring

resources:

- resources/config.yaml

- resources/daemonset.yaml

- resources/namespace.yaml

- resources/role-binding.yaml

- resources/role.yaml

- resources/service_account.yaml

resources/config.yaml：このファイルには、DaemonSetやKubernetesリソースが必要とする設定データが含まれています。Telegrafの場合、通常、入出力プラグインとその設定が含まれます。KubernetesのConfigMapとして使用され、DaemonSetに設定データを提供します。デフォルトの'パフォーマンスコレクションプラグインに加え、inputs.kubernetesプラグインとoutputs.graphiteプラグインを設定します。これはHosted Graphiteトライアルアカウントにデータを転送します（このファイルにHG APIキーを追加してください）。

apiVersion: v1

kind: ConfigMap

metadata:

data:

telegraf.conf: |

[agent]

hostname = "$HOSTNAME"

interval = "10s"

round_interval = true

[[inputs.cpu]]

percpu = false ## setting to 'false' limits the number of cpu metrics returned

[[inputs.disk]]

ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

# [[inputs.diskio]] ## commented out to limit the number of metrics returned

[[inputs.mem]]

[[inputs.system]]

[[outputs.graphite]]

servers = ["carbon.hostedgraphite.com:2003"]

prefix = "<YOUR-HG-API-KEY>.telegraf-k8"

[[inputs.kubernetes]]

url = "https://$HOSTIP:10250"

bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"

insecure_skip_verify = true

resources/daemonset.yaml：このファイルはKubernetesのDaemonSetリソースを定義します。DaemonSetは、クラスタ内のすべての（または一部の）ノードでPodのコピーが実行されるようにします。コンテナイメージ、リソース制限、ボリュームなど、Podテンプレートの仕様が含まれています。

apiVersion: apps/v1

kind: DaemonSet

metadata:

spec:

selector:

matchLabels:

template:

metadata:

labels:

spec:

serviceAccountName: telegraf-sa

containers:

- name: telegraf

image: telegraf:latest

resources:

limits:

memory: 200Mi

cpu: 200m

requests:

memory: 100Mi

cpu: 100m

volumeMounts:

- name: config

mountPath: /etc/telegraf/telegraf.conf

subPath: telegraf.conf

- name: docker-socket

mountPath: /var/run/docker.sock

- name: varlibdockercontainers

mountPath: /var/lib/docker/containers

readOnly: true

- name: hostfsro

mountPath: /hostfs

readOnly: true

env:

# This pulls HOSTNAME from the node, not the pod.

- name: HOSTNAME

valueFrom:

fieldRef:

fieldPath: spec.nodeName

# In test clusters where hostnames are resolved in /etc/hosts on each node,

# the HOSTNAME is not resolvable from inside containers

# So inject the host IP as well

- name: HOSTIP

valueFrom:

fieldRef:

fieldPath: status.hostIP

# Mount the host filesystem and set the appropriate env variables.

# ref: https://github.com/influxdata/telegraf/blob/master/docs/FAQ.md

# HOST_PROC is required by the cpu, disk, mem, input plugins

- name: "HOST_PROC"

value: "/hostfs/proc"

# HOST_SYS is required by the diskio plugin

- name: "HOST_SYS"

value: "/hostfs/sys"

- name: "HOST_MOUNT_PREFIX"

value: "/hostfs"

volumes:

- name: hostfsro

hostPath:

path: /

- name: config

configMap:

- name: docker-socket

hostPath:

path: /var/run/docker.sock

Total Servers to monitor ~150 metrics per host (configurable for fewer metrics if needed) Cloud Services to monitor (in AWS, Azure, GCP)

~25 metrics per service / instance (typical baseline monitoring) Application / Custom metric event footprint Custom metrics are defined and emitted from your app code Heroku Applications ~75 metrics (typical baseline monitoring)

Start your free trial

Kubernetesの監視 - Telegrafを使用するベストな監視方法

Great systems are not just built. They are monitored.

Infrastructure Monitoring Assessment

Graphite vs. New Relic　〜監視ツールの比較〜

We strive for 99.95% uptime

Try MetricFire now!

Add Hosted Graphite to your Heroku app for dashboards, alerts, and insights in minutes.

Great systems are not just built. They are monitored.

Graphite vs. New Relic 〜監視ツールの比較〜

We strive for 99.95% uptime

Try MetricFire now!

Add Hosted Graphite to your Heroku app for dashboards, alerts, and insights in minutes.

Graphite vs. New Relic　〜監視ツールの比較〜