[Prometheus] + Grafana: Super-Simple Installation

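This post walks through installing Prometheus and Grafana, then using their dashboards to confirm that metrics are being collected.

On Kubernetes, Prometheus and Grafana can be installed easily with the kube-prometheus-stack Helm chart from the prometheus-community repository ("prometheus-community.github.io").

(To install Grafana on its own, see: [Grafana installation])

Once the installation finishes, the following Pods are running. Each has the role described below.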

alertmanager
Manages the alerts fired by Prometheus and sends notifications for them.
grafana
The Grafana server, which provides dashboards that visualize the data collected by Prometheus.
kube-state-metrics
A Prometheus exporter that collects the state of the various Kubernetes resources.
prometheus-operator
A tool that manages and configures Prometheus inside the Kubernetes cluster.
node-exporter
A Prometheus exporter that collects system metrics from each node.
prometheus
The Prometheus server, which stores the collected metrics and serves queries against them. It is used together with the various exporters.

Note that kube-state-metrics may fail to collect some metrics if its version is not compatible with the cluster's Kubernetes version.

kube-state-metrics	Kubernetes client-go Version
v2.4.2	v1.23
v2.5.0	v1.24
v2.6.0	v1.24
v2.7.0	v1.25
v2.8.1	v1.26

If the versions do not match, errors like the following appear.

- failed to list *v2.HorizontalPodAutoscaler: the server could not find the requested resource
- Failed to watch *v2.HorizontalPodAutoscaler: failed to list *v2.HorizontalPodAutoscaler: the server could not find the requested resource

In short, recent kube-state-metrics versions (v2.4.2 and later) use the autoscaling/v2 API, which is only available from Kubernetes 1.23 onward. On clusters older than 1.23 the API does not exist, which is why these errors are logged.
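The version constraint described here can be turned into a quick pre-install check. The following is a minimal shell sketch, assuming the server version is passed in as a string such as 1.22.17 or v1.23.5-eks-abc; the function name supports_autoscaling_v2 is hypothetical and is not part of kubectl or Helm.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the given Kubernetes version
# is 1.23 or newer, i.e. when the autoscaling/v2 API is available.
supports_autoscaling_v2() {
  version=${1#v}                 # drop an optional leading "v"
  major=${version%%.*}           # text before the first dot
  rest=${version#*.}             # text after the first dot
  minor=${rest%%[!0-9]*}         # leading digits of the minor version
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 23 ]; }
}

# Example: the lab cluster in this post runs EKS 1.22.17.
if supports_autoscaling_v2 "1.22.17"; then
  echo "autoscaling/v2 available"
else
  echo "autoscaling/v2 not available: expect HorizontalPodAutoscaler list/watch errors"
fi
```

In practice the version string would come from the cluster itself, for example from the server version reported by kubectl version.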

Prerequisites

AWS EKS cluster
Helm CLI

Lab environment

EKS 1.22.17
Helm 3.8.2

Installed versions

kube-prometheus-stack Helm chart version: 45.20.0
Grafana version: 9.4.7

Add the prometheus-community Helm chart repository and refresh the local chart index.

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update

Write the Helm values file for Prometheus as follows.

## prometheus-values.yaml
grafana:
  enabled: true
prometheus:
  enabled: true
  prometheusSpec:  
    serviceMonitorSelectorNilUsesHelmValues: false  
    replicas: 1  
    resources:
      limits:
        cpu: 2000m
        memory: 4Gi
      requests:
        cpu: 2000m
        memory: 1Gi
    retention: 30d             ## default: 10d
    retentionSize: 20GiB
    scrapeInterval: 15s        ## default: 30s
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 30Gi
          storageClassName: gp2

Retention is set to 30 days and the scrape interval to 15 seconds.

[Note]
serviceMonitorSelectorNilUsesHelmValues: false
This option defaults to true. When it is true, the Prometheus custom resource is rendered with the following selectors.
---
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector:
  matchLabels:
    release: prometheus
---
This means that when you add scrape targets through ServiceMonitor resources, only ServiceMonitors labeled release: prometheus (the value is the Helm release name) are picked up as targets. In other words, every ServiceMonitor has to carry that label.

Setting the option to false changes the selectors as follows.
---
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
---
With this configuration, ServiceMonitors are picked up regardless of their labels.
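To make the matching rule concrete, here is a small shell sketch that models how a matchLabels selector is evaluated: an empty selector matches every ServiceMonitor, otherwise every key=value pair in the selector must be present in the ServiceMonitor's labels. This is a simplified illustration, not the operator's actual code (real selectors also support matchExpressions); selector_matches and its space-separated key=value format are assumptions made for this example.

```shell
#!/bin/sh
# Simplified model of matchLabels evaluation (not the operator's real code).
# $1: selector as space-separated key=value pairs ("" models `{}`)
# $2: labels on the ServiceMonitor, in the same format
selector_matches() {
  for want in $1; do
    found=no
    for have in $2; do
      if [ "$want" = "$have" ]; then found=yes; fi
    done
    [ "$found" = yes ] || return 1   # a required label is missing
  done
  return 0                           # empty selector, or all pairs present
}

# Option left at true: the selector requires release=prometheus.
selector_matches "release=prometheus" "app=podinfo" || echo "not selected"
# Option set to false: the empty selector matches any labels.
selector_matches "" "app=podinfo" && echo "selected"
```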

[ServiceMonitor example]

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: podinfo
  namespace: test
  labels:
    release: prometheus  ## not needed when serviceMonitorSelectorNilUsesHelmValues is false
spec:
  endpoints:
    - path: /metrics
      port: http        ## must be the Service port's name (a string), not a port number
      interval: 5s
  selector:
    matchLabels:
      app: podinfo
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
  namespace: test
  labels:
    app: podinfo
spec:
  type: ClusterIP
  ports:
    - port: 9898
      name: http        ## required: the port name that the ServiceMonitor above refers to
  selector:
    app: podinfo

Next, install the chart using the Helm values file created above.

$ kubectl create ns monitoring
$ helm upgrade -i prometheus -f prometheus-values.yaml prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set fullnameOverride=prometheus

Check that everything was installed correctly.

$ kubectl get po -n monitoring 
NAME                                                     READY   STATUS    RESTARTS      AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   1 (12s ago)   13s
prometheus-grafana-749c756979-dm9tm                      3/3     Running   0             58m
prometheus-kube-prometheus-operator-fcbb6587b-kdsgl      1/1     Running   0             13s
prometheus-kube-state-metrics-77dbb5cf79-dn8vl           1/1     Running   0             58m
prometheus-prometheus-kube-prometheus-prometheus-0       1/2     Running   0             13s
prometheus-prometheus-node-exporter-76nb5                1/1     Running   0             58m
prometheus-prometheus-node-exporter-gbnlt                1/1     Running   0             58m
prometheus-prometheus-node-exporter-pspt7                1/1     Running   0             58m
prometheus-prometheus-node-exporter-z5tb6                1/1     Running   0             58m
prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0             16s
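In the listing above, the Prometheus pod first shows 1/2 and only later 2/2. The sketch below picks out pods whose READY column is not yet complete, assuming the default column layout of kubectl get po; not_ready_pods is a hypothetical helper that reads the listing on standard input.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get po` output on stdin and prints the
# names of pods whose READY column (e.g. 1/2) shows fewer ready containers
# than the total.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Example with two rows taken from the output above.
printf '%s\n' \
  'NAME READY STATUS RESTARTS AGE' \
  'prometheus-grafana-749c756979-dm9tm 3/3 Running 0 58m' \
  'prometheus-prometheus-kube-prometheus-prometheus-0 1/2 Running 0 13s' \
  | not_ready_pods
```

Against a live cluster the same helper could be fed directly, for example with kubectl get po -n monitoring | not_ready_pods; an empty result means every pod is fully ready.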

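Access the Prometheus dashboard through kubectl port-forward, using the Prometheus pod name shown in the listing above.

$ kubectl port-forward --address=0.0.0.0 -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 8080:9090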

Check that metrics are being collected properly. As a simple test, compare the number of pods and nodes.

$ kubectl get po --no-headers -A | wc -l
65
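The count from kubectl can be compared with what Prometheus has collected by calling the Prometheus HTTP API (/api/v1/query). The sketch below assumes the Prometheus port-forward is still running on localhost:8080 and that kube-state-metrics exposes the kube_pod_info and kube_node_info metrics; PROM_URL, extract_value, and prom_value are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Assumption: Prometheus is reachable through the port-forward at this address.
PROM_URL=${PROM_URL:-http://localhost:8080}

# Pulls the sample value out of an instant-query JSON response that contains
# a single result, e.g. ..."value":[1683700000.123,"65"]...
extract_value() {
  sed -n 's/.*"value":\[[^,]*,"\([^"]*\)"].*/\1/p'
}

# Runs an instant query (PromQL in $1) and prints the resulting value.
prom_value() {
  curl -s "$PROM_URL/api/v1/query" --data-urlencode "query=$1" | extract_value
}

# Usage, with the port-forward running:
#   prom_value 'count(kube_pod_info)'    # number of pods known to Prometheus
#   prom_value 'count(kube_node_info)'   # number of nodes known to Prometheus
```

The pod count returned by Prometheus should be close to the kubectl result; small differences are possible while pods are being created or removed between scrapes.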

Now open the Grafana dashboard as well.

Retrieve the initial ID and password with the commands below.

## ID
$ kubectl get secrets -n monitoring prometheus-grafana -ojsonpath="{.data.admin-user}" | base64 -d

## PASSWD
$ kubectl get secrets -n monitoring prometheus-grafana -ojsonpath="{.data.admin-password}" | base64 -d
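The base64 -d step is needed because Kubernetes stores Secret values base64-encoded under .data. A minimal illustration with a made-up value (YWRtaW4= is simply the encoding of the string admin, used here as an example and not read from the cluster):

```shell
#!/bin/sh
# Secret values under .data are base64-encoded, so the raw jsonpath output
# is not directly readable. Example with a made-up encoded value:
encoded='YWRtaW4='
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"   # prints: admin
```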

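Access the Grafana dashboard through kubectl port-forward. Stop the earlier Prometheus port-forward first, since both commands bind local port 8080.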

$ kubectl port-forward --address=0.0.0.0 -n monitoring svc/prometheus-grafana 8080:80

Log in to the dashboard with the ID and password retrieved above.

You can see that a number of dashboards are provided out of the box.

Wrapping up

Building custom dashboards that are easy to read improves visibility, and configuring Alerting lets users be notified as soon as a problem occurs so they can respond in real time. Used well, these tools make operations noticeably more stable.
