How To Monitor Kubernetes With Prometheus

Written by Nick Otter.

Contents

Introduction
Requirements
Deploying Prometheus with Helm
Helm values.yaml
Exposing the web dashboards
Looking at Targets in the Web UI
Exposing metrics for Prometheus
Configuring the metrics endpoint on a pod
Checking current deployment and creating a Service
Creating a ServiceMonitor resource

Introduction

Let’s take a look at Prometheus as a monitoring solution for a simple cluster. The image below is a nice overview of Prometheus from Sysdig.com. In this article we’ll figure out how to deploy Prometheus using Helm, how to expose the web dashboards outside of the cluster, how to have a quick look around the Prometheus web UI, and how to expose a Traefik ingress resource to be polled by Prometheus. Nice.

I don’t like presenting images without info, so take a moment to look over the image and talk it through. Only if you want to, though.

Requirements

Kubernetes 1.19.2
Minikube 1.13.1
Helm 3.4.0
Prometheus kube-prometheus-stack-12.2.0

Deploying Prometheus with Helm

I used the chart kube-prometheus-stack for this (it’s the currently maintained chart at the time of writing, 20/11/2020).

Let’s deploy Prometheus under the release name prometheus in the Kubernetes namespace monitoring using Helm.
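
If the chart repository and target namespace aren’t already in place, something along these lines sets them up first (the repository URL is the one published by the prometheus-community project):

$ kubectl create namespace monitoring
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update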

$ helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring
[minikube@control-plane helpers]$ kubectl get po -n monitoring
NAME                                                   READY   STATUS    RESTARTS   AGE
prometheus-grafana-58cf5655dc-sq8rs                    2/2     Running   0          75m
prometheus-kube-prometheus-operator-648b6f79cd-6v2f5   1/1     Running   0          75m
prometheus-kube-state-metrics-95d956569-4x88x          1/1     Running   0          75m
prometheus-prometheus-kube-prometheus-prometheus-0     2/2     Running   1          73m
prometheus-prometheus-node-exporter-x2sdt              1/1     Running   0          75m

Hello Prometheus. Grafana has also been installed, which will be helpful for visualisation later.

Helm values.yaml

Custom configuration can be deployed using the values.yaml file.

For example, want to do an install without Grafana? Easy. Just pass this values.yaml file into the install.

## Using default values from https://github.com/grafana/helm-charts/blob/main/charts/grafana/values.yaml
##
grafana:
  ## Deploy Grafana
  ##
  enabled: false

$ helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml

Redeployments can be handled nicely with helm upgrade too.

$ helm upgrade --install --namespace monitoring prometheus prometheus-community/kube-prometheus-stack -f values.yaml
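
Afterwards, helm list is a quick way to confirm the release name, chart version and status of what’s deployed in the namespace:

$ helm list -n monitoring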

Exposing the web dashboards

To expose the Prometheus or Grafana web dashboards, a couple of solutions could be used.

NodePort and port-forward can both expose a Kubernetes Service resource outside of the cluster. Ingress is a more comprehensive load balancing solution, great for a production environment but a bit overkill for now.

port-forward is what we’re going to use. It allows us to access a service quickly, without permanently exposing it. This is great from a security perspective and for debugging too.
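
As an aside, on Minikube a NodePort Service can be reached without any forwarding at all. The Grafana Service from this install happens to be a NodePort (see the listing below), so something like this should print a reachable URL, assuming your Minikube driver supports it:

$ minikube service prometheus-grafana -n monitoring --url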

Let’s take a look at the Kubernetes Service resources that have been deployed with that install.

[minikube@control-plane helpers]$ kubectl get svc -n monitoring
NAME                                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
prometheus-grafana                      NodePort    10.96.7.160      <none>        80:30080/TCP   98m
prometheus-kube-prometheus-operator     ClusterIP   10.96.18.46      <none>        443/TCP        98m
prometheus-kube-prometheus-prometheus   ClusterIP   10.100.249.35    <none>        9090/TCP       98m
prometheus-kube-state-metrics           ClusterIP   10.104.133.177   <none>        8080/TCP       98m
prometheus-operated                     ClusterIP   None             <none>        9090/TCP       97m
prometheus-prometheus-node-exporter     ClusterIP   10.103.119.94    <none>        9100/TCP       98m

Which one shall we forward? Naming conventions are a little funky, but the 9090 ports look promising (9090 is Prometheus’ default web port). Let’s try one and forward it outside of the cluster so we can reach it from our server.

[minikube@control-plane kube-system]$ kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090 -n monitoring
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090

Good, so that session’s started. In another shell session let’s see if we can reach that address outside of the cluster.

[minikube@control-plane helpers]$ curl 127.0.0.1:9090
<a href="/graph">Found</a>.

Great. If there were any issues we could debug from the trace of the port-forward session and look at the logs of the actual pod too.
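
The Grafana dashboard can be reached the same way. Its Service listens on port 80, so forward a free local port onto it; the admin password comes from the chart’s grafana.adminPassword value (prom-operator by default at the time of writing, unless overridden in values.yaml):

$ kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring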

Looking at Targets in the Web UI

Ok, so now what? How is discovery configured with Prometheus? Let’s head to our Prometheus dashboard and look at Targets.

Focusing on the Unhealthy targets, we see an error which helps to understand Prometheus metric discovery.

Get "http://172.17.0.2:10252/metrics": dial tcp 172.17.0.2:10252: connect: connection refused

Prometheus metric discovery is an HTTP GET: it pulls metrics rather than having them pushed. And it isn’t a request made to arbitrary pods directly; targets are found through their Service resource, which Prometheus ‘scrapes’ to get metrics. In this case, the metrics endpoint behind the service prometheus-kube-prometheus-kube-controller-manager just isn’t reachable (on minikube the controller manager typically binds only to localhost, so scrapes to the node address are refused). Let’s ignore this error for now.
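
The Targets page isn’t the only way to spot this. The built-in up metric records 1 for every successful scrape and 0 for a failed one, so a quick query in the expression browser lists anything Prometheus can’t reach:

up == 0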

Exposing metrics for Prometheus

Let’s keep this short and sweet. To add a service to Prometheus, three things are needed: the application has to expose a metrics endpoint, a Service resource has to point at it, and a ServiceMonitor resource has to select that Service.

Remember, Prometheus scrapes Kubernetes Services, not Pods. A diagram from Sysdig.com will make this easier to understand, I’m sure, and we’ll walk through the running order shortly.

Not in this diagram is that whole namespace thing I mentioned. A ServiceMonitor resource has to be deployed in the same namespace as the Prometheus pod (monitoring in our case) but that ServiceMonitor resource can expose services in all other namespaces to Prometheus.

I’m not going to deep dive into the Prometheus Custom Resource Definitions (CRDs) but you can read more about them here and look at the CRDs in your estate with kubectl get crd.
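
For instance, the ServiceMonitor kind we’ll use below is itself a custom resource. Filtering on the monitoring.coreos.com API group shows everything the operator installed:

$ kubectl get crd | grep monitoring.coreos.com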

Configuring the metrics endpoint on a pod

Taking our Traefik ingress controller as an example, let’s configure the metrics endpoint. The app supports exposing metrics for Prometheus, so it’s just a question of passing those args into the build and redeploying. This should do the trick.

# traefik-deployment.yaml

...
    spec:
      serviceAccountName: traefik
      terminationGracePeriodSeconds: 60
      containers:
      - image: traefik:v1.7-alpine
        name: traefik
        ports:
        - name: app-services
          containerPort: 80
        - name: dashboard
          containerPort: 8080
        args:
        - "--api"
        - "--kubernetes"
        - "--logLevel=INFO"
        - "--metrics"
        - "--metrics.prometheus.buckets=0.1,0.3,1.2,5.0"

Now let’s check if the endpoint exists. First let’s find the Traefik Service.

[minikube@control-plane helpers]$ kubectl get svc -n kube-system
NAME                                                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                        AGE
kube-dns                                             ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP         51d
metrics-server                                       ClusterIP   10.96.133.225   <none>        443/TCP                        31d
prometheus-kube-prometheus-coredns                   ClusterIP   None            <none>        9153/TCP                       143m
prometheus-kube-prometheus-kube-controller-manager   ClusterIP   None            <none>        10252/TCP                      143m
prometheus-kube-prometheus-kube-etcd                 ClusterIP   None            <none>        2379/TCP                       143m
prometheus-kube-prometheus-kube-proxy                ClusterIP   None            <none>        10249/TCP                      143m
prometheus-kube-prometheus-kube-scheduler            ClusterIP   None            <none>        10251/TCP                      143m
prometheus-kube-prometheus-kubelet                   ClusterIP   None            <none>        10250/TCP,10255/TCP,4194/TCP   141m
prometheus-prometheus-oper-kubelet                   ClusterIP   None            <none>        10250/TCP,10255/TCP,4194/TCP   8d
traefik                                              ClusterIP   10.103.201.54   <none>        80/TCP,8080/TCP                2d3h
traefik-web-ui                                       ClusterIP   10.100.129.98   <none>        80/TCP                         2d

Then port-forward.

[minikube@control-plane helpers]$ kubectl port-forward svc/traefik 8080 -n kube-system
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080

And a curl against /metrics, which should be busy now (adding -s so curl’s progress meter doesn’t get mixed into the output).

[minikube@control-plane kube-system]$ curl -s 127.0.0.1:8080/metrics | head
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.0659e-05
go_gc_duration_seconds{quantile="0.25"} 3.1354e-05
go_gc_duration_seconds{quantile="0.5"} 8.6276e-05
go_gc_duration_seconds{quantile="0.75"} 0.000156634
go_gc_duration_seconds{quantile="1"} 0.073474265
go_gc_duration_seconds_sum 0.19274691
go_gc_duration_seconds_count 258
# HELP go_goroutines Number of goroutines that currently exist.

Great.

Checking current deployment and creating a Service

But what does that Service look like? Will it be configured correctly for the ServiceMonitor resource we’re going to create? Let’s take a look. To work this out, I’m going to take a top-down approach: let’s see how the current ServiceMonitor resources in this deployment should be configured.

$ kubectl get prometheus -o yaml -n monitoring
...
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector:
      matchLabels:
        release: prometheus
...

At the bottom of the trace we see something helpful: the ServiceMonitor configuration for this current deployment. serviceMonitorNamespaceSelector: {} means ServiceMonitor resources in all namespaces can be selected, but the next block adds another condition: each ServiceMonitor resource must have the label release: prometheus. Fine.
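
A quick way to see which ServiceMonitor resources already satisfy that label condition (at this point, just the ones shipped with the chart):

$ kubectl get servicemonitor -n monitoring -l release=prometheus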

Let’s create a Service for our Traefik ingress controller. Note from above that the metrics endpoint is served on port 8080 at /metrics.

kind: Service
apiVersion: v1
metadata:
  name: traefik
  namespace: kube-system
  labels:
    app: traefik
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "80"
spec:
  selector:
     app: traefik
  ports:
    - name: app
      port: 80
      targetPort: 80
    - name: dashboard
      port: 8080
      targetPort: 8080
  type: ClusterIP

The prometheus.io/port annotation should be switched to 8080, I reckon, and if we wanted to we could rename that port ‘metrics’ instead of ‘dashboard’ - but I don’t think that actually makes great sense, as there is a Traefik dashboard.
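
If we wanted to make that switch in place rather than re-applying the manifest, kubectl annotate can do it:

$ kubectl annotate svc traefik prometheus.io/port=8080 --overwrite -n kube-system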

Creating a ServiceMonitor resource

Here’s one I made earlier. We know our Traefik service works, so we just have to get this right and we’re there.

[minikube@control-plane kube-system]$ cat traefik-svc-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik
  namespace: monitoring
  labels:
    app: traefik
    release: prometheus
spec:
  endpoints:
  - path: /metrics
    port: dashboard
    interval: 15s
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: traefik

Note the label release: prometheus; this was the criterion that had to be met for this Prometheus build. Hopefully it looks pretty straightforward. Ok, let’s apply it. Remember, this has to be applied in the same namespace as the Prometheus pod.

$ kubectl apply -f traefik-svc-monitor.yaml -n monitoring

And shall we check it exists? (Yes..)

[minikube@control-plane kube-system]$ kubectl get servicemonitor -n monitoring
NAME                                                 AGE
prometheus-kube-prometheus-apiserver                 168m
prometheus-kube-prometheus-coredns                   168m
prometheus-kube-prometheus-grafana                   168m
prometheus-kube-prometheus-kube-controller-manager   168m
prometheus-kube-prometheus-kube-etcd                 168m
prometheus-kube-prometheus-kube-proxy                168m
prometheus-kube-prometheus-kube-scheduler            168m
prometheus-kube-prometheus-kube-state-metrics        168m
prometheus-kube-prometheus-kubelet                   168m
prometheus-kube-prometheus-node-exporter             168m
prometheus-kube-prometheus-operator                  168m
prometheus-kube-prometheus-prometheus                168m
traefik                                              105m

Here’s where it gets a little tricky if there are problems. There isn’t much visibility for errors: with this build, there are no logs in the Prometheus pod for new ServiceMonitor resources or their issues. The best solution I’ve found is simply visiting the Prometheus dashboard and looking at the Targets section.
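
That said, the operator’s logs are still worth a glance for outright errors, even if they tend to stay quiet about a ServiceMonitor that simply selects nothing:

$ kubectl logs -n monitoring deployment/prometheus-kube-prometheus-operator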

Let’s expose the Prometheus dashboard with port-forward and see if anything’s happened.

[minikube@control-plane kube-system]$ kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090 -n monitoring
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090

Now let’s go to the browser on the server and look at Targets. Voila. Looks good!
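
If you prefer the command line, the same target list is available from the Prometheus HTTP API through that port-forward; a crude grep is enough to confirm a traefik target has registered:

$ curl -s 127.0.0.1:9090/api/v1/targets | grep -o traefik | head -1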


Thanks. This was written by Nick Otter.