Monitoring Kubernetes cluster utilization and capacity (the poor man's way)

If you're running Kubernetes clusters at scale, it pays to have good monitoring in place. The tools I typically use in production, like Prometheus and Alertmanager, are extremely useful for watching critical metrics and answering questions like "is my cluster almost out of CPU or memory?"

But I also have a number of smaller clusters. Some of them, like my Raspberry Pi Dramble, have very few resources to spare for hosting monitoring internally. I still want to be able to say, at any given moment, "how much CPU or RAM is available inside the cluster? Can I fit more Pods in the cluster?"

So without further ado, I'm now using the following script, which is slightly adapted from a script found in the Kubernetes issue "Need simple kubectl command to see cluster resource usage":

Usage is pretty easy: make sure your kubeconfig is configured so kubectl commands work against the cluster, then run:

$ ./k8s-resources.sh
hostname1: 23% CPU, 16% memory
hostname2: 26% CPU, 16% memory
hostname3: 38% CPU, 22% memory
hostname4: 98% CPU, 66% memory
hostname5: 29% CPU, 18% memory
hostname6: 28% CPU, 16% memory
Average usage: 40% CPU, 25% memory.
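
The final "Average usage" line is a plain integer average of the per-node percentages. Here is a minimal sketch of that arithmetic, using the numbers from the sample output above. The function name average_percent is hypothetical and is not part of the script itself.

```shell
#!/bin/bash
# Hypothetical helper: integer average of the percentages passed as arguments.
# Bash arithmetic truncates, so 25.67 is reported as 25.
average_percent() {
  local total=0 count=0 value
  for value in "$@"; do
    total=$((total + value))
    count=$((count + 1))
  done
  # Avoid dividing by zero when no values were passed.
  if [ "$count" -eq 0 ]; then
    echo 0
    return
  fi
  echo $((total / count))
}

average_percent 23 26 38 98 29 28   # CPU column from the sample output: prints 40
average_percent 16 16 22 66 18 16   # memory column from the sample output: prints 25
```

Because the division truncates, the average can read slightly lower than the true mean (the memory values above average 25.67).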

If I get some time I might make a few more modifications to allow more detailed stats. Also, there are a dozen or so other scripts and utilities you can run to get more detailed stats. But for my purposes, I am quite often setting up a small cluster, running a number of apps on it, then checking what kind of resource allocation pattern I'm getting. This helps tremendously in finding the optimal instance type on AWS, or whether I need more instances or could live with fewer.
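
When deciding whether a cluster needs more instances, it can help to pick out just the nodes that are close to full. The sketch below filters the script's output for nodes at or above a CPU threshold. The function name hot_nodes and the default threshold of 80 are assumptions made for illustration, and it assumes the "name: NN% CPU, NN% memory" line format shown in the sample output above.

```shell
#!/bin/bash
# Hypothetical helper: print the names of nodes whose CPU percentage is at or
# above a threshold (default 80). Reads lines such as
#   hostname4: 98% CPU, 66% memory
# on stdin. The "Average usage" line is skipped because its first word does
# not end in a colon.
hot_nodes() {
  local threshold=${1:-80}
  awk -v t="$threshold" '
    /% CPU/ && $1 ~ /:$/ {
      cpu = $2
      sub(/%/, "", cpu)          # "98%" -> "98"
      if (cpu + 0 >= t) {
        name = $1
        sub(/:$/, "", name)      # "hostname4:" -> "hostname4"
        print name
      }
    }'
}

# Example (assumes the script is saved as k8s-resources.sh):
#   ./k8s-resources.sh | hot_nodes 80
```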

Hopefully kubectl/Kubernetes will someday include a simpler way of finding this information. But for now, there are scripts like the one above!

Comments

Awesome! Thanks!
It's a shame there is no standard dashboard to see these things.
I've extended your script to print the absolute values too:

#!/bin/bash
#
# Monitor overall Kubernetes cluster utilization and capacity.
#
# Original source:
# https://github.com/kubernetes/kubernetes/issues/17512#issuecomment-367212930
#
# Tested with:
#   - AWS EKS v1.11.5
#
# Does not require any other dependencies to be installed in the cluster.

set -e

KUBECTL="kubectl"
NODES=$($KUBECTL get nodes --no-headers -o custom-columns=NAME:.metadata.name)

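# Convert quantities with K/M/G/T/P/E suffixes to plain integers using bc.
# Note: the caller strips the "i" from binary suffixes (Ki, Mi, ...), so they
# are treated as decimal multiples here and the results are approximate.
unitconvert(){
  sed '
      s/\([0-9][0-9]*\(\.[0-9]\+\)\?\)K/\1*1000/g;
      s/\([0-9][0-9]*\(\.[0-9]\+\)\?\)M/\1*1000000/g;
      s/\([0-9][0-9]*\(\.[0-9]\+\)\?\)G/\1*1000000000/g;
      s/\([0-9][0-9]*\(\.[0-9]\+\)\?\)T/\1*1000000000000/g;
      s/\([0-9][0-9]*\(\.[0-9]\+\)\?\)P/\1*1000000000000000/g;
      s/\([0-9][0-9]*\(\.[0-9]\+\)\?\)E/\1*1000000000000000000/g
  ' | bc | sed 's/\..*$//' # Final sed drops any fractional part
}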

function usage() {
  local node_count=0
  local total_percent_cpu=0
  local total_percent_mem=0
  local total_abs_cpu=0
  local total_abs_mem=0
  local -r nodes="$*"

  for n in $nodes; do
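    # Grab the cpu and memory rows of the "Allocated resources" table.
    local requests=$($KUBECTL describe node "$n" | grep -A3 -E '\s\sRequests' | tail -n2)
    # echo "$requests"
    # Note: the CPU parsing below assumes requests are reported in millicores (e.g. 1450m).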
    local abs_cpu=$(echo $requests | awk -F "[()% im]*" '{print $2}')
    local percent_cpu=$(echo $requests | awk -F "[()%]" '{print $2}')
    local abs_mem=$(echo $requests | awk -F "[()% i]*" '{print $7}' | unitconvert)
    local percent_mem=$(echo $requests | awk -F "[()%]" '{print $8}')
    echo "$n: ${abs_cpu}m ${percent_cpu}% CPU, $((abs_mem / 1000000))Mi ${percent_mem}% memory"

    node_count=$((node_count + 1))
    total_percent_cpu=$((total_percent_cpu + percent_cpu))
    total_percent_mem=$((total_percent_mem + percent_mem))
    total_abs_cpu=$((total_abs_cpu + abs_cpu))
    total_abs_mem=$((total_abs_mem + abs_mem))
  done

  local -r avg_percent_cpu=$((total_percent_cpu / node_count))
  local -r avg_percent_mem=$((total_percent_mem / node_count))
  local -r avg_abs_cpu=$((total_abs_cpu / node_count))
  local -r avg_abs_mem=$((total_abs_mem / node_count))

  echo "Average usage: ${avg_abs_cpu}m ${avg_percent_cpu}% CPU, $((avg_abs_mem / 1000000))Mi ${avg_percent_mem}% memory."
}

usage $NODES
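
One caveat with the script above: the awk field separator strips the "i" from suffixes like Mi and Gi, so binary quantities are converted as if they were decimal. If exact byte counts matter, a conversion that tells the two families apart could look something like the sketch below. The function name to_bytes is hypothetical, and it only handles whole-number quantities with the suffixes listed.

```shell
#!/bin/bash
# Hypothetical helper: convert a Kubernetes memory quantity such as 1400Mi,
# 2Gi or 500M to bytes. Only integer quantities are handled; anything with
# an unrecognized suffix is rejected.
to_bytes() {
  local qty=$1
  local num=${qty%%[!0-9]*}    # leading digits, e.g. "1400"
  local suffix=${qty#"$num"}   # whatever follows, e.g. "Mi"
  local factor
  case $suffix in
    "") factor=1 ;;
    k)  factor=1000 ;;
    M)  factor=1000000 ;;
    G)  factor=1000000000 ;;
    T)  factor=1000000000000 ;;
    Ki) factor=1024 ;;
    Mi) factor=$((1024 ** 2)) ;;
    Gi) factor=$((1024 ** 3)) ;;
    Ti) factor=$((1024 ** 4)) ;;
    *)  echo "unsupported quantity: $qty" >&2; return 1 ;;
  esac
  if [ -z "$num" ]; then
    echo "unsupported quantity: $qty" >&2
    return 1
  fi
  echo $((num * factor))
}

to_bytes 1400Mi   # prints 1468006400
to_bytes 500M     # prints 500000000
```

With a helper like this, the Mi figure in the output could be computed by dividing the byte count by 1048576 instead of 1000000.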