calico

Debugging networking issues with multi-node Kubernetes on VirtualBox

Since this is the third time I've burned more than a few hours on this particular problem, I thought I'd finally write up a blog post. Hopefully I find this post in the future, the fourth time I run into the problem.

What problem is that? Well, when I build a new Kubernetes cluster with multiple nodes in VirtualBox (usually orchestrated with Vagrant and Ansible, using my geerlingguy.kubernetes role), I get everything running. kubectl works fine, all pods (including CoreDNS, Flannel or Calico, kube-apiserver, the scheduler) report Running, and everything in the cluster seems right. But there are lots of strange networking issues.

Sometimes internal DNS queries work. Most of the time not. I can't ping other pods by their IP address. Some of the debugging I do includes: