Fixing '503 Service Unavailable' and 'Endpoints not available' for Traefik Ingress in Kubernetes

In a Kubernetes cluster I'm building, I was quite puzzled when setting up Ingress for one of my applications (in this case, Jenkins).

I had created a Deployment for Jenkins (in the jenkins namespace), and an associated Service, which exposed port 80 on a ClusterIP. Then I added an Ingress resource which routed the host jenkins.example.com to the jenkins Service on port 80.

Inspecting both the Service and Ingress resource with kubectl get svc -n jenkins and kubectl get ingress -n jenkins, respectively, showed everything seemed to be configured correctly:

$ kubectl get svc -n jenkins
NAME      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
jenkins   ClusterIP   172.20.3.104   <none>        80/TCP    17m

$ kubectl get ing -n jenkins
NAME      HOSTS                 ADDRESS   PORTS   AGE
traefik   jenkins.example.com             80      17m

But when I visited the URL, I would get a 503:

$ curl -I http://jenkins.example.com/
HTTP/1.1 503 Service Unavailable
Vary: Accept-Encoding
Date: Wed, 24 Oct 2018 18:23:42 GMT
Content-Length: 19
Content-Type: text/plain; charset=utf-8

The Traefik logs weren't all that helpful (I have Traefik running as a DaemonSet), but did point to some sort of disconnect between the jenkins Service and the jenkins Deployment:

$ kubectl logs -l app=traefik -n ingress-controller
...
{"level":"warning","msg":"Endpoints not available for jenkins/jenkins","time":"2018-10-24T18:33:11Z"}
{"level":"warning","msg":"Endpoints not available for jenkins/jenkins","time":"2018-10-24T18:33:13Z"}
{"level":"warning","msg":"Endpoints not available for jenkins/jenkins","time":"2018-10-24T18:33:13Z"}

Eventually my Googling led me to this GitHub issue comment, which stated:

The likely culprit is that your Service's selector doesn't match any Pod's labels.

Sure enough, when I described the full jenkins Service, I noticed it had no associated Endpoints!

$ kubectl describe svc jenkins -n jenkins
Name:              jenkins
Namespace:         jenkins
Labels:            app=jenkins
Annotations:       <none>
Selector:          app=jenkins,tier=frontend
Type:              ClusterIP
IP:                172.20.3.104
Port:              jenkins  80/TCP
TargetPort:        8080/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>
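
The empty Endpoints field is the thing to look for. As a rough sketch, the same check can be scripted with kubectl get endpoints, since an Endpoints object shares its Service's name. The has_endpoints function is just an illustrative name, not part of kubectl, and on newer clusters the same information may be tracked in EndpointSlice objects instead:

```shell
#!/bin/sh
# Sketch: report whether a Service currently has any ready endpoint IPs.
# has_endpoints is a hypothetical helper name, not a kubectl feature.
# Usage: has_endpoints <namespace> <service>
has_endpoints() {
  ns="$1"
  svc="$2"
  # The jsonpath expression prints the ready Pod IPs, or nothing at all
  # when the Service's selector matches no ready Pods.
  ips=$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$ips" ]; then
    echo "no endpoints for $ns/$svc"
    return 1
  fi
  echo "endpoints for $ns/$svc: $ips"
}
```

Run as has_endpoints jenkins jenkins against the broken Service above, this would be expected to print the "no endpoints" message and return a non-zero status.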

I realized the Selector labels I had defined on the Service (app=jenkins,tier=frontend) did not match the labels on the Pods created by the jenkins Deployment. A Service only gets Endpoints for Pods whose labels include every key/value pair in its selector. I edited the Service definition so the selector matched (kubectl edit svc jenkins -n jenkins), and Traefik immediately started serving the traffic, and the Endpoints value was filled in with the Jenkins pod's IP address!
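
To see a mismatch like this directly, it helps to ask the cluster which Pods the Service's selector actually selects. Here is a minimal sketch, assuming kubectl is already pointed at the cluster; pods_matching_service is a hypothetical helper name, not a kubectl feature:

```shell
#!/bin/sh
# Sketch: list the Pods that a Service's selector actually matches.
# Usage: pods_matching_service <namespace> <service>
pods_matching_service() {
  ns="$1"
  svc="$2"
  # Render the selector map as "key=value," pairs using a Go template.
  selector=$(kubectl get svc "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')
  selector=${selector%,}   # drop the trailing comma
  if [ -z "$selector" ]; then
    echo "service $ns/$svc has no selector"
    return 1
  fi
  echo "selector: $selector"
  # An empty list here means no Pod carries all of the selector's labels.
  kubectl get pods -n "$ns" -l "$selector" --show-labels
}
```

If the last command lists no Pods, comparing the printed selector with the output of kubectl get pods -n jenkins --show-labels should reveal which label differs.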

Comments

SO glad I documented this when I ran into it. I ran into the exact same issue on another cluster today and was at my wit's end... googled the error from Traefik, came here, and BOOM, exact same problem :)

Maybe I should get my head checked out for making the same mistakes in multiple clusters through the years?

Thanks for your post. I got a 503 and I was missing an app selector in the Service object:
spec:
  selector:
    app: grafana

What a relief!!!! I don't know how many odd things I have been thru now, and this was just it. Thank you for posting!

Thank you for this article Jeff, I finally could find out how to debug my ingress controller :)

Thanks Jeff, still a useful post after all these years.

For me the issue was a little different: I had turned off my Raspberry Pi cluster, and when I spun it back up everything came back online but the Service had no endpoints, even though my selectors were correct.

It could be that the apps/pods came online before Traefik could initialize (not sure). Deleting the Pod(s) forced the Service endpoints to be updated.