Wait for it - Using Readiness Probes for Service Dependencies in Kubernetes
In a perfect world of microservices, every part is aware that things can break. If an endpoint can't be reached, the request should simply be retried after some time, over and over. The connection problem will be monitored, and some upper layer may decide to take further action. The microservice does its one job and handles obstacles resiliently.
But in reality we often deal with applications that break instantly when something is not the way they expect it. When connecting to an API or database that is not ready yet, the app gets upset and refuses to continue working, so I will call this a 'stubborn app' in the remainder of this article.
We can't rewrite all these good old applications, but we still want to benefit from microservice infrastructures and orchestration now. So we need a workaround: some kind of dependency between deployments that is respected when they are brought up. The solution: we simply wait until the API or database we depend on is ready, and only start our stubborn app after that.
Know when a Pod is ready
On our old platform we had a feature in the platform API that allowed a service to signal that it is actually ready to accept requests.
After rebasing our platform onto Kubernetes, we needed to ensure that we stayed on par feature-wise. Luckily, Kubernetes includes a lot of well-thought-out primitives. For this case we can make use of the Readiness Probe.
A Pod with a readiness probe defined won't receive any traffic until a defined request can be fulfilled successfully. The probe is defined through the Kubernetes API, so no changes are needed in our microservice. We just need to set up a readiness probe for the microservices that our stubborn app depends on. Here you can see the relevant part of the container spec you need to add (in this example we want to know when Grafana is ready):
    readinessProbe:
      httpGet:
        path: /login
        port: 3000
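To check by hand what such a probe would see, the request can be imitated with curl. The following is only a rough sketch, not how the kubelet is implemented: the function names probe_status_ok and http_probe are made up for this illustration, and the only rule taken from Kubernetes is that an HTTP probe counts as successful for any status code from 200 up to, but not including, 400.

```shell
#!/bin/sh
# Rough imitation of an httpGet readiness probe, for manual debugging only.
# Function names are hypothetical; they are not part of Kubernetes.

# Succeeds if the given HTTP status code would count as a successful probe
# (any code >= 200 and < 400).
probe_status_ok() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric input: treat as failure
  esac
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

# Requests the URL once and applies the rule above.
# curl prints 000 as the status code when no connection could be made.
http_probe() {
  status="$(curl --silent --output /dev/null --write-out '%{http_code}' "$1")"
  probe_status_ok "$status"
}
```

Run from a place that can reach the Pod, something like http_probe "http://localhost:3000/login" && echo ready would show whether the Grafana example above passes.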
✅ Now we know when our dependent API or database is ready!
Delaying the deployment of our stubborn service
We still need to make sure that our stubborn app is not started before its dependency is ready. Kubernetes can run an Init Container in our Pod before the actual app container is started. There we can do some preparation if needed, or in our case just wait until the status of the service we depend on changes to ready. After that it is safe to start the app.
In general we'll use the Service resource to access other microservices. We can ask Kubernetes how many Pods are ready to be used by a certain Service with a simple request:
    namespace="monitoring"
    service="grafana"
    cacert="/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
    token="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
    # subsets is an array, so collect the ready addresses of all subsets before counting
    curl --cacert "$cacert" --header "Authorization: Bearer $token" \
      "https://kubernetes.default.svc/api/v1/namespaces/$namespace/endpoints/$service" \
      | jq -r '[.subsets[]?.addresses[]?] | length'
We run a GET request on https://kubernetes.default.svc/api/v1/namespaces/$namespace/endpoints/$service. This is the Kubernetes API, which we can access from within a Pod when providing the right credentials. For convenience, Kubernetes mounts these into the Pod by default in most vanilla setups, at the paths used for cacert and token in the command above.
Kubernetes answers this request with JSON describing all available endpoints for the service we specified. It is enough for us to check the length of the addresses array, since it only lists endpoints that are ready. So we can just loop over this request and wait while the length is zero. For a complete example see: grafana/import-dashboards/job.yaml#L21
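The loop described above could look roughly like the following. This is a minimal sketch, not the script from the linked job.yaml: it assumes curl and jq are available in the Init Container image, reuses the example namespace and service from above, and the function names ready_count and wait_until_ready are made up for this illustration.

```shell
#!/bin/sh
# Sketch of a wait loop for an Init Container (assumes curl and jq in the image).

namespace="${namespace:-monitoring}"   # example values from the article
service="${service:-grafana}"
sa_dir="/var/run/secrets/kubernetes.io/serviceaccount"

# Prints the number of ready addresses behind the Service (0 if there are none).
ready_count() {
  curl --silent --cacert "$sa_dir/ca.crt" \
    --header "Authorization: Bearer $(cat "$sa_dir/token")" \
    "https://kubernetes.default.svc/api/v1/namespaces/$namespace/endpoints/$service" \
    | jq -r '[.subsets[]?.addresses[]?] | length'
}

# wait_until_ready COUNT_CMD [INTERVAL_SECONDS]
# Hypothetical helper: polls COUNT_CMD until it prints a number greater than zero.
wait_until_ready() {
  count_cmd="$1"
  interval="${2:-2}"
  while :; do
    count="$("$count_cmd" 2>/dev/null)"
    case "$count" in
      ''|*[!0-9]*) count=0 ;;   # errors or non-numeric output count as "not ready"
    esac
    [ "$count" -gt 0 ] && break
    echo "waiting for $service ..."
    sleep "$interval"
  done
  echo "$service is ready"
}
```

The Init Container would then end its script with wait_until_ready ready_count. Passing the counting command as an argument keeps the loop itself independent of how readiness is determined.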
✅ Now our stubborn app will be waiting until its desired environment is ready. Awesome!
With this use of a Readiness Probe in combination with a waiting Init Container, we have a simple solution to ensure that Pods with dependencies do not get started before their dependencies are ready. This also works with more than one dependency.
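For more than one dependency, the same idea can be applied to each service in turn. The sketch below is an assumption about how one might write this, not part of the original setup: wait_for_all and WAIT_INTERVAL are hypothetical names, and the check command passed in is expected to succeed once the named service has at least one ready endpoint.

```shell
#!/bin/sh
# wait_for_all CHECK_CMD SERVICE...
# Hypothetical helper: blocks until CHECK_CMD succeeds for every listed service.
# CHECK_CMD receives the service name as its only argument.
wait_for_all() {
  check_cmd="$1"
  shift
  for svc in "$@"; do
    # Services are checked one after another; the Init Container only
    # finishes once all of them are ready.
    until "$check_cmd" "$svc"; do
      sleep "${WAIT_INTERVAL:-2}"
    done
    echo "$svc is ready"
  done
}
```

Checking the services sequentially is enough here, because the app may only start once the slowest dependency is ready anyway.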
Depending on the use case, there are other possible solutions. We will discuss some of them and their pros and cons in a following blog post.
What do you think? Any thoughts to share?
Happy to discuss via comments \o/