Kubernetes is all about automation. And it can do a considerably better job the more information it has about your application. A key source of such information comes in the form of probes. This article focuses on the Liveness and Readiness probes: what they do, when to use them, and how they ensure stability and performance of your application.
This article goes much deeper into how these probes are used correctly than the official documentation for Kubernetes does. What it does not do, however, is cover the syntax of the probes. That is much better left to the documentation. The probe semantics discussed herein are equally valid for all types of probes (exec, HTTP, TCP, and gRPC).
What is the Readiness Probe?
The Readiness probe signals readiness to process requests.
In detail, this means that:
- As long as the probe responds successfully, the container is considered “ready” to receive requests.
- If all containers in a Pod are “Ready”, the Pod is added to the list of possible Endpoints for a Service.
- Network requests to a Service will go to the Endpoints (Pods that are “Ready”).
But what about when a container does not respond successfully to the probe? 🤔
- The container gets marked as not ready.
- Because at least one container is not ready, the Pod as a whole is not ready.
- The Pod is removed from the list of eligible Endpoints for any Service it belongs to, and thus, stops getting new requests.
But here’s the important part:
The probing still goes on! As soon as enough successful probes mark the failing container(s) as ready, the Pod will become an eligible Endpoint to its Service again!
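The threshold behavior described above can be sketched as a toy model. This is illustrative code, not Kubernetes source; the defaults mirror the documented defaults of `failureThreshold: 3` and `successThreshold: 1`:

```python
# Toy model of how consecutive probe results flip a container's
# Ready condition. Not Kubernetes source code; names are illustrative.
class ReadyTracker:
    def __init__(self, failure_threshold: int = 3, success_threshold: int = 1):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.ready = True
        self._fails = 0
        self._successes = 0

    def observe(self, probe_succeeded: bool) -> bool:
        """Record one probe result and return the resulting Ready state."""
        if probe_succeeded:
            self._fails = 0  # any success resets the failure streak
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.ready = True
        else:
            self._successes = 0  # any failure resets the success streak
            self._fails += 1
            if self._fails >= self.failure_threshold:
                self.ready = False
        return self.ready
```

Note that a single failed probe does not make the Pod unready; only a streak of failures crossing the threshold does, and a streak of successes brings it back.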
This means that the Readiness probe is great for when your application component has temporary failures that simply mean it is unable to process new requests at this time.
Correct Usage of the Readiness Probe
Say a backend component depends on a database. If the database connection goes down (for whatever reason), the backend component cannot process requests right now.
It would be bad for application stability if the component received requests it cannot process. It is much better not to receive them in the first place than to receive them and fail.
If the backend component now responds with failure to its Readiness probe, it will not receive new requests until the database connection is restored.
By simply not receiving new requests, it can keep its caches warm, and interpreted or JIT-compiled runtimes such as Python or Node.js stay warmed up. Performance is preserved, and the component will be ready to handle requests again as soon as it is able to.
What is the Liveness Probe?
The Liveness probe signals that the component has “liveness”. That does not mean that it is “alive”. This unfortunate name is the number one reason it is misunderstood.
“Liveness” is a term in distributed systems. It means that the component will, eventually, be able to produce a result. Or, as the Wikipedia article on liveness says, “something good will eventually occur”.
When would that ever not be the case?
Deadlock vs. Liveness
In most systems, the practical opposite of liveness is deadlock. This is a state in which the application cannot continue processing (some or all requests), because one part of it is waiting for another, which in turn is waiting for the first (or for a longer chain of waiting-for relationships).
So because of deadlock, the component is unable to process (at least some) requests. In this case, it does not have “liveness”.
The only remedy to a component without “liveness” is to terminate the application, and hope that it won’t wind up in that state again.
…and that’s exactly what the kubelet in Kubernetes will do for you: it will restart that container.
Liveness Probe in Detail
In detail, when a Liveness probe fails (as many times as the failure threshold is set to), the following happens:
- the kubelet restarts that container (if the restart policy of the Pod permits),
- which marks the Pod as being “not ready”,
- which means that it will not be an eligible Endpoint to any Service,
- which means that it stops getting requests.
This state persists until:
- all containers (including the one that was restarted) are marked as ready,
- at which point the Pod itself is ready, and
- can be listed as an eligible Endpoint for any Service it belongs to.
Correct Usage of the Liveness Probe
Restarting a container is a desperate measure to solve a very tricky situation.
But do note that it is potentially disastrous for an application to be restarted, out of the blue. Stability problems due to data loss can occur, and performance is all but guaranteed to suffer.
Did the component manage to write files to disk? Will it need to catch up to other clustered components, or read tons of data from disk into its memory?
It will certainly lose all cached objects, and any JIT-compiled code as well!
Therefore, stability and performance are very much impacted by forcefully restarting a container.
So unless you’re deploying a complicated database with concurrency control, or a poorly written application with race conditions in it, it is highly doubtful that you actually wanted a Liveness probe.
Incorrect Usage of the Liveness and Readiness Probes
A fundamental problem is incorrectly using the Liveness probe in the first place.
But there is another problem that is very widespread in the Kubernetes community: setting both probes to equal configuration.
Likely, the cause is simply confusion regarding the difference between them. Hopefully, this article has cleared that up.
But what happens if both are set to the same configuration? Technically, they would both fail at roughly the same time. But the effect of the Liveness probe is much harsher: it forces a container restart. That means it completely shadows the effect of the failing Readiness probe.
Forcing needless restarts is disruptive, and very poor for application stability and performance. As a community, let’s stop doing that. And hopefully, this article can help in this regard.