NSA and CISA Kubernetes Security Guidance: Summarized and Explained

Kubernetes is now the most popular container orchestration platform. Practically gone are the Mesoses and Docker Swarms of the world, and honestly, I’m not going to miss them. But the downside to its dominant market position is that Kubernetes is also heavily targeted by bad actors that want to compromise its security. The sad fact is that container security is in an abysmal state, with 56% of developers currently not even scanning their containers. And that is in spite of Gartner projecting that more than 70% of companies will be running containerized applications by 2023.

We, as a community, need to do something about this.

The 59-page technical report “Kubernetes Hardening Guidance” (direct PDF link) published on August 3, 2021 by the NSA and CISA is here to help! It is a very nice document for organizations that rely on Kubernetes as a container platform. It provides both detailed information and hands-on examples of how to secure the platform. The problem I see with it is the significant gap between the executive summary and the dense, highly detailed information in the 59 pages.

To address this, I will (a) summarize the tech report’s main takeaways, and (b) provide additional insights based on my personal experience in cloud security. I’ve been working in cloud computing since 2008 in both industry and academia, and have seen the entire evolution of this field. My intention is for this article to be an informative read for decision makers, and to provide actionable recommendations that a team of experts can implement.

Scan containers and Pods for vulnerabilities or misconfigurations

We love containers because their images are immutable packages of a piece of software and all its dependencies. Immutability is an asset in the sense that the very same container can be subjected to quality assurance processes and get promoted from development to production without any change at all. But it is also a liability, because container images are software time capsules: they do not automatically get updates as new vulnerabilities are discovered.

Scanning container images for known vulnerabilities is a security best practice (although one followed by only 44% of developers). But most just scan images when they are initially pushed to the registry. This creates a problem, because the more stable the application, the less frequently it gets updated and thus pushed to the registry.

Ironically, stability makes container images more likely to become vulnerable between updates.

As a mitigation, the NSA/CISA report recommends using a Kubernetes Admission Controller, which will request a scan upon Pod deployment. But when you think about it, this suffers from the same problem: if an infrequently updated application is deployed for a long time, this additional deploy-time check will not suitably protect the long-running application.

That is why I strongly recommend putting a process in place that regularly determines which container images are deployed to your cluster and scans those images. Simply schedule a job that loops over the Pods once a day and has the registry scan the container images in them. This way, your scan results stay up to date and accurate.
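As a minimal sketch of the first half of such a daily job, the Python snippet below collects the unique images referenced by all Pods. It assumes the input has the shape produced by `kubectl get pods --all-namespaces -o json`; the sample data, image names, and the function name `images_in_use` are made up for illustration. How you then trigger a scan depends entirely on your registry or scanner, so that step is deliberately left out.

```python
def images_in_use(pod_list: dict) -> set:
    """Return the unique container images referenced by a list of Pods.

    `pod_list` is assumed to be the parsed JSON output of
    `kubectl get pods --all-namespaces -o json`.
    """
    images = set()
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        # Init and ephemeral containers run images too, so include them.
        for key in ("initContainers", "containers", "ephemeralContainers"):
            for container in spec.get(key) or []:
                images.add(container["image"])
    return images


# Hypothetical sample data standing in for real kubectl output.
SAMPLE_POD_LIST = {
    "kind": "List",
    "items": [
        {
            "metadata": {"namespace": "shop", "name": "backend-0"},
            "spec": {
                "initContainers": [
                    {"name": "migrate", "image": "registry.example.com/shop/migrate:1.4.2"}
                ],
                "containers": [
                    {"name": "backend", "image": "registry.example.com/shop/backend:1.4.2"}
                ],
            },
        },
        {
            "metadata": {"namespace": "shop", "name": "backend-1"},
            "spec": {
                "containers": [
                    {"name": "backend", "image": "registry.example.com/shop/backend:1.4.2"}
                ],
            },
        },
        {
            "metadata": {"namespace": "shop", "name": "db-0"},
            "spec": {
                "containers": [
                    {"name": "db", "image": "docker.io/library/postgres:13.4"}
                ],
            },
        },
    ],
}

for image in sorted(images_in_use(SAMPLE_POD_LIST)):
    # Replace this print with a call to your registry's or scanner's API.
    print("scan needed:", image)
```

Deduplicating the images first keeps the number of scan requests proportional to the number of distinct images rather than the number of Pods.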

Run containers and Pods with the least privileges possible

Kubernetes and container runtimes have been very lax with their default security posture since day one. And in a world where 2/3 of insider threats are caused by negligence, letting software or users have too broad permissions is by definition negligent!

The default user in containers is the system administrator “root” user. You have to manually opt out of that. Kubernetes does not impose many additional restrictions on what the containerized application can do, either. Thus, if a cyber attack against an application in a Kubernetes container platform is successful, the set of privileges and permissions that the actor is granted is very broad.

To mitigate the risks, policies should be put in place to ensure and enforce that:

  • containers do not run as the “root” user (and, if possible, that the container runtime itself does not either, although the default one does), so as to limit the permissions of the application, and hence of a bad actor in case of a compromise;

  • container file systems are immutable, to prevent a bad actor from erasing tracks of their attack;

  • the most restrictive Pod Security Policy (Kubernetes up to v1.21) or Pod Security Standard (Kubernetes v1.22+) is in use to, e.g., enforce running as a non-root user and disallow privilege escalation, which would otherwise let a process essentially become the root user and gain access to the container host OS; and that

  • default Service Account tokens are not needlessly made accessible to Pods, because they might give far more access to the Kubernetes cluster’s API than you intended. Your application probably never even needs this, so why is it there by default?

I think it’s a no-brainer that these baseline policies should always be in place. But you probably have more policies, too. Ones that are particular to your organization. For this purpose, I whole-heartedly recommend using Open Policy Agent (OPA) in addition to the configuration available in Kubernetes itself. With configuration inspired by the official library or third-party policies, OPA can enforce all manner of policies, upon each request.
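To make the baseline policies above concrete, here is a small Python sketch that checks a Pod manifest (parsed into a dictionary) against them. It is only an illustration of what such a policy inspects, not a replacement for an admission controller or OPA. The function name `baseline_violations` and the sample manifest are made up; the inspected fields (`runAsNonRoot`, `readOnlyRootFilesystem`, `allowPrivilegeEscalation`, `automountServiceAccountToken`) are regular Pod spec fields.

```python
def baseline_violations(pod: dict) -> list:
    """Return human-readable violations of the baseline policies for one Pod."""
    spec = pod.get("spec", {})
    problems = []

    # The token is mounted unless this is explicitly set to false. It can also
    # be disabled on the ServiceAccount, which this simple check does not see.
    if spec.get("automountServiceAccountToken") is not False:
        problems.append("service account token is mounted automatically")

    pod_ctx = spec.get("securityContext") or {}
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}

        # runAsNonRoot may be set on the Pod and overridden per container.
        if ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: may run as root")
        # The next two settings only exist at the container level.
        if ctx.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: root file system is writable")
        if ctx.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: privilege escalation is allowed")
    return problems


# Hypothetical manifest that satisfies all four checks.
HARDENED_POD = {
    "spec": {
        "automountServiceAccountToken": False,
        "securityContext": {"runAsNonRoot": True},
        "containers": [
            {
                "name": "backend",
                "image": "registry.example.com/shop/backend:1.4.2",
                "securityContext": {
                    "readOnlyRootFilesystem": True,
                    "allowPrivilegeEscalation": False,
                },
            }
        ],
    }
}

# A Pod that relies on the defaults fails every check.
DEFAULT_POD = {"spec": {"containers": [{"name": "backend"}]}}

print(baseline_violations(HARDENED_POD))
for problem in baseline_violations(DEFAULT_POD):
    print(problem)
```

Note that every check treats an unset field as a violation. That mirrors the point of this section: the defaults are permissive, so safety has to be opted into explicitly.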

Use network separation to control the amount of damage a compromise can cause

The default networking settings in Kubernetes allow Pods to freely connect to each other, regardless of the namespace they are deployed in. This free-for-all approach to networking means that a bad actor only needs to get into a single Pod to have unfettered access to the others. So the entire platform is only as secure as the least secure component, and all a bad actor has to do is get in via your least secure component. And then, the rest is history.

Kubernetes Network Policies impose configurable limitations on networking. How these are implemented differs depending on the Container Network Interface (CNI) provider used, but they essentially turn into firewall rules that are aware of Kubernetes resources. This makes it easy to specify that only “the backend component” gets to call “the database”, and nothing else. So then, a weakness in your API gateway no longer means that an attack can easily be launched against any component in your platform.
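The “only the backend gets to call the database” rule could look roughly like the following. This Python sketch builds a `networking.k8s.io/v1` NetworkPolicy as a dictionary and prints it as JSON, which `kubectl apply -f` accepts just like YAML. The namespace, labels, policy name, and port are assumptions chosen for the example, as is the helper name `allow_only`.

```python
import json


def allow_only(name: str, namespace: str, target: dict, client: dict, port: int) -> dict:
    """Build a NetworkPolicy that lets only `client` Pods reach `target` Pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The Pods this policy protects.
            "podSelector": {"matchLabels": target},
            # Once a Pod is selected by an Ingress policy, any incoming
            # traffic not matched by a rule below is dropped.
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": client}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical labels and port for a PostgreSQL database and its backend.
policy = allow_only(
    name="database-allow-backend",
    namespace="shop",
    target={"app": "database"},
    client={"app": "backend"},
    port=5432,
)
print(json.dumps(policy, indent=2))
```

Keep in mind that the policy only has an effect if the cluster’s CNI provider actually enforces Network Policies.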

Use firewalls to limit unneeded network connectivity and encryption to protect confidentiality

Kubernetes container platforms consist of a control plane and a set of worker nodes. The control plane nodes host components that control the entire cluster. A bad actor that manages to take over the control plane can therefore launch arbitrary follow-up attacks and command the cluster to do their bidding.

A network perimeter defence via firewalls can help mitigate this type of attack from (external) malicious threat actors. No component of the control plane (Kubernetes API, etcd, controller managers, …) should be more exposed than absolutely necessary to meet the organization’s needs.

Also note that network traffic within Kubernetes clusters is typically not encrypted. This means that sensitive information could be picked up and exploited by software that a bad actor has managed to place inside the container platform. To prevent this class of attacks, all traffic in the cluster can be encrypted. This is a rather trivial and fully transparent change if the cluster uses a CNI provider that provides encryption as a configuration option. For instance, Calico can do this by leveraging WireGuard. I definitely recommend doing that if you cannot trust the underlying network sufficiently for the information security demands you have.

Use strong authentication and authorization to limit user and administrator access as well as to limit the attack surface

The Kubernetes container platform has role-based access control features in its API server. However, for some reason, these must be explicitly activated. Further, typical Kubernetes installations provide a never-expiring system administrator “token” to whoever installed the cluster. Use of this token gives full and perpetual access to the cluster. Guess how I feel about that? 🤯

Although not enabled by default, Kubernetes supports authentication via various methods. Of these, I strongly recommend OpenID Connect tokens. You can integrate with many identity provider services, and most support emitting such tokens. For the ones that don’t, Keycloak or Dex IdP can probably integrate with them. The tokens can also contain information about which groups a user is in, which makes it possible to set role-based access control rules on a group level.

And in what can hopefully (and generously) be seen as a misguided attempt to be easy to use, Kubernetes also supports anonymous requests by default. This should of course without question be turned off.

Role-based access control should be both enabled and configured to adhere to the principle of least privilege. That is, only the smallest set of privileges should be granted to both software and users, and any request for additional privileges should be reviewed before it is granted.

As you can tell, I truly recommend that (a) the administrator token is disabled, (b) OpenID Connect is enabled, (c) anonymous access is disabled, and (d) role-based access control is enabled. And, (e) that you actually restrict permissions as much as possible.
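Several of these recommendations map onto command-line flags of the Kubernetes API server, so they can be checked mechanically. The Python sketch below inspects a `kube-apiserver` argument list for anonymous access, role-based access control, and OpenID Connect. The flags it looks for (`--anonymous-auth`, `--authorization-mode`, `--oidc-issuer-url`, `--oidc-client-id`, `--oidc-groups-claim`) are API server flags to my knowledge; the issuer URL and client ID are made up, and how you obtain the argument list in practice depends on how your cluster was installed. The administrator token and the actual permission grants are not covered by this check.

```python
def parse_flags(argv: list) -> dict:
    """Turn ['--name=value', ...] into {'name': 'value', ...}."""
    flags = {}
    for arg in argv:
        if arg.startswith("--"):
            name, _, value = arg[2:].partition("=")
            flags[name] = value
    return flags


def auth_findings(argv: list) -> list:
    """Return findings for an API server command line; empty means none."""
    flags = parse_flags(argv)
    findings = []
    if flags.get("anonymous-auth") != "false":
        findings.append("anonymous requests are not explicitly disabled")
    if "RBAC" not in flags.get("authorization-mode", "").split(","):
        findings.append("role-based access control is not enabled")
    if not (flags.get("oidc-issuer-url") and flags.get("oidc-client-id")):
        findings.append("OpenID Connect is not configured")
    if not flags.get("oidc-groups-claim"):
        findings.append("no groups claim set, so group-level rules cannot be used")
    return findings


# Hypothetical command line with all four settings in place.
HARDENED_ARGV = [
    "kube-apiserver",
    "--anonymous-auth=false",
    "--authorization-mode=Node,RBAC",
    "--oidc-issuer-url=https://idp.example.com",
    "--oidc-client-id=kubernetes",
    "--oidc-groups-claim=groups",
]

print(auth_findings(HARDENED_ARGV))
for finding in auth_findings(["kube-apiserver"]):
    print(finding)
```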

Use log auditing so that administrators can monitor activity and be alerted to potential malicious activity

The Kubernetes control plane has audit logging capabilities built in. But, again (notice the theme here?), they must be explicitly enabled via configuration. Like the Kubernetes Hardening Guidance technical report, I of course also recommend enabling these, so operators can gain insight into what is happening in their cluster.

However, enabling a stream of very frequent logs (all automated requests against the Kubernetes API also leave an audit trail) merely provides the haystack. Finding the needles in the haystack requires actually parsing and using these logs. This can either be done via filtering expressions in your log storage solution (e.g. OpenSearch, Splunk, or Datadog) or via automated and policy-aware parsing by an automated system, such as the CNCF project Falco. Falco can, together with a log handling service, act as an automated Security Information and Event Management (SIEM) system. Please do that.
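To illustrate what “finding the needles” means in practice, the following Python sketch filters audit events that are written as one JSON object per line. The field names (`verb`, `user.username`, `objectRef.resource`, `objectRef.subresource`) follow the `audit.k8s.io/v1` event format; the three rules, the allowlist, and the sample events are assumptions made up for this example and are far simpler than what a policy engine such as Falco offers.

```python
import json

# Hypothetical allowlist of identities that are expected to read Secrets.
TRUSTED_SECRET_READERS = {"system:kube-controller-manager", "system:apiserver"}


def classify(event: dict):
    """Return a reason string if the event looks suspicious, else None."""
    user = (event.get("user") or {}).get("username", "")
    ref = event.get("objectRef") or {}
    if user == "system:anonymous":
        return "anonymous request"
    if (
        ref.get("resource") == "secrets"
        and event.get("verb") in {"get", "list", "watch"}
        and user not in TRUSTED_SECRET_READERS
    ):
        return "secret read outside allowlist"
    if ref.get("resource") == "pods" and ref.get("subresource") == "exec":
        return "exec into Pod"
    return None


def find_needles(lines):
    """Yield (reason, event) pairs for suspicious audit log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        reason = classify(event)
        if reason:
            yield reason, event


# Hypothetical audit log excerpt, trimmed to the fields used above.
SAMPLE_AUDIT_LOG = """\
{"kind":"Event","apiVersion":"audit.k8s.io/v1","verb":"list","user":{"username":"system:kube-controller-manager"},"objectRef":{"resource":"pods","namespace":"shop"}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","verb":"get","user":{"username":"alice@example.com"},"objectRef":{"resource":"secrets","namespace":"shop","name":"db-password"}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","verb":"create","user":{"username":"bob@example.com"},"objectRef":{"resource":"pods","namespace":"shop","name":"backend-0","subresource":"exec"}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","verb":"get","user":{"username":"system:anonymous"},"objectRef":{"resource":"nodes"}}
"""

for reason, event in find_needles(SAMPLE_AUDIT_LOG.splitlines()):
    print(reason, "by", event["user"]["username"])
```

The first sample event is routine controller traffic and is ignored, which is the point: most of the audit trail is haystack, and only explicit rules surface the rest.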

Periodically review all Kubernetes settings and use vulnerability scans to help ensure risks are appropriately accounted for and security patches are applied

Kubernetes releases new versions of the container platform about three times per year. Security updates are only provided for the current version and the two before it. So, to stay up to date with security, operators must install a new version at least once per year. Preferably, they would follow every new version, as what I would graciously call the rather naive security features of the past are being gradually improved upon.
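The arithmetic behind that rule is simple enough to automate. The Python sketch below takes the support window stated above (the current minor version and the two before it) as a constant and checks whether a cluster’s version still falls inside it. The function names and the example version strings are chosen for illustration, and the sketch assumes both versions share the same major version.

```python
# Current minor version plus the two before it, as described above.
SUPPORTED_MINOR_RELEASES = 3


def parse_version(version: str) -> tuple:
    """Parse 'v1.20.4' or '1.20' into (major, minor)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def receives_security_updates(cluster: str, latest: str) -> bool:
    """Return True if `cluster` is within the support window of `latest`."""
    cluster_major, cluster_minor = parse_version(cluster)
    latest_major, latest_minor = parse_version(latest)
    if cluster_major != latest_major:
        raise ValueError("this sketch only compares versions with the same major version")
    return latest_minor - cluster_minor < SUPPORTED_MINOR_RELEASES


# With v1.22 as the latest release, v1.20 is the oldest supported version.
print(receives_security_updates("v1.20.4", "v1.22.1"))
print(receives_security_updates("v1.19.8", "v1.22.1"))
```

With about three releases per year, a cluster that is never upgraded drops out of this window roughly a year after the release of the version it runs.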

As I hope I’ve made clear by now: Kubernetes is not secure by default. The number of disabled security features shows that security is intentionally not a consideration by default. New security threats are constantly being developed. Therefore, the use of automated vulnerability scanning of the entire platform, including the control plane and the worker nodes themselves, is highly recommended. Both by the report and by me.

There’s a catch, though. Does automated testing catch everything? No. Not by a long shot. But it does catch some of the more glaring errors, which, if found by bad actors, indicate that the platform is likely poorly configured in other ways, too. In my experience, not even taking care of the low-hanging fruit serves as a huge “welcome” sign.

Are automated vulnerability tools sufficient?

Many tools promise automated vulnerability scanning, both of container images and of the configuration of the Kubernetes cluster or the resources managed within it. These provide an appealing offering, in that they will highlight misconfigurations. But they are limited in scope and functionality. They do not (and technically cannot) cover everything that the NSA/CISA Kubernetes Hardening Guidance recommends.

ARMO, a security company, has released Kubescape. It claims to be the first tool for verifying a cluster against the best practices from the NSA/CISA tech report. And indeed, at the time of writing (September 7), it does contain a nice set of automated tests for parts of it.

However, it is limited in that it only uses the Kubernetes API to perform its checks. Therefore, it cannot verify, e.g., whether container image vulnerability scanning, audit log parsing, firewalls, or strict privilege limits are in place.

Aqua Security have similarly released kube-bench. Unlike Kubescape, it can check how the control plane is configured, by inspecting the running processes on a control plane host. Unfortunately, it is similarly unable to check the security features that are not part of the Kubernetes configuration.

Therefore the answer is “no”. One cannot merely run automated checks and claim to have perfect (or even a good) security posture. Actual understanding of security policies, and a broader view than merely the cluster itself, is also needed.

Closing thoughts

Kubernetes is neither secure by default, nor by itself. The NSA and CISA Kubernetes Hardening Guidance provides hands-on tips on how to configure Kubernetes to improve your overall security posture. I have summarized the main takeaways from that 59-page report and added notes based on my experience working with cloud computing for security-conscious companies in regulated industries.

As further reading, I would like to recommend my blog post on going beyond the Kubernetes Hardening Guidance tech report. It takes a much broader view of the deployed application and platform, and the security pitfalls and practices that I have personally experienced during many years of helping customers succeed in the cloud.

Lars Larsson

Lars is a Senior Cloud Architect at Elastisys. He has worked with cloud computing since 2008 and holds a PhD in Computer Science for his research in cloud computing. Together with Cristian Klein, he acts as Branch Manager for the Lund office.