Operating Secure Kubernetes Platforms: Setting the Scope

Elastisys operates security-hardened Kubernetes platforms on EU cloud infrastructure. Doing so has taught us many lessons, and we are giving them away to you, so you can learn how to operate a Kubernetes platform yourself.

This is the first blog post in a series on operating a secure Kubernetes platform. The whole series is found via this tag on our blog. The topic is about setting the scope.

Simply put, the scope is a mental fence: Everything inside that fence is the Kubernetes platform, and is the responsibility of the platform team. Everything outside that fence is “not the Kubernetes platform”.

Read the entire blog post to see why the scope is important, an example of how it’s used in practice, why it matters, and its greater context.

Why is Scope Important?

  • It avoids blame games. How often do you hear the frontend folks saying "it’s a backend issue" only to hear the backend folks reply "it’s a frontend issue"? Setting a clear scope for the Kubernetes Platform not only avoids blame games. It holds the platform team accountable for the platform, while allowing them to escalate issues outside their accountability to adjacent teams, for example, the application team, infrastructure team or networking team.
  • It empowers the platform team. Knowing exactly what you "own" also gives you tremendous freedom to make the right decisions within your area of responsibility. For example, instead of holding an "all-hands" meeting to decide if running containers as root is allowed, the platform team can assess that the risk is too high and that application containers must run as non-root.
  • It makes your auditor happy. If you aim for an SOC-2 or ISO 27001 certification, you will need to talk to an auditor. Although the auditors (generally) can’t tell how to properly configure Kubernetes RBACs, they will quickly identify if you rely on too few people to know too many things. That’s a business risk.
  • It assigns the right skills to the right task. While we all wish we could hire mythical DevOps creatures that can do everything from design, to frontend development, to backend development, to database administration, and Kubernetes administration, that is simply not realistic. Remember the platform team not only needs to know Kubernetes. They need to have "muscle memory" and be able to solve any stability- or security-related issues even at 2 AM – both tired and stressed.

Unreasonable expectations on DevOps engineers

A Real-Life Scope Example

So what is the scope of the Kubernetes platform? Let us inspire you with a known-to-work scope that has been demonstrated to work.

In Elastisys Compliant Kubernetes, we set the scope of the platform as follows:

  • the Kubernetes cluster itself;
  • all Pods needed to make the Kubernetes cluster secure, observable and stable;
  • continuous delivery (CD) tool;
  • additional platform services, such as databases, message queues and in-memory caches.

Outside the scope of the Kubernetes Platform are:

  • application code and Pods;
  • continuous delivery (CD) code;
  • continuous integration (CI) code and tool;
  • identity provider (IdP);
  • infrastructure, e.g., the OpenStack or VMware vSphere installation needed to instantiate VMs, load-balancers or object storage buckets.

A common struggle that we see is demarcation around CI/CD, especially for teams that haven’t worked extensively with containerization. In particular, some organizations are struggling to understand the difference between CI/CD tooling – i.e., the generic software solution – and the application-specific CI/CD code.

The most common (working!) division of responsibilities we see is the following:

  • The IT systems team is responsible for setting up and maintaining the CI tool, e.g., GitLab CI or GitHub Actions.
  • The platform team is responsible for setting up and maintaining the CD tool, e.g., ArgoCD.
  • The application team is responsible for:
    • maintaining the CI code, i.e., building container images from source code.
    • maintaining the CD code, i.e., pushing container images into production.

Don’t Neglect Things Outside the Scope…

Although many things are outside the scope of the Kubernetes platform, they should still be audited regularly to ensure they don’t act as the weak link which may compromise security. In our experience, many platform teams struggle because they end up securing the Kubernetes platform from the application team or against the cloud provider, instead of working with these teams to achieve a common security goal. These are questions that you may need to ask:

  • Application audit: Is the application written in a way which ensures it can run securely and stably on a Kubernetes platform? Make sure that all application team members are aware of the Principles for Designing and Deploying Scalable Applications on Kubernetes.
  • Provider audit: Is the underlying cloud infrastructure offering sufficient protection? To answer this question, we recommend conducting a provider audit on a yearly basis, focusing on:
    • technical aspects, e.g., does the provider offer encryption at rest;
    • organizational aspects, e.g., does the provider follow an Information Security Management System (ISMS), such as ISO 27001;
    • legal aspects, e.g., is the provider subject to a jurisdiction with insufficient data protection.

…and the Organization Around your Kubernetes Platform

Don’t forget that a Kubernetes platform is only as secure as the organization around it. It is important to set the right "tone at the top" (yes, I mean the CEO). Security must be the number one, two and three priority of the company. Make sure you recruit and retain people that are taking security seriously.

Finally, let’s not forget "laptop hygiene". For example, full-disk encryption can help secure Kubernetes clusters against an admin losing their laptop. Within 15 years of my career, it happened to me only once, and I was glad the laptop was well protected so production Kubernetes clusters were not at risk.

Now that we set the scope, let us dive deeper into how to enable the team to take ownership of the Kubernetes platform.

Further Reading