Elastisys visited KubeCon / CloudNativeCon last week to meet clients, engage with the community, and do some trendspotting. The event was the largest KubeCon yet, with over 4000 participants! To all in the community who could not make it, we give you this post full of the latest trends. Read about what is happening in distributed systems and container orchestration in 2018 here!
The Cloud Native Computing Foundation (CNCF) makes a large set of tools available to developers and operations teams. Kubernetes is one of the CNCF projects, offering orchestration of distributed systems. Many companies explore the CNCF toolbox, and the new ways of building and composing software it enables. And when they do, they will invariably face a steep learning curve. Building distributed systems that can scale requires leaving familiar comfort zones behind. And until recently, the alternative was not very comfortable at all. Due to a lack of good tool support, many developers and operations teams felt lost in uncharted waters. This made them wary of adopting these new technologies, and for good reason! Our experienced team did some trendspotting, and is happy to report that many tools are maturing. Used correctly, they can cast a much-needed light, and increase confidence and productivity. Finally!
Trendspotting: Observability
Compare a single large application with a distributed system from a monitoring point of view and you will clearly see that the single application is easier to reason about. Because state and function calls are right there to be inspected by a debugger or profiler, problems are easy to find and fix. By splitting into distributed components, we make huge gains in scalability and developer agility. However, the system becomes much more difficult to inspect.
Tracing: observing communication
OpenTracing offers a vendor-neutral API for distributed tracing. By adding metadata to calls across your services, it can let you see how your services dealt with individual calls. Do some queries slow your database down? Was there a general slowdown at 5PM yesterday? When did it start? By simply adding OpenTracing support, you can then visualize your calls in e.g. Jaeger.
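To make "adding metadata to calls" a bit more concrete, here is a minimal sketch in plain Python. It deliberately does not use the OpenTracing API or Jaeger: the header names, the `Span` class, the `COLLECTED` list (standing in for a tracing backend), and the two service functions are all hypothetical and exist only to illustrate how a shared trace ID ties calls in different services together.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical header names, not those of any real tracer.
TRACE_HEADER = "x-trace-id"
PARENT_HEADER = "x-parent-span-id"

# Stand-in for a tracing backend: finished spans are simply appended here.
COLLECTED = []


@dataclass
class Span:
    name: str
    trace_id: str
    parent_id: Optional[str]
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.monotonic)
    duration: float = 0.0

    def finish(self):
        # Record how long the operation took and hand the span to the "backend".
        self.duration = time.monotonic() - self.start
        COLLECTED.append(self)

    def headers(self):
        # Metadata that travels with the outgoing call.
        return {TRACE_HEADER: self.trace_id, PARENT_HEADER: self.span_id}


def start_span(name, headers=None):
    # Continue the caller's trace if headers are present, else start a new one.
    headers = headers or {}
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    return Span(name, trace_id, headers.get(PARENT_HEADER))


def database_service(headers):
    span = start_span("db.query", headers)
    time.sleep(0.01)  # pretend to run a query
    span.finish()
    return ["row"]


def frontend_service():
    span = start_span("frontend.request")
    rows = database_service(span.headers())  # metadata crosses the service boundary
    span.finish()
    return rows


if __name__ == "__main__":
    frontend_service()
    for s in COLLECTED:
        print(s.trace_id, s.name, s.parent_id, f"{s.duration:.3f}s")
```

Because both spans carry the same trace ID, and the database span records the frontend span as its parent, a backend can reassemble the call tree and show where the time went.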
Monitoring: observing components
Prometheus is the time-series database of choice for Kubernetes clusters. This KubeCon confirmed that the community is converging around Prometheus, both as a format for representing metrics during transport and as a standard database. It offers a great query language (PromQL) that you can use to ask interesting questions of your metrics. Heapster is now deprecated, and the Kubernetes SIG Instrumentation is moving full speed ahead toward Prometheus instead. Even what may seem like competitors, such as InfluxDB, embrace this change, and choose to reposition themselves as a more durable storage backend than the native one. This all means that there has never been a better time to embrace and learn Prometheus.
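As an illustration of Prometheus "as a format", the sketch below renders a metric in the Prometheus text exposition format using only plain Python. The function names, the metric `http_requests_total`, and its labels are made up for this example; in a real service you would normally use an official client library rather than hand-rolling this.

```python
def escape_label_value(value):
    # Label values escape backslash, double quote and newline.
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is deterministic.
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_metric(
            "http_requests_total",
            "Total HTTP requests.",
            "counter",
            [({"method": "get", "code": "200"}, 1027)],
        ),
        end="",
    )
```

Once a counter like this is being scraped, a PromQL expression such as `rate(http_requests_total[5m])` would give its per-second rate of increase over the last five minutes.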
Missing: Logging solution!
So that covers understanding traffic in distributed systems and keeping track of events in a time series. But missing from the picture is a simple and standard logging solution. Fluentd is a CNCF project for collecting logs, but where do you send them? And where, and how, do you process them? There are big companies in this space, but no obvious community choice. What do you think, dear Reader? What do you use, and why? Tell us and our readers your insights by using the comment box below!
Trendspotting: Networking and Service Meshes
Distributed systems rely on networks to communicate. But networks are fickle, and difficult to handle correctly. Also, for a long time, developers thought of networks as that thing that lets them call services outside of their control. But with microservices, most of the services you rely on are no longer on the outside. And your entire service depends on all microservices working in unison. Therefore, it is time to use tools that offer more than the “send and pray” approach of days gone by.
Service meshes, such as Istio, add an abstraction layer between your services on the communication layer. You no longer communicate directly between your components, but indirectly via the service mesh. In return, you get a slew of great features:
- Insight, as your network calls are tracked by the service mesh.
- Security, as you can set policies such as “always encrypt traffic”, without having to deal with certificates and TLS in your services.
- Routing and Deployment Strategies, as you can switch between service versions for Blue/Green or Canary deployments via the service mesh.
- Resiliency, as you can inspect the results of calls and make smart decisions based on how the call went.
Networking, simplified
It is worth repeating that you get all these features for free by using a service mesh, and no changes are required to your services. In fact, you might even wind up deleting code, since you can throw away custom TLS-handling code. And what about resiliency? If a service instance is broken, you probably want to isolate it so it no longer serves traffic. However, you do want to keep it around for post-mortem inspection. Also, you might want to re-send requests that errored out to working instances. How would you even do this in a traditional networking setup? With load balancers and services, you are not even able to tell which instance serves your requests. But with service meshes, all this is very possible and relatively easy to configure.
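The resiliency behavior described above can be sketched in a few lines of plain Python. This is a toy model, not how Istio or any other mesh is implemented or configured: the `Instance` and `MeshProxy` classes and the `max_failures` parameter are hypothetical. It only shows the two ideas from the text: a failing instance is taken out of rotation but kept for inspection, and the failed request is retried on a working instance.

```python
class Instance:
    """A stand-in for one running copy of a service."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} failed")
        return f"{self.name} served {request}"


class MeshProxy:
    """Toy proxy that sits between the caller and the service instances."""

    def __init__(self, instances, max_failures=1):
        self.active = list(instances)   # instances that receive traffic
        self.quarantined = []           # isolated, but kept for post-mortem
        self.failures = {i.name: 0 for i in instances}
        self.max_failures = max_failures

    def call(self, request):
        # Iterate over a copy, since instances may be removed along the way.
        for instance in list(self.active):
            try:
                return instance.handle(request)
            except RuntimeError:
                self.failures[instance.name] += 1
                if self.failures[instance.name] >= self.max_failures:
                    # Isolate the instance instead of deleting it.
                    self.active.remove(instance)
                    self.quarantined.append(instance)
                # Fall through: retry the same request on the next instance.
        raise RuntimeError("no healthy instance available")


if __name__ == "__main__":
    proxy = MeshProxy([Instance("a", healthy=False), Instance("b")])
    print(proxy.call("req-1"))
    print([i.name for i in proxy.quarantined])
```

The caller never sees the failure of instance `a`: the request is answered by `b`, and `a` remains available in `quarantined` for debugging.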
Linux, as in the kernel, is also embracing the idea of a higher level of abstraction for networking. Cilium is a project to keep tabs on in this space. It builds on BPF, a Linux kernel technology that may eventually replace iptables and nftables.
Trendspotting: Security
We have all heard that security needs to be in the design from the start. The reason we have all heard it is that very few people actually do it. Sadly, both the Docker and the Kubernetes projects have rather shaky security stories. However, that is changing with mechanisms and policies to help make security hardening possible. Slowly, cluster installs are becoming more secure by default. But the major problem is all the legacy we deal with from the Wild West days, when both projects were young and those mechanisms were not in place.
Role-based Access Control (RBAC) can now be used for authorization in Kubernetes. So can Network Policies, which let you restrict incoming and outgoing Pod network traffic when your network provider supports them. Pod Security Policies define what a Pod spec is allowed to request in its Security Context. You can use them to control whether the (dangerous!) privileged mode can be enabled, what volumes can be mounted, and what Linux Capabilities are granted.
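As a small example of what restricting incoming Pod traffic looks like, the sketch below builds a NetworkPolicy manifest as a Python dictionary and prints it as JSON, which `kubectl` accepts alongside YAML. The helper function, the policy name, the `shop` namespace, the `app` labels, and the port are all hypothetical, and the policy only has an effect if the cluster's network provider enforces Network Policies.

```python
import json


def allow_only_from(name, namespace, target_labels, source_labels, port):
    """Build a NetworkPolicy that lets only matching Pods reach the targets."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The Pods this policy protects.
            "podSelector": {"matchLabels": target_labels},
            # Only ingress is restricted; egress is left untouched.
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # The only Pods allowed to connect...
                    "from": [{"podSelector": {"matchLabels": source_labels}}],
                    # ...and the only port they may connect to.
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    policy = allow_only_from(
        name="db-allow-backend",      # hypothetical names throughout
        namespace="shop",
        target_labels={"app": "db"},
        source_labels={"app": "backend"},
        port=5432,
    )
    print(json.dumps(policy, indent=2))
```

With a policy like this in place, Pods labeled `app: db` would accept connections on TCP port 5432 only from Pods labeled `app: backend` in the same namespace.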
Another Docker legacy problem, however, is that more than 85% of all Docker images run as root. And root in a container is root on the host (this is by design and well-known, but easily forgotten).
Google unveiled gVisor, its container runtime sandbox. It adds an additional layer of security between the container and the host machine’s kernel. If the container tries to invoke functions in the kernel that it is not supposed to, gVisor stops it. This way, there is an additional layer of security (like with VMs) but with very low overhead (so not at all like with VMs). This is exciting technology that is definitely worth a look!
Some automated tools exist, such as kube-bench. But in security, checklists are useful only for catching known errors. Security is an ongoing process, powered by nothing other than vigilance and experience. If you want help from our experienced engineers, whose services are used by e.g. fintech companies, in configuring your cluster, contact us today.
Trendspotting: Developer Ergonomics
Developers are experts at writing code. Lately, they have also been tasked with operations (DevOps). But their bread and butter is still centered around delivering code. Running servers is seen more as a necessary evil. But lately, the community has challenged whether it actually is necessary. What a refreshing question to ask!
The field is currently developing, but we can expect to see the rise of GitOps: fully automated, worry-free IT operations that run upon git push. Basically, developers need only finish their merge requests to have code built and running in the cloud seconds later. Combined with service meshes (see above), such code deployments can be blissfully boring (as in, nothing unexpected happens!).
While GitOps is possible to adopt for many typical services, serverless platforms are also rising quickly in popularity. Oracle Fn prepares to power globally scalable enterprise deployments, and on the self-hosted side, OpenFaaS matures as well.
Conclusions
The Kubernetes and Cloud Native communities are moving forward at a rapid pace. There are exciting tools coming out, and we have listed a few here as part of our trendspotting article from KubeCon and CloudNativeCon Europe 2018. But until the tools have matured more and integration is solved, many companies are wary of using them. Because we fully understand such concerns, we are happy to offer our services to help clients see past the marketing speak and gain all the benefits without the drawbacks. Elastisys is both a Services and Technical Partner of the Kubernetes project, proving that our commitment and knowledge run deep in these waters.
If you want to accelerate your ability to deliver software, learn how to best use a service mesh, or harden your security, you are always welcome to contact us. Also, if these topics sound interesting to you, we are hiring DevOps engineers and Data Scientists.