Managing the Security of Kubernetes Container Workloads

Dec 19, 2018

In this series of articles entitled Securing Kubernetes for Cloud Native Applications, we’ve discussed aspects of security for each of the layers that make up a Kubernetes cluster. In this last article, we’re going to address the security of the container workload itself. How do we ensure the integrity of the contents of the container, and how do we know that what’s inside the container is actually what we think it is? Let’s start with secrets.

Protecting Secrets

Often, cloud native applications running on a Kubernetes cluster need access to sensitive information, and are required to provide a ‘secret’ in response to a challenge from the system hosting that information. The secret could be anything, but is typically a password, an X.509 certificate, an SSH key, or an OAuth token. We might be tempted to take the easy route and store the secret in a container’s image, so that it’s readily available to the application when it needs it. If we do this, however, it’s likely to end in tears. The approach is brittle, because changing the secret means rebuilding the image, but more importantly, it carries a significant risk of exposing the secret to entities that shouldn’t see it. Image definitions often need to be made available to a wide audience, and are frequently managed by source code control systems, which could lead to inadvertent exposure of the secret. It is a widely accepted maxim that secrets should always be kept out of container images. This also adheres to the Twelve-Factor App methodology of keeping configuration outside of the container, allowing us to employ different secrets for different target environments, while using the same container image for each environment.

This raises the question: how do we make secrets available to containerized applications running in a Kubernetes cluster? Because of their sensitive nature, Kubernetes provides a dedicated API resource object for handling secrets, called (unsurprisingly) a Secret. Secret data encapsulated in Secret objects can then be referenced in the spec of Pods that need access to it - preferably by way of a volume mount, rather than by way of environment variables, which are more easily exposed through logs and child processes.
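From the application’s side, consuming a Secret that has been mounted as a volume amounts to reading a file. The following is a minimal Python sketch of that idea; the mount path /etc/app-secrets and the key name db-password are assumptions chosen purely for illustration, and would be whatever the Pod spec and the Secret actually define.

```python
from pathlib import Path

# Hypothetical mount path: wherever the Pod spec's volumeMount places the Secret.
SECRET_DIR = Path("/etc/app-secrets")


def read_secret(key: str, secret_dir: Path = SECRET_DIR) -> str:
    """Return the value stored under one key of a volume-mounted Secret.

    Each key of the Secret appears as a file named after the key, and the
    file's content is the (already base64-decoded) value.
    """
    value = (secret_dir / key).read_text(encoding="utf-8")
    # Drop a trailing newline that may have been added when the Secret was
    # created from a file; other whitespace is left untouched.
    return value.rstrip("\n")


if __name__ == "__main__":
    # "db-password" is a hypothetical key name.
    password = read_secret("db-password")
    print(f"read a secret of {len(password)} characters")
```

Reading the file at the point of use, rather than caching the value at start-up, should also allow the application to pick up a new value when the mounted Secret is updated.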

It’s important to ensure that access to a Secret is strictly limited to those entities (users and Pods) that require its use, that it’s not transmitted unencrypted over a network, and that it’s not accessible or readable when stored on disk (at rest). In previous articles we saw how to control access to API objects (such as Secrets) using authentication and authorization (RBAC), and the need to encrypt communication between cluster components using TLS. This satisfies two of these tenets, including the scenario when Node Authorization is configured and kubelets have access to secrets via the API, but what of the need to ensure that secrets aren’t accessible or readable at rest?

Encryption at Rest

Like all API objects, Secrets are stored in Kubernetes’ distributed state database etcd, simply encoded with base64. Effectively, this makes secrets available to anyone who has access to etcd, and therefore access to etcd should be carefully controlled, and communication between etcd instances should be encrypted using TLS. At rest, however, when etcd writes its database to disk, secrets are stored unencrypted, vulnerable to anyone with unauthorized access to the host filesystem.
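To underline that base64 is an encoding and not encryption, here is a short Python sketch that recovers a value as it might appear in the data field of a Secret. The encoded string is an invented example; no key or password is involved in reversing it.

```python
import base64


def decode_secret_value(encoded: str) -> str:
    """Reverse the base64 encoding applied to a Secret value."""
    # No key is required: decoding is a pure, reversible transformation.
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # An invented value, as it might appear under "data" in a Secret.
    print(decode_secret_value("cGFzc3dvcmQxMjM="))  # prints "password123"
```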

Fortunately, it’s possible to configure the API server to encrypt Secret objects (or any other resource object, for that matter), using the --encryption-provider-config parameter (named --experimental-encryption-provider-config prior to Kubernetes 1.13), which specifies the location of a config file that determines how the objects are to be encrypted. In essence, objects are encrypted and decrypted by the API server using keys that are either provided in the config file itself (usually for AES-CBC encryption), or by an out-of-process Key Management Service (KMS) provider, such as Azure Key Vault or AWS KMS. If you use a local key defined in the config file, it’s vital to ensure that the file has appropriately restrictive file permissions, because anyone who can read the file can decrypt the stored secrets.
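As a rough illustration of what such a config file contains, the Python sketch below generates a random 32-byte key, builds a config that encrypts Secrets with the aescbc provider, and writes it with owner-only permissions. It is a sketch under stated assumptions rather than a definitive procedure: the apiVersion and kind shown are those documented for recent Kubernetes releases and should be checked against the version you run, the key name key1 is arbitrary, and the file is written as JSON on the basis that JSON is also valid YAML.

```python
import base64
import json
import os
import secrets


def build_encryption_config(key_name: str = "key1") -> dict:
    """Build a config that encrypts Secrets with a freshly generated key."""
    # The aescbc provider expects a base64-encoded 32-byte key.
    key = base64.b64encode(secrets.token_bytes(32)).decode("ascii")
    return {
        # Assumed values: verify against your Kubernetes version.
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [
            {
                "resources": ["secrets"],
                "providers": [
                    # The first provider is used for writes.
                    {"aescbc": {"keys": [{"name": key_name, "secret": key}]}},
                    # identity lets the API server read data that was
                    # stored before encryption was enabled.
                    {"identity": {}},
                ],
            }
        ],
    }


def write_config(config: dict, path: str) -> None:
    """Write the config so that only the owning user can read or write it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(config, f, indent=2)
    # Tighten permissions in case the file already existed with looser ones.
    os.chmod(path, 0o600)
```

Secrets written before the config is applied remain unencrypted in etcd until they are rewritten.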

Using an External Secret Store

Using the Kubernetes Secrets API and the mechanism for encrypting secrets at rest might be enough to satisfy your organization’s attitude to risk, but there are more secure solutions available if your requirement exceeds that offered by Kubernetes.

One solution that focuses entirely on the management of secrets, including their creation, storage, revocation, rotation, lifetime (lease), and scope, is HashiCorp’s Vault. Vault requires a client to authenticate against one of its authentication methods, before issuing a token that the client can use to access stored secrets. The token is associated with policies that define what the client can and can’t access.

For Kubernetes, Vault has a specific authentication method, which relies on a token associated with a Pod’s Service Account. When a Pod attempts to authenticate with Vault, Vault accesses the API server’s TokenReview API, in order to validate the token. Once authenticated, Vault issues a token to the Pod that contains the relevant scope for the service account. This enables the pod to securely retrieve secrets for the duration of the token’s lease.
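The login exchange described above can be sketched in Python using only the standard library. This is a minimal illustration, not a production client: the Vault address and role name are hypothetical, the auth method is assumed to be mounted at its default path of kubernetes, and TLS trust configuration, retries and error handling are left out.

```python
import json
import urllib.request

# Default location of the Service Account token inside a Pod.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_login_request(vault_addr: str, role: str, jwt: str,
                        mount: str = "kubernetes") -> urllib.request.Request:
    """Build the HTTP request that logs in to Vault's Kubernetes auth method."""
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(vault_addr: str, role: str, token_path: str = SA_TOKEN_PATH) -> str:
    """Exchange the Pod's Service Account token for a Vault client token."""
    with open(token_path, encoding="utf-8") as f:
        jwt = f.read().strip()
    request = build_login_request(vault_addr, role, jwt)
    with urllib.request.urlopen(request, timeout=10) as response:
        # Vault returns the issued token under auth.client_token.
        return json.load(response)["auth"]["client_token"]


if __name__ == "__main__":
    # Hypothetical address and role name.
    token = login("https://vault.example.internal:8200", "demo-role")
    print("obtained a Vault token of length", len(token))
```

The returned token is then presented to Vault on subsequent requests for secrets, until its lease expires.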

Image Content

We know that storing secrets in images is taboo, but what else should we be careful about including in container images? This is a topic that attracts many different opinions, but one inescapable fact is that the more you add to an image, the more opportunity there is to exploit a container derived from the image.

Minimal Images

In an ideal world, we could simply create an image using an application binary and any associated dependencies that the binary relies on. In fact, there is nothing to stop us from omitting a base image (the image on top of which we build our own image) by using scratch as the argument for the FROM Dockerfile instruction, and copying a statically-linked binary into the image. There may be no other dependencies, in which case the image would comprise a single file, and some metadata describing how the container runs. This is great for image distribution speed (registry push/pull), and significantly reduces the attack surface inside a derived container.

Base Images

It may not always be possible or practical to do this, however, in which case we need to be wary about our choice of base image. The best approach is to curate your own base images, because you’re not relying on images authored by a third party - if you’ve authored the image, you know exactly what’s in it. Curating your own images, however, comes at a cost - it’s a process that requires considerable effort in terms of content maintenance, which may make it prohibitive, and when done badly, could render your images less secure.

The next best approach, then, is either to use images that are supported by operating system vendors, or to use the library of official images located on the Docker Hub registry, which are curated by the community. If your organization uses a subscription-based distribution, such as RHEL or SLES, then it might be appropriate to go with what you trust, take advantage of the support provided, and use images provided by these vendors.

Never blindly use images from untrusted sources that you haven’t previously vetted, particularly in a production environment.

Image Scanning

Wherever you obtain your images from, ultimately you will be relying on the integrity of the content of that image. However, we can never guarantee that the software we create or use is free of vulnerabilities, and this applies equally to the content of the images we use. For example, the Shellshock bug in the Bash binary lay hidden for 25 years before its discovery in 2014, and many Docker images automatically include the Bash binary! This means that we need to be aware of the vulnerabilities that may exist in our existing images, even those that have already been used to derive running containers in a Kubernetes cluster.

Detecting these vulnerabilities in images is a non-trivial problem, but there are a number of tools that have surfaced in response to the problem. We mentioned one well-respected open source example in a previous article, Clair, which is a static vulnerability analyzer for container images. There are numerous alternatives, both from the open source world, as well as commercial solutions, such as JFrog Xray. One interesting free-to-use tool for image scanning is Aqua Security’s MicroScanner, which allows you to scan your image while it’s being built.

Image scanning is essential for keeping your container workloads secure, and it will pay off to ensure that your images are regularly scanned by a reputable tool, so take the time to evaluate the best solution for your needs.

Supply Chain Assurance

It would be even better if we were able to assure the integrity of the supply chain of components that go to make up our container images, before we get to building them. It’ll never take away the need to periodically and consistently scan our images for vulnerabilities, but the earlier we can detect issues, the less likely they are to slip through the net into an image for a production workload.

Snyk helps to identify and fix vulnerabilities in application dependencies, whilst Grafeas and in-toto help to secure the complete supply chain by applying security principles and policy for CI/CD pipelines.

Image Provenance

Once we’ve determined that the images we intend to use for our container workloads are sound in content, we should have a significant degree of confidence that our workloads are secure. That confidence could be shattered, however, if we allow random images to be used for containers, or if we allow ourselves to be duped into using an image that isn’t what we think it is. For this reason, it’s in our best interests to check the provenance of our images, by strictly controlling which images can be used and from which specific sources, and by checking that each image is what it appears to be. That’s another difficult problem to solve.

Imposing Image Policy

One means of ensuring provenance is to define policy for the use of images that define a Pod’s containers. For example, we might want to exclusively reference images by their digest, instead of by tag, which would ensure we get to use a very specific image version. Tags are mutable, which means that over a period of time it’s possible for an image tag to represent completely different image content. The digest of an image, however, is unique to its content, which means we can be sure we are addressing an image of known content. Another example of policy might be that we want to ensure that we only use images that are stored in a registry that is located within the confines of our organization’s firewall. Whatever the requirement, it’s important to be able to impose our defined policy, so that we don’t end up using images we shouldn’t be using.
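The two example policies above, digest-only references and an approved registry, can be expressed as a small Python check. This is an illustrative sketch only: the registry name registry.example.internal is hypothetical, and the digest pattern assumes SHA-256 digests.

```python
import re
from typing import List

# A digest reference ends in "@sha256:" followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")

# Hypothetical internal registry; substitute your own.
ALLOWED_REGISTRIES = ("registry.example.internal/",)


def is_pinned_by_digest(image: str) -> bool:
    """Return True if the reference addresses the image by digest."""
    return DIGEST_RE.search(image) is not None


def violations(image: str) -> List[str]:
    """Return the list of policy rules that an image reference breaks."""
    problems = []
    if not image.startswith(ALLOWED_REGISTRIES):
        problems.append("image is not from an approved registry")
    if not is_pinned_by_digest(image):
        problems.append("image is referenced by tag, not by digest")
    return problems


if __name__ == "__main__":
    for ref in (
        "registry.example.internal/team/app@sha256:" + "a" * 64,
        "registry.example.internal/team/app:latest",
        "nginx:1.15",
    ):
        print(ref, "->", violations(ref) or "ok")
```

A check like this only describes the policy; something still has to enforce it at the point where Pods are admitted to the cluster.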

Kubernetes has an in-built ImagePolicyWebhook admission controller for this purpose, which relies on an external backend to authorize the use of the images defined for a Pod’s containers. When configured appropriately - before a pod is admitted to the cluster - the backend is sent an ImageReview object containing the details of the container images, which it allows or disallows based on the policy it’s configured with.
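To give a feel for what such a backend does, the Python sketch below takes an ImageReview object (already parsed from JSON) and fills in its status. It is a simplified illustration rather than a complete webhook: the HTTP server and TLS setup are omitted, and the policy itself, a single hypothetical registry prefix, is an assumption made for the example.

```python
from typing import Any, Dict

# Hypothetical policy: only images from this registry are allowed.
ALLOWED_PREFIX = "registry.example.internal/"


def review(image_review: Dict[str, Any]) -> Dict[str, Any]:
    """Evaluate an ImageReview request and return the review with its status."""
    containers = image_review.get("spec", {}).get("containers", [])
    rejected = [
        c.get("image", "")
        for c in containers
        if not c.get("image", "").startswith(ALLOWED_PREFIX)
    ]

    status: Dict[str, Any] = {"allowed": not rejected}
    if rejected:
        status["reason"] = "images not from approved registry: " + ", ".join(rejected)

    return {
        "apiVersion": "imagepolicy.k8s.io/v1alpha1",
        "kind": "ImageReview",
        "status": status,
    }


if __name__ == "__main__":
    request = {
        "apiVersion": "imagepolicy.k8s.io/v1alpha1",
        "kind": "ImageReview",
        "spec": {
            "containers": [
                {"image": "registry.example.internal/team/app:1.0"},
                {"image": "docker.io/library/nginx:latest"},
            ],
            "namespace": "default",
        },
    }
    print(review(request)["status"])
```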

Despite the mechanism for image policy being readily available in Kubernetes, there are precious few backend implementations, and it’s largely an unused feature. Instead, more general purpose dynamic admission solutions tend to get used, where image policy can be enforced, alongside policy related to other aspects of the pod’s configuration. One increasingly popular technique for defining dynamic admission policy is to make use of the Open Policy Agent (OPA), a Cloud Native Computing Foundation (CNCF) project that implements a general purpose policy engine. OPA works by evaluating policy defined in its own language called Rego - in the context of data provided by the authorizing client (Kubernetes, in this case) - and then returns a binary allow/disallow result based on the evaluated policy. An example of an OPA-based admission controller is the kubernetes-policy-controller.

Trusting Images

Whilst we may be able to ensure we only download images from a trusted location using policy, we can’t categorically say that what we download is what we think it is. We may trust the author of a container image, and based on that trust, we may want to use the image for a Pod’s container. However, we don’t have the means of ensuring that the author’s image hasn’t been tampered with between the point at which they push the image to a registry, and the point at which we pull it from the same registry. This is a generally recognized problem when considering how to update software securely.

The Update Framework (TUF) is a specification, system, and project hosted by the CNCF, for securing the distribution of packaged software updates. The TUF specification describes a system that allows publishers to digitally sign their packaged content, so that consumers can verify the integrity and the origin of that same content. Notary - another CNCF project - is an open source implementation of TUF, and allows us to establish this provenance link for our container images.
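One small part of this scheme can be sketched with the Python standard library. TUF’s signed metadata records a length and one or more hashes for each target file, and the consumer checks the downloaded content against them. The sketch below covers only that final integrity check and assumes the metadata has already had its signatures verified against trusted keys; that step, which is what actually establishes the origin of the content, is deliberately omitted here. The function and parameter names are hypothetical.

```python
import hashlib
import hmac


def target_matches(content: bytes, expected_length: int,
                   expected_sha256: str) -> bool:
    """Check downloaded content against the length and SHA-256 hash
    recorded for it in (already verified) signed metadata."""
    if len(content) != expected_length:
        return False
    actual = hashlib.sha256(content).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected_sha256.lower())


if __name__ == "__main__":
    published = b"example image layer"
    # In practice these two values come from the signed metadata.
    length = len(published)
    digest = hashlib.sha256(published).hexdigest()

    print(target_matches(published, length, digest))
    print(target_matches(b"tampered image layer", length, digest))
```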

The Docker engine can be configured to push images to, and pull images from, a registry that has a supporting Notary server and signer deployed (Docker Content Trust), but there is no abstracted user interface for Kubernetes, which makes it difficult to use. It’s also binary in nature: either trust is required for all images, or for none. Portieris is an open source Kubernetes admission controller that allows for a more fine-grained approach when implementing image content trust with Notary in a Kubernetes cluster. Policy is defined in ImagePolicy or ClusterImagePolicy objects, which allows content trust to be enabled for specific image repositories, including the requirement that the image should be signed by a particular trusted signer. This provides much greater flexibility when applying content trust to container images in a Kubernetes cluster.

Automation

Implicitly, in this article we’ve been discussing the need to consider security throughout the entire workflow, and not just at the point of deployment. Let’s explicitly state that security needs to ‘shift-left’, and be as much a part of the CI/CD pipeline as building, testing and observability. It needs to be baked in rather than being an afterthought, and is as much about process and culture as it is about tools.

Summary

This is the final article in our series on Securing Kubernetes for Cloud Native Applications, and we’ve covered a lot on the way. We’ve seen that security needs to be carefully considered and applied at all layers in the stack that comprises a Kubernetes cluster, which not only ensures that we cover every aspect of security, but also gives us the redundancy we need for ‘defense in depth’. We’ve also seen that judiciously applying best-practice security controls enables us to benefit from the adoption of the ‘principle of least privilege’ when it comes to managing access to sensitive resources. Finally, we’ve discussed how security needs to pervade our entire workflow, rather than it being a last-step activity, immediately prior to deployment.

Security is hard, but not impossible, and it deserves the same level of attention as the other factors that go into making a first-class, cloud-native application. Be sure to invest in the skills necessary to do it justice, or partner with an organization that’s already made that investment and gained the practical insight that comes from running Kubernetes clusters in production at scale.