Multiple stages within a Kubernetes cluster

Michael Frembs
7 min read · Mar 20, 2019

Using multiple environments in software development, such as dev, test and prod, is common practice. Every environment has different requirements: the dev stage gives developers all the permissions they need, so they do not have to worry about other developers or users, and an application may occasionally not work properly there. The production environment, on the other hand, serves end users and has to be stable and highly available at all times, with zero downtime. Moreover, users may enter sensitive, personal or confidential data in production, and this data has to be kept confidential. Multiple stages are therefore not just a best practice but mandatory.

When you are planning to transition into the cloud, you may wonder how many Kubernetes clusters you will need. Should you install a separate cluster per stage, or combine multiple stages within one cluster and save effort on installation and configuration?

This article shows how to create isolated stages within a Kubernetes cluster. Afterwards, I’ll discuss the advantages and disadvantages of multiple stages.

Namespaces

When using Kubernetes, you almost inevitably have to use namespaces to group and sort resources. The official documentation has this to say:

“Kubernetes supports multiple virtual clusters backed by the same physical cluster. These virtual clusters are called namespaces.”

https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/

Kubernetes offers a RESTful API. Accordingly, every configuration is a resource, and resources can be grouped into namespaces. You could imagine one namespace per environment, but that is often not flexible enough. One namespace per combination of product or app and environment is recommended, especially if the Kubernetes cluster is used by multiple teams.
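For illustration, the namespaces for a hypothetical app called "my-app" (the name is only an example) could then simply be created per stage:

kubectl create namespace my-app-dev
kubectl create namespace my-app-test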

A namespace may contain several different configurations. Among other things, it is possible to define quotas, which limit the resource consumption (RAM and CPU) of all pods running in a namespace. You can, for example, assign fewer resources to dev namespaces than to test namespaces. Furthermore, it is possible to define an object count quota to limit the number of objects such as pods, which in turn restricts replica sets or autoscalers. A side note: as soon as quotas are defined on a namespace, you either have to define LimitRange settings for that namespace or every pod has to declare which resources it will consume. The latter is recommended.
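As a sketch of these two mechanisms, a ResourceQuota and a LimitRange for a hypothetical dev namespace could look like this (all names and values are only examples and have to be adapted to your workloads):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: my-app-dev
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    pods: "10"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-limits
  namespace: my-app-dev
spec:
  limits:
  - type: Container
    # defaults applied to containers that do not declare their own requests/limits
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 250m
      memory: 256Mi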

Aside from quotas, you can also limit user access at the namespace level. Thanks to RBAC, the accessible resources and allowed operations can be defined in a very fine-grained manner per user, both for LDAP users and for technical service accounts. This allows developers to have something close to admin rights on the dev stage while being restricted from the test stage onwards.
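As a sketch, such a namespace-scoped permission could be expressed with a Role and a RoleBinding; the group name and the granted rules below are only examples:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dev-almost-admin
  namespace: my-app-dev
rules:
# broad rights, but only within this namespace
- apiGroups: ["", "apps", "batch", "extensions"]
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-almost-admin-binding
  namespace: my-app-dev
subjects:
- kind: Group
  name: my-dev-team   # e.g. an LDAP group, only an example
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: dev-almost-admin
  apiGroup: rbac.authorization.k8s.io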

Isolation

Namespaces solve some, but not all, problems. By default, every application within Kubernetes can access every other one, even outside its own namespace. Kubernetes provides cluster-internal DNS for pods and services, so access works by specifying the service name and, optionally, the namespace. When dividing a cluster into stages, however, this is generally not desirable: apps on the dev stage should not be able to call apps on the test stage, and vice versa.
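For example, assuming a service named "backend" in the namespace "my-app-test" (both names are made up), a pod in any other namespace could by default reach it under its cluster-internal DNS name:

http://backend.my-app-test.svc.cluster.local

The short form backend.my-app-test works as well.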

Kubernetes offers a solution with NetworkPolicies. To understand NetworkPolicies, three rules are important:

  1. No rule means access allowed, as already mentioned
  2. Access is denied if there are rules for the target pod, but the source pod does not appear in the ingress part of this rule
  3. Access is denied if there are rules for the source pod, but the target pod does not occur in the egress part of this rule

In summary, this means: Without NetworkPolicies access is allowed. As soon as there is a rule, only pods in the list are permitted access. In practice, one usually defines a NetworkPolicy on both the source and target pod to limit access.

A NetworkPolicy is bound to a namespace and consists of three parts:

  1. Which pods are affected within a namespace.
  2. Ingress rules concerning the incoming communication of the pods.
  3. Egress rules concerning the outgoing communication of the pods.

Communication partners are usually selected via label selectors at pod or namespace level. In our case we need a rule which (1) selects all pods within the namespace, (2) allows them to be accessed by every pod in namespaces carrying the label "stage: dev", which therefore has to be set on those namespaces, and (3) allows them to call every pod in the other dev namespaces. There is one small stumbling block: in part 3 (egress), the Kubernetes DNS port 53 must be allowed separately, otherwise service names cannot be resolved.

An example configuration would look like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dev-stage
  namespace: my-app-dev
spec:
  # 1)
  podSelector: {}
  # 2)
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          stage: dev
  # 3)
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          stage: dev
  - ports:
    - port: 53
      protocol: "UDP"
    - port: 53
      protocol: "TCP"

An important note about NetworkPolicies: they only work if the installed network plugin supports them. Calico, for example, does.

After this policy is applied, it is no longer possible to reach any app from outside the namespace, because the connection from the ingress controller, which resides in the kube-system namespace, is now blocked as well. Therefore it is necessary to create another policy; ideally, it uses a combination of namespace and pod selectors.
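A sketch of such a policy is shown below. It assumes that the kube-system namespace carries a label like "role: kube-system" and that the ingress-controller pods carry a label like "app: nginx-ingress"; both labels are assumptions and have to match your actual setup:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller
  namespace: my-app-dev
spec:
  podSelector: {}
  ingress:
  - from:
    # namespaceSelector and podSelector combined: only the ingress-controller
    # pods in kube-system are allowed in
    - namespaceSelector:
        matchLabels:
          role: kube-system
      podSelector:
        matchLabels:
          app: nginx-ingress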

Labels are very important in Kubernetes, as they are used to group and select resources. Among other things, deployments and pods are marked with labels, which services then use as selectors. If several stages are used within a cluster, it is recommended to give each resource a "stage" label, even if this might seem a bit excessive at first: if the information is only added once it is needed, it is difficult to retrofit across all resources.
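Setting such a stage label on an existing namespace is a one-liner (the namespace name is only an example); deployments, services and other resources can be labelled in the same way:

kubectl label namespace my-app-dev stage=dev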

Physical separation

The previous approach solves the isolation of networks. If you are also looking for a physical separation of CPU and RAM resources, you can split the stages onto different worker nodes. Let's take a look at the Kubernetes architecture first:

Kubernetes Architecture

Kubernetes consists of a master node and several worker nodes. All requests to Kubernetes go through the API server installed on the master node. The master node monitors the worker nodes' health and distributes the pods onto the worker nodes. It is precisely this scheduling logic that can be adjusted: you can instruct Kubernetes on which nodes the pods should be started.

That provides us with some interesting options. You can dedicate isolated worker nodes to certain stages. This way it is possible to give the dev stage less powerful machines than the test stage, or to create an explicit performance stage where only the application is installed and its performance is repeatable and measurable with minimal external influences.

Kubernetes offers two options to distribute pods onto nodes. The first one is the node selector, which will be superseded by node affinity as soon as the latter is production ready. Nodes can also be decorated with labels, which in our case means that every node of a stage gets the same label. Afterwards, the node affinity can be defined within the pods (or rather, in the pod template of a deployment). A bare-bones configuration could look like this:

apiVersion: apps/v1beta1
kind: Deployment
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: stage
                operator: In
                values:
                - dev
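For this affinity to match anything, the worker nodes of each stage first need the corresponding label; the node names below are placeholders:

kubectl label nodes worker-dev-1 worker-dev-2 stage=dev
kubectl label nodes worker-test-1 worker-test-2 stage=test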

The introduced solution has a disadvantage: if you forget the affinity setting, the pod will be scheduled on a random node. For this case there is a third option: you can replace the affinity with taints. With taints, the logic is reversed: a pod can only be scheduled on a tainted node if it tolerates the taint. If every worker node is provided with a "stage" taint, a pod can no longer be deployed on a wrong node by accident; if a deployment or pod omits the toleration, it cannot be scheduled on any node at all.

Therefore every node has to be tainted, for example with kubectl taint nodes <node-name> stage=dev:NoSchedule for the dev nodes. Afterwards, a deployment looks like this:

apiVersion: apps/v1beta1
kind: Deployment
spec:
  template:
    spec:
      tolerations:
      - key: "stage"
        operator: "Equal"
        value: "dev"
        effect: "NoSchedule"

Only one of the solutions (affinity or taints) is necessary.

The allocation to individual nodes has advantages and disadvantages. You get a hard allocation to different machines, which can even be separated by firewalls, but it takes away some flexibility: if, for example, resources were still available on the test nodes but no longer on the dev nodes, nothing more could be installed on the dev stage.

Discussion

This article shows how to create multiple stages within a cluster using namespaces, RBAC, quotas, NetworkPolicies and NodeAffinities / taints. If you use all of them, the overview graphic looks like this:

Two stages within a Kubernetes-cluster

At first glance, you can see that the dev and test nodes are separated from each other but still share a common master; this is dictated by the Kubernetes architecture. If you wanted to separate the masters as well, you would end up with two Kubernetes clusters. The shared master leads to a disadvantage: if there is a problem with the master or the API server, as there was at the end of 2018 (CVE-2018-1002105), it may be possible to access the other stages. The presented solution should therefore only be used if the individual parties trust each other, which is normally the case in our example. It is not advisable, however, to use these approaches for multi-tenancy.

It is recommended to set up a separate, isolated cluster for the prod stage. This is the only way to ensure that prod will not be affected by a faulty configuration in one of the previous stages. As an additional benefit, updates of the Kubernetes cluster can be tested and verified on the dev stage first.

As already mentioned, the solution is error-prone if not all labels are set accurately. For example, if the stage label on a namespace is set incorrectly or not at all, traffic from that namespace will be rejected by the NetworkPolicies. It is therefore important to work carefully and to automate these processes as much as possible.

In return, the possibility of running several stages in one Kubernetes cluster simplifies operations, because fewer clusters have to be installed, managed and maintained.

Originally published at jaxenter.com on March 20, 2019.


Michael Frembs

Michael is a cloud platform architect located in Munich, Germany. His passions are new technologies, software architecture and transformations.