Least privilege for Kubernetes workloads
Usually, there is a service account (default) associated with a Kubernetes workload, so processes inside a pod can communicate with kube-apiserver using the service account token. DevOps should carefully grant only the necessary privileges to the service account for the purpose of least privilege. We've already covered this in the previous section.
Besides accessing kube-apiserver to operate on Kubernetes objects, processes in a pod can also access resources on the worker nodes and other pods/microservices in the cluster (covered in Chapter 2, Kubernetes Networking). In this section, we will talk about possible least-privilege implementations for access to system resources, network resources, and application resources.
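As a brief refresher, the following is a minimal sketch of binding a workload to a dedicated, minimally privileged service account instead of the default one, and disabling automatic token mounting when the pod never needs to call kube-apiserver. The namespace, service account, and image names are hypothetical:

```yaml
# Hypothetical example: a dedicated service account with token automount disabled
apiVersion: v1
kind: ServiceAccount
metadata:
  name: billing-sa                 # hypothetical service account
  namespace: billing
automountServiceAccountToken: false
---
apiVersion: v1
kind: Pod
metadata:
  name: billing-app
  namespace: billing
spec:
  serviceAccountName: billing-sa   # do not fall back to the default service account
  containers:
  - name: app
    image: billing-app:1.0         # hypothetical image
```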
Least privilege for accessing system resources
Recall that a microservice running inside a container or pod is nothing but a process on a worker node isolated in its own namespace. A pod or container may access different types of resources on the worker node based on the configuration. This is controlled by the security context, which can be configured both at the pod level and the container level. Configuring the pod/container security context should be on the developers' task list (with the help of security design and review), while pod security policies—the other way to limit pod/container access to system resources at the cluster level—should be on DevOps's to-do list. Let's look into the concepts of security context, PodSecurityPolicy, and resource limit control.
Security context
A security context offers a way to define privileges and access control settings for pods and containers with regard to accessing system resources. In Kubernetes, the security context at the pod level is different from that at the container level, though there are some overlapping attributes that can be configured at both levels. In general, the security context provides the following features that allow you to apply the principle of least privilege for containers and pods:
- Discretionary Access Control (DAC): This is to configure which user ID (UID) or group ID (GID) to bind to the process in the container, whether the container's root filesystem is read-only, and so on. It is highly recommended not to run your microservice as the root user (UID = 0) in containers. The security implication is that if an exploit allows the process to escape the container to the host, the attacker immediately gains root user privileges on the host.
- Security Enhanced Linux (SELinux): This is to configure the SELinux security context, which defines the level label, role label, type label, and user label for pods or containers. With the SELinux labels assigned, pods and containers may be restricted in terms of being able to access resources, especially volumes on the node.
- Privileged mode: This is to configure whether a container is running in privileged mode. The power of the process running inside the privileged container is basically the same as a root user on a node.
- Linux capabilities: This is to configure Linux capabilities for containers. Different Linux capabilities allow the process inside the container to perform different activities or access different resources on the node. For example, CAP_AUDIT_WRITE allows the process to write to the kernel auditing log, while CAP_SYS_ADMIN allows the process to perform a range of administrative operations.
- AppArmor: This is to configure the AppArmor profile for pods or containers. An AppArmor profile usually defines which Linux capabilities the process owns, which network resources and files can be accessed by the container, and so on.
- Secure Computing Mode (seccomp): This is to configure the seccomp profile for pods or containers. A seccomp profile usually defines a whitelist of system calls that are allowed to execute and/or a blacklist of system calls that are blocked from executing inside the pod or container.
- AllowPrivilegeEscalation: This is to configure whether a process can gain more privileges than its parent process. Note that AllowPrivilegeEscalation is always true when the container is either running as privileged or has a CAP_SYS_ADMIN capability.
We will talk more about security context in Chapter 8, Securing Kubernetes Pods.
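For now, here is a minimal sketch of what a hardened pod-level and container-level security context could look like. The pod name, image, and UID/GID values are hypothetical; tune them, along with the seccomp profile, to what your workload actually needs:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app               # hypothetical pod
spec:
  securityContext:                 # pod-level settings
    runAsNonRoot: true             # refuse to start if the image runs as root
    runAsUser: 10001
    runAsGroup: 10001
    fsGroup: 10001
    seccompProfile:
      type: RuntimeDefault         # use the container runtime's default seccomp profile
  containers:
  - name: app
    image: my-app:1.0              # hypothetical image
    securityContext:               # container-level settings
      privileged: false
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL                      # drop every Linux capability the app does not need
```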
PodSecurityPolicy
PodSecurityPolicy is a cluster-level Kubernetes resource that controls the security-sensitive attributes of a pod specification. It defines a set of rules: when pods are created in the Kubernetes cluster, they must comply with the rules defined in the PodSecurityPolicy or they will fail to start. A PodSecurityPolicy controls or applies the following attributes:
- Allows a privileged container to be run
- Allows host-level namespaces to be used
- Allows host ports to be used
- Allows different types of volumes to be used
- Allows the host's filesystem to be accessed
- Requires a read-only root filesystem to be run for containers
- Restricts user IDs and group IDs for containers
- Restricts containers' privilege escalation
- Restricts containers' Linux capabilities
- Requires an SELinux security context to be used
- Applies seccomp and AppArmor profiles to pods
- Restricts sysctls that a pod can run
- Allows a proc mount type to be used
- Restricts the FSGroup that owns a pod's volumes
We will cover more about PodSecurityPolicy in Chapter 8, Securing Kubernetes Pods. A PodSecurityPolicy control is basically implemented as an admission controller. You can also create your own admission controller to apply your own authorization policy for your workload. Open Policy Agent (OPA) is another good candidate to implement your own least privilege policy for a workload. We will look at OPA more in Chapter 7, Authentication, Authorization, and Admission Control.
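To give you a feel for how these rules look in practice, here is a minimal sketch of a restrictive PodSecurityPolicy. The field values are illustrative and should be tuned to your cluster's requirements:

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted                 # hypothetical policy name
spec:
  privileged: false                # no privileged containers
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL                            # containers must drop all Linux capabilities
  hostNetwork: false               # no host-level namespaces
  hostPID: false
  hostIPC: false
  readOnlyRootFilesystem: true
  runAsUser:
    rule: MustRunAsNonRoot         # reject containers running as UID 0
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: MustRunAs
    ranges:
    - min: 1
      max: 65535
  fsGroup:
    rule: MustRunAs
    ranges:
    - min: 1
      max: 65535
  volumes:                         # only allow non-host-path volume types
  - configMap
  - secret
  - emptyDir
  - persistentVolumeClaim
```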
Now, let's look at the resource limit control mechanism in Kubernetes as you may not want your microservices to saturate all the resources, such as the Central Processing Unit (CPU) and memory, in the system.
Resource limit control
By default, a single container can use as much memory and CPU as the node has. A container running a crypto-mining binary may easily consume the CPU resources of the node shared with other pods. It's always a good practice to set resource requests and limits for your workload. The resource request influences which node the scheduler assigns the pod to, while the resource limit sets the maximum amount of a resource the container may consume: a container exceeding its memory limit will be terminated, and one exceeding its CPU limit will be throttled. Assigning generous resource requests and limits to your workload reduces the risk of eviction or termination. However, keep in mind that if you set the resource request or limit too high, you waste resources in your cluster, and the resources allocated to your workload may not be fully utilized. We will cover this topic more in Chapter 10, Real-Time Monitoring and Resource Management of a Kubernetes Cluster.
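As a reference point, the following is a minimal sketch of resource requests and limits on a container. The numbers are illustrative and should be sized from your workload's observed usage:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: metered-app                # hypothetical pod
spec:
  containers:
  - name: app
    image: my-app:1.0              # hypothetical image
    resources:
      requests:
        cpu: 250m                  # used by the scheduler for node placement
        memory: 128Mi
      limits:
        cpu: 500m                  # exceeding this throttles the container
        memory: 256Mi              # exceeding this gets the container terminated (OOM-killed)
```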
Wrapping up least privilege for accessing system resources
Unlike non-privileged pods or containers, pods or containers running in privileged mode have the same privileges as admin users on the node. If your workload runs in privileged mode, ask yourself: why does it need to? When a pod is able to access host-level namespaces, it can access resources such as the network stack, processes, and Interprocess Communication (IPC) at the host level. But do you really need to grant host-level namespace access or privileged mode to your pods or containers? Similarly, if you know which Linux capabilities are required by the processes in the container, you had better drop the unnecessary ones. And how much memory and CPU is sufficient for your workload to be fully functional? Please do think through these questions when implementing the principle of least privilege for your Kubernetes workloads. Properly set resource requests and limits, use a security context for your workload, and enforce a PodSecurityPolicy for your cluster. All of this will help ensure least privilege for your workload when accessing system resources.
Least privilege for accessing network resources
By default, any two pods inside the same Kubernetes cluster can communicate with each other, and a pod may be able to communicate with the internet if there is no proxy rule or firewall rule configured outside the Kubernetes cluster. The openness of Kubernetes blurs the security boundary of microservices, and we mustn't overlook network resources, such as API endpoints provided by other microservices, that a container or pod can access.
Suppose one of your workloads (Pod X) in namespace X only needs to access microservice A in namespace NS1; meanwhile, there is a microservice B in namespace NS2. Both microservice A and microservice B expose Representational State Transfer (RESTful) endpoints. By default, your workload can access both microservices A and B, assuming there is neither authentication nor authorization at the microservice level and no network policies are enforced in namespaces NS1 and NS2. Take a look at the following diagram, which illustrates this:
In the preceding diagram, Pod X is able to access both microservices, though they reside in different namespaces. Note also that Pod X only requires access to Microservice A in namespace NS1. So, is there anything we can do to restrict Pod X's access to Microservice A only, for the purpose of least privilege? Yes: a Kubernetes network policy can help. We will cover network policies in more detail in Chapter 5, Configuring Kubernetes Security Boundaries. In general, a Kubernetes network policy defines rules for how a group of pods is allowed to communicate with each other and with other network endpoints. You can define both ingress rules and egress rules for your workload.
Note
Ingress rules: Rules to define which sources are allowed to communicate with the pods under the protection of the network policy.
Egress rules: Rules to define which destinations are allowed to communicate with the pods under the protection of the network policy.
In the following example, to implement the principle of least privilege for Pod X, you will need to define a network policy in namespace X with an egress rule specifying that only Microservice A is allowed:
In the preceding diagram, the network policy in namespace X blocks any request from Pod X to Microservice B, while Pod X can still access Microservice A, as expected (a minimal sketch of such a policy follows this paragraph). Defining an egress rule in your network policy will help ensure least privilege for your workload when accessing network resources. Last but not least, we still need to bring your attention to the application resource level from a least-privilege standpoint.
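The sketch below shows what that egress policy could look like. The pod and namespace labels (app: pod-x, app: microservice-a, ns: ns1) are hypothetical and must match the labels you actually apply to your pods and namespaces:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pod-x-egress
  namespace: x                     # hypothetical namespace of Pod X
spec:
  podSelector:
    matchLabels:
      app: pod-x                   # the policy applies to Pod X only
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          ns: ns1                  # namespace NS1
      podSelector:
        matchLabels:
          app: microservice-a      # only Microservice A is reachable
```

Note that once a pod is selected by an egress policy, all egress traffic not explicitly allowed is denied, including DNS lookups, so in practice you may also need an egress rule permitting DNS resolution.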
Least privilege for accessing application resources
Though this topic falls into the category of application security, it is worth bringing up here. If your workload accesses applications that support multiple users with different levels of privileges, it's better to examine whether the privileges granted to the user acting on your workload's behalf are actually necessary. For example, a user who is responsible for auditing does not need any write privileges. Application developers should keep this in mind when designing the application. This helps to ensure least privilege for your workload when accessing application resources.