Security Context

In the CKS we need to go deeper into this topic. For an overview, take a look at the CKA security context notes.

Security context allows us to define privileges and access control at pod level or at container level.

We can specify:

  • User ID (runAsUser)
  • Group ID (runAsGroup)
  • Privilege escalation
  • Linux capabilities
  • Others
spec:
  # Pod level (applied to all containers)
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
    # Container level
    securityContext:
      runAsUser: 0       # Overrides the pod-level user for this container
      #runAsGroup: 3000  # Could also be overridden; fsGroup is pod-level only
  - name: busybox-2
    image: busybox2
    command: ["sleep", "3600"]
    # Inherits the securityContext from the pod definition
    #securityContext:
    #  runAsUser: 1000
    #  runAsGroup: 3000

In this definition we're saying that:

  • runAsUser -> uid=1000 (user ID)
  • runAsGroup -> gid=3000 (main group ID)
  • fsGroup -> groups=2000 (supplementary group ID)
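To make fsGroup concrete: it matters mostly for volumes, since files on mounted volumes get that GID as their group. A minimal sketch (the pod name and mount path are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo      # illustrative name
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000         # volumes below are group-owned by GID 2000
  containers:
  - name: busybox
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    emptyDir: {}
```

Inside such a container, `id` reports uid=1000, gid=3000 and 2000 as a supplementary group, and files created under /data are group-owned by GID 2000.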

The os field in the pod spec (stable since Kubernetes 1.25) accepts linux or windows as the value of name and indicates which operating system the pod runs on. Besides being potentially useful to the kube-scheduler, it also interacts with the security context: OS-specific fields are validated against the declared OS.

spec:
  os:
    name: windows
  containers:
  - name: windows-container
    image: mcr.microsoft.com/windows/servercore:ltsc2022

There are parameters that exist at pod level, at container level, or both. Let's run through them quickly just to see what's possible; nobody needs to memorize this for the CKS.

A quick general table covering only the main ones.

| Parameter | Type | OS | Pod level | Container level | Description |
|---|---|---|---|---|---|
| allowPrivilegeEscalation | boolean | Linux | No | Yes | Controls whether a process can gain more privileges than its parent. Automatically set to true if the container is privileged or has the CAP_SYS_ADMIN capability. |
| appArmorProfile | AppArmorProfile | Linux | Yes | Yes | The AppArmor profile applied to all containers (pod level) or to this container. |
| capabilities | Capabilities | Linux | No | Yes | Adds or drops Linux capabilities in the container. |
| fsGroup | integer | Linux | Yes | No | A supplemental group applied to all containers in the pod (and to mounted volumes). |
| privileged | boolean | Linux | No | Yes | Runs the container in privileged mode, granting privileges equivalent to root on the host. Default is false. |
| procMount | string | Linux | No | Yes | The type of proc mount used by the container. |
| readOnlyRootFilesystem | boolean | Linux | No | Yes | Whether the container's root filesystem should be read-only. Default is false. |
| runAsGroup | integer | Linux | Yes | Yes | The GID used to run the entrypoint of the container process. |
| runAsNonRoot | boolean | Linux | Yes | Yes | Requires the container to run as a non-root user; the kubelet validates the user the image would run as. |
| runAsUser | integer | Linux | Yes | Yes | The UID used to run the entrypoint. Defaults to the user defined in the container image. |
| seLinuxOptions | SELinuxOptions | Linux | Yes | Yes | If not specified, the container runtime allocates a random SELinux context for each container. |
| seccompProfile | SeccompProfile | Linux | Yes | Yes | The seccomp options to use for the containers. |
| supplementalGroups | integer array | Linux | Yes | No | A list of GIDs applied to the first process in each container. |
| supplementalGroupsPolicy | string | Linux | Yes | No | Defines how supplemental groups are applied; only relevant if supplementalGroups is set. |
| sysctls | Sysctl array | Linux | Yes | No | A list of namespaced sysctls set for the pod. |
| windowsOptions | WindowsSecurityContextOptions | Windows | Yes | Yes | Windows-specific settings applied to all containers (pod level) or to this container. |
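The table mentions capabilities, but the examples below don't exercise them, so here's a minimal sketch (container level only; capability names are the standard Linux ones without the CAP_ prefix, and the pod name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: caps-demo    # illustrative name
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    securityContext:
      capabilities:
        drop: ["ALL"]             # start from no capabilities at all
        add: ["NET_BIND_SERVICE"] # then allow binding ports below 1024
```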

Let's use this yaml as a base and we'll modify it several times.

root@cks-master:~# k run pod --image=busybox --command -oyaml --dry-run=client -- sh -c 'sleep 1d' > pod.yaml

root@cks-master:~# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod
  name: pod
spec:
  containers:
  - command:
    - sh
    - -c
    - sleep 1d
    image: busybox
    name: pod
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
root@cks-master:~#

runAsUser and runAsGroup (pod and container levels)

  • The busybox image runs as root by default, and we passed nothing to change that.
  • Any file we create is owned by the current user, in this case root.
root@cks-master:~# k apply -f pod.yaml
pod/pod created

root@cks-master:~# k exec -it pod -- sh
/ # id
uid=0(root) gid=0(root) groups=10(wheel)
/ # touch test
/ # ls -lh test
-rw-r--r-- 1 root root 0 Aug 29 12:00 test
/ # exit

root@cks-master:~# k delete pod pod --force --grace-period 0
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod" force deleted

Let's define a user for the pod, which will be inherited by the container since we won't override it.

  • We changed the pod's user.
  • The image's working directory is /, where user 1000 has no permission to create anything.
  • If we move to a world-writable location like /tmp, we can create files, and they belong to the specified user and group.
root@cks-master:~# vim pod.yaml

root@cks-master:~# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod
  name: pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
  containers:
  - command:
    - sh
    - -c
    - sleep 1d
    image: busybox
    name: pod
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

root@cks-master:~# k apply -f pod.yaml
pod/pod created

root@cks-master:~# k exec -it pod -- sh
~ $ id
uid=1000 gid=3000
~ $ touch test
touch: test: Permission denied
~ $ pwd
/
~ $ cd tmp/
/tmp $ touch test
/tmp $ ls -lh test
-rw-r--r-- 1 1000 3000 0 Aug 29 12:07 test
/tmp $ exit

root@cks-master:~# k delete pod pod --force --grace-period 0
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.

runAsNonRoot (pod and container Level)

Now let's force the container to run as non-root without passing any user. If the container image already defines a non-root user we'll have no problems, but if it runs as root the container won't start.

  • In this scenario we keep the user, so there's no problem: the main process runs as a non-root user.
  • Note that the owner of process 1 is user 1000, which we kept.
root@cks-master:~# vim pod.yaml

root@cks-master:~# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod
  name: pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
  containers:
  - command:
    - sh
    - -c
    - sleep 1d
    image: busybox
    name: pod
    resources: {}
    securityContext:
      runAsNonRoot: true
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
root@cks-master:~# k apply -f pod.yaml
pod/pod created
root@cks-master:~# k get pods
NAME   READY   STATUS    RESTARTS   AGE
pod    1/1     Running   0          5s

root@cks-master:~# k exec -it pod -- sh
~ $
~ $ ps
PID   USER     TIME  COMMAND
    1 1000      0:00 sh -c sleep 1d
    8 1000      0:00 sh
   14 1000      0:00 ps
~ $ exit

root@cks-master:~# k delete pod pod --force --grace-period 0
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod" force deleted

However, if we remove the user we hit the problem mentioned above.

  • runAsNonRoot requires a non-root user to be defined by the image or the spec, which is not the case with busybox.
root@cks-master:~# vim pod.yaml

root@cks-master:~# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod
  name: pod
spec:
  #securityContext:
  #  runAsUser: 1000
  #  runAsGroup: 3000
  containers:
  - command:
    - sh
    - -c
    - sleep 1d
    image: busybox
    name: pod
    resources: {}
    securityContext:
      runAsNonRoot: true
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

root@cks-master:~# k apply -f pod.yaml
pod/pod created

root@cks-master:~# k get pod
NAME   READY   STATUS                       RESTARTS   AGE
pod    0/1     CreateContainerConfigError   0          3s

root@cks-master:~# k get pod pod -o jsonpath={.status.containerStatuses.*.state} | jq
{
  "waiting": {
    "message": "container has runAsNonRoot and image will run as root (pod: \"pod_default(7dd567c0-ead9-4460-a012-35acd9122bad)\", container: pod)",
    "reason": "CreateContainerConfigError"
  }
}

root@cks-master:~# k delete pod pod
pod "pod" deleted

The default nginx image runs as root, but there's a variant that doesn't; let's use it for testing.

root@cks-master:~# k run nginx --image=nginxinc/nginx-unprivileged -o yaml --dry-run=client > podnonroot.yaml

root@cks-master:~# vim podnonroot.yaml

root@cks-master:~# cat podnonroot.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: nginxinc/nginx-unprivileged
    name: nginx
    resources: {}
    securityContext:
      runAsNonRoot: true
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
root@cks-master:~# k apply -f podnonroot.yaml
pod/nginx created

root@cks-master:~# k exec -it nginx -- bash
nginx@nginx:/$ id
uid=101(nginx) gid=101(nginx) groups=101(nginx)
nginx@nginx:/$ exit
exit

root@cks-master:~# k delete pod nginx
pod "nginx" deleted

privileged (Container Level)

By default containers run as unprivileged, but it's possible to run as privileged.

A case where this could happen is if we wanted to run docker-in-docker in the container, that is, a container inside another. We could also have a container that needs access to all devices.

Running a container as privileged means that user 0 (root) in the container is directly mapped to user 0 (root) on the host. Normally, containers give us an abstraction: a user inside the container can have the same ID as a user on the host or in other containers, yet they are distinct identities with different permissions.

With the sysctl command we can set kernel parameters at runtime, but this requires root permission.

Without privileged, even as root inside the container we can't change them.

root@cks-master:~# vim pod.yaml

root@cks-master:~# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod
  name: pod
spec:
  containers:
  - command:
    - sh
    - -c
    - sleep 1d
    image: busybox
    name: pod
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

root@cks-master:~# k apply -f pod.yaml
pod/pod created

root@cks-master:~# k exec -it pod -- sh
/ # id
uid=0(root) gid=0(root) groups=10(wheel)
/ # sysctl kernel.hostname=cks
sysctl: error setting key 'kernel.hostname': Read-only file system

Now let's set privileged to true. Remember that privileged is a container-level security context only.

root@cks-master:~# vim pod.yaml

root@cks-master:~# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod
  name: pod
spec:
  containers:
  - command:
    - sh
    - -c
    - sleep 1d
    image: busybox
    name: pod
    resources: {}
    securityContext:
      privileged: true
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

# For the record, the hostname is the pod name itself (in this case, pod).
# sysctl changes the value under /proc temporarily; /etc/hostname is untouched.
root@cks-master:~# k exec pod -it -- sh
/ # cat /proc/sys/kernel/hostname
pod
/ # sysctl kernel.hostname=cks-test
kernel.hostname = cks-test
/ # cat /proc/sys/kernel/hostname
cks-test
/ # cat /etc/hostname
pod
/ # exit

# On the worker where the pod runs, nothing changes: kernel.hostname lives in the pod's UTS namespace, not the host's, so the host value is untouched.
root@cks-worker:~# cat /proc/sys/kernel/hostname
cks-worker

root@cks-master:~# k delete pod pod --force --grace-period 0
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod" force deleted
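As an aside: for namespaced sysctls there is a less drastic route than privileged, the pod-level sysctls field from the table above. A sketch (only sysctls the kubelet considers safe are allowed by default; the pod name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-demo    # illustrative name
spec:
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced   # one of the "safe" namespaced sysctls
      value: "1"
  containers:
  - name: busybox
    image: busybox
    command: ["sh", "-c", "sleep 1d"]
```

Unsafe sysctls can also be set this way, but only on nodes where the kubelet was started with them explicitly allowlisted.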

allowPrivilegeEscalation (Container Level)

Now let's talk about allowPrivilegeEscalation, which defaults to true.

The allowPrivilegeEscalation resource in Kubernetes is a security configuration that controls whether a process inside a container can obtain additional privileges, such as through the sudo command or when using setuid binaries.

How it works:

  • allowPrivilegeEscalation: true (default): Allows processes inside the container to escalate their privileges. This may be necessary for some applications that need to temporarily elevate their privileges to perform certain operations.

  • allowPrivilegeEscalation: false: Blocks privilege elevation. Even if the container is run with root privileges, it won't be able to use mechanisms like sudo or setuid to gain additional privileges. This configuration is used as an extra layer of security to limit the capabilities of processes inside the container.

Relationship with privileged and runAsNonRoot:

  • privileged: A container with privileged: true always allows privilege escalation, since it already has full privileges on the host; combining it with allowPrivilegeEscalation: false is rejected.

  • runAsNonRoot: If runAsNonRoot: true is configured, allowPrivilegeEscalation should generally be false, as the goal is to ensure that the container doesn't have root access or the ability to escalate to it.
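Putting these fields together, a commonly recommended hardened baseline looks like the sketch below; this is illustrative, not an official template, and the pod name and UID are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-pod    # illustrative name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 1d"]
    securityContext:
      runAsNonRoot: true               # refuse to start if the process would be root
      runAsUser: 1000                  # so busybox can satisfy runAsNonRoot
      allowPrivilegeEscalation: false  # sets NoNewPrivs on the process
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
```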

NoNewPrivs 0 means the flag is disabled, that is, the process can still escalate privileges.

root@cks-master:~# vim pod.yaml

root@cks-master:~# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod
  name: pod
spec:
  containers:
  - command:
    - sh
    - -c
    - sleep 1d
    image: busybox
    name: pod
    resources: {}
    securityContext:
      # This is already the default; it's just to confirm
      allowPrivilegeEscalation: true
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
root@cks-master:~# k apply -f pod.yaml
pod/pod created
root@cks-master:~# k exec -it pod -- sh
/ #
/ # cat /proc/1/status | grep NoNewPrivs
NoNewPrivs:0
/ # exit
root@cks-master:~# k delete pod pod --force --grace-period 0
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod" force deleted

Let's change it to false; NoNewPrivs should be 1, showing the flag is enabled.

root@cks-master:~# vim pod.yaml
root@cks-master:~# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod
  name: pod
spec:
  containers:
  - command:
    - sh
    - -c
    - sleep 1d
    image: busybox
    name: pod
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
root@cks-master:~# k apply -f pod.yaml
pod/pod created
root@cks-master:~# k exec -it pod -- sh
/ #
/ # cat /proc/1/status | grep NoNewPrivs
NoNewPrivs:1
/ # exit
root@cks-master:~# k delete pod pod --force --grace-period 0
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod" force deleted

AppArmor and SecComp

Take an overview of these two Linux security mechanisms.

Although this content is part of the CKS, it was made available separately in the apparmor and seccomp pages.

Studying both tools is necessary.
Study of both tools is necessary.