
Node Selector and Node Affinity

Node Selector

The nodeSelector uses Labels to specify which Nodes we want to target: we group Nodes with Labels and define that Pods will be deployed only on Nodes carrying the corresponding Labels.


To add a Label to a Node:

kubectl label nodes k3d-k3d-cluster-agent-1 size=large
node/k3d-k3d-cluster-agent-1 labeled

kubectl describe node k3d-k3d-cluster-agent-1
Name: k3d-k3d-cluster-agent-1
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=k3s
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=k3d-k3d-cluster-agent-1
kubernetes.io/os=linux
node.kubernetes.io/instance-type=k3s
size=large

kubectl get node k3d-k3d-cluster-agent-1 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k3d-k3d-cluster-agent-1 Ready <none> 70d v1.27.4+k3s1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=k3s,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k3d-k3d-cluster-agent-1,kubernetes.io/os=linux,node.kubernetes.io/instance-type=k3s,size=large

kubectl label nodes k3d-k3d-cluster-agent-1 size-
node/k3d-k3d-cluster-agent-1 unlabeled
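
With the size=large Label in place, a Pod can be pinned to labeled Nodes via nodeSelector. A minimal sketch (the Pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-large   # illustrative name
spec:
  nodeSelector:
    size: large       # only Nodes carrying size=large are eligible
  containers:
  - name: nginx
    image: nginx
```

If no Node carries the Label, the Pod stays Pending until one does.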

However, nodeSelector isn't always sufficient to solve our problems when requirements are more complex.

Node Affinity

Here we hit a limitation of nodeSelector: it can't handle more complex requirements, such as selecting Nodes of a certain size, like large or medium, while avoiding small ones. With nodeSelector it isn't possible to build either/or expressions or to negate a condition.

With great power comes great complexity.

A nodeSelector and a nodeAffinity can do exactly the same thing!

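As a sketch of that equivalence (the size=large Label is illustrative), these two Pod spec fragments target the same Nodes:

```yaml
# Using nodeSelector
spec:
  nodeSelector:
    size: large

# The equivalent nodeAffinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: In
            values:
            - large
```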

Every affinity expression goes under the affinity: key.

There are affinities for Nodes and for Pods, so these are the possible keys (shown unset below, but they can be configured):

apiVersion: v1
...
spec:
  containers:
  ...
  affinity:
    nodeAffinity: null
    podAffinity: null
    podAntiAffinity: null

The nodeAffinity will define an affinity for the Pod to be scheduled to a Node.

The podAffinity and podAntiAffinity define affinity between Pods. For example, we can configure a Pod to be placed where another specific Pod exists (podAffinity) or avoid being placed where Pods of a specific type exist (podAntiAffinity). A practical example would be ensuring that Pods A and B are always on the same Node, not just on the same group of Nodes with the same Label, to minimize network traffic and improve performance.
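
A sketch of the Pod A/Pod B case above (the app=pod-a Label and the names are illustrative). Note that topologyKey: kubernetes.io/hostname means "same Node", not just the same group of Nodes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-b
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: pod-a                       # co-locate with Pods labeled app=pod-a
        topologyKey: kubernetes.io/hostname  # "same Node"
  containers:
  - name: app
    image: nginx
```

Swapping podAffinity for podAntiAffinity in the same block would instead keep pod-b away from Nodes running app=pod-a.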

Within these affinity types, we can have these expressions:

  • requiredDuringSchedulingIgnoredDuringExecution: Defines MANDATORY rules for the Scheduler. If a Node doesn't meet these rules, the Pod won't be scheduled on it; but if the Pod is already running and conditions change (for example, a Label is removed), it doesn't need to be removed from the Node.

  • preferredDuringSchedulingIgnoredDuringExecution: The Scheduler tries as hard as it can to honor the rule, but if no Node satisfies it, the Pod can still be scheduled elsewhere, since the rule isn't mandatory.

  • requiredDuringSchedulingRequiredDuringExecution: The difference of not ignoring during execution is that if conditions change (for example, a Label is removed), the Pod is also removed from the Node. THIS EXPRESSION CANNOT BE COMBINED WITH THE OTHERS

    • STILL DOESN'T WORK WITH nodeAffinity

  • preferredDuringSchedulingRequiredDuringExecution: This expression is also foreseen. IT CANNOT BE COMBINED WITH THE OTHERS EITHER

    • STILL DOESN'T WORK WITH nodeAffinity
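
One detail worth noting for the preferred variant: in nodeAffinity, each preferred term carries a weight (1-100) plus a preference block, and the Scheduler sums the weights of matching terms to rank Nodes. A sketch (the key, values, and weight are illustrative):

```yaml
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 80          # higher weight, stronger preference
      preference:
        matchExpressions:
        - key: size
          operator: In
          values:
          - large
          - medium
```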

This would be possible.

apiVersion: v1
...
spec:
  containers:
  ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        block
      preferredDuringSchedulingIgnoredDuringExecution:
        block
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        block
      preferredDuringSchedulingIgnoredDuringExecution:
        block
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        block
      preferredDuringSchedulingIgnoredDuringExecution:
        block

This would be possible, but doesn't work yet.

apiVersion: v1
...
spec:
  containers:
  ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingRequiredDuringExecution: # This expression can't be used yet; it's an issue under development in Kubernetes
        block
    podAffinity:
      requiredDuringSchedulingRequiredDuringExecution:
        block
    podAntiAffinity:
      requiredDuringSchedulingRequiredDuringExecution:
        block

This would still be possible

apiVersion: v1
...
spec:
  containers:
  ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingRequiredDuringExecution: # This expression can't be used yet; it's an issue under development in Kubernetes
        block
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        block
      preferredDuringSchedulingIgnoredDuringExecution:
        block
    podAntiAffinity:
      requiredDuringSchedulingRequiredDuringExecution:
        block

But this is not possible

apiVersion: v1
...
spec:
  containers:
  ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingRequiredDuringExecution: # This expression can't be used yet; it's an issue under development in Kubernetes
        block
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        block
      requiredDuringSchedulingRequiredDuringExecution: # RequiredDuringExecution doesn't combine with anything else
        block
    podAntiAffinity:
      requiredDuringSchedulingRequiredDuringExecution:
        block

Now let's focus on the Block:

The block is nodeSelectorTerms when using nodeAffinity; when using podAffinity or podAntiAffinity, each term is a labelSelector (plus a topologyKey).

nodeSelectorTerms is what enables the OR: either one term matches, or another, or another, and so on. Could matchExpressions be placed directly in the block without the terms list? It could, but then you lose the OR, so keeping the terms list doesn't hurt and is BETTER.

So let's go to the block.

apiVersion: v1
kind: Pod
...
spec:
  containers:
  ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Rule 1
        - matchExpressions:
          - key: key1
            operator: In
            values:
            - value1
            - value2
        # OR Rule 2
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - node-1

Now to better understand what I said, the two affinities below are the same thing.

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: key1
          operator: In
          values:
          - value1
          - value2

# BUT IN THIS AFFINITY YOU LOSE THE OR OPTION, SO ALWAYS USE THE ONE ABOVE
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      matchExpressions:
      - key: key1
        operator: In
        values:
        - value1
        - value2

Now let's understand the possible operators we can use.

  • In: the Label value must be IDENTICAL to one of the listed values
  • NotIn: the Label value must be DIFFERENT from all the listed values
  • Exists: only checks that the key exists, regardless of value
  • DoesNotExist: only checks that the key DOESN'T EXIST, regardless of value
  • Gt: greater than; compares the values as numbers even though they're written as strings
  • Lt: less than; same numeric comparison
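
For Gt and Lt, the Label value is written as a string in YAML but compared as an integer. A sketch (the cpu-count key is illustrative):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: cpu-count
          operator: Gt
          values:
          - "4"           # matches Nodes where cpu-count > 4
```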

For Exists and DoesNotExist, an example:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: key1
          operator: Exists

If both nodeSelector and affinity are applied to a Pod, both must be satisfied for the Pod to be scheduled.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: key1
            operator: In
            values:
            - value1
            - value2
  nodeSelector:
    key2: value3
  containers:
  - name: my-container
    image: my-image

Moral of the story... If you master affinity, you don't need nodeSelector, but it's good to know.