
Jobs and CronJobs

A job is an application that runs for a short time to perform a specific task and then exits.

Some examples:

  • Generate a report
  • Process a complex calculation or an image
  • Send emails

With Docker, we could run a simple arithmetic operation like this:

docker run --name soma ubuntu expr 3 + 2
5
docker ps -a | grep soma
d813cef7ec93 ubuntu "expr 3 + 2" About a minute ago Exited (0) About a minute ago soma

The container is no longer running: it executed and exited with code 0, meaning it did what it had to do.

Many pods have this lifecycle, and Kubernetes gives this kind of workload its own resource: the Job.

If we were to do this using a pod, what would happen?

❯ kubectl run math-pod --image ubuntu --dry-run=client -o yaml --command -- expr 3 + 2 > math-pod.yaml

cat math-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: math-pod
  name: math-pod
spec:
  containers:
  - command:
    - "expr"
    - "3"
    - "+"
    - "2"
    image: ubuntu
    name: math-pod
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

# It executes but keeps restarting, because Kubernetes keeps trying to bring it back (k is a shell alias for kubectl)
❯ k get pods math-pod
NAME READY STATUS RESTARTS AGE
math-pod 0/1 Completed 3 (28s ago) 47s

Kubernetes wants its applications to live forever. By default, when a pod's container stops, Kubernetes restarts it to keep the pod running, even if the container exited successfully. Where can we confirm this?

kubectl get pod math-pod -o yaml | grep restartPolicy
restartPolicy: Always

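If we had set it to Never, the container would not be restarted at all; with OnFailure, it would be restarted only after a non-zero exit code. Either way, a successful run like this one would not be restarted.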
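
As a minimal sketch of that change, the manifest below is the same pod with restartPolicy set to Never. The name math-pod-once is an assumption made for illustration, chosen only so it does not clash with the existing math-pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: math-pod-once # hypothetical name, not from the original example
spec:
  containers:
  - name: math-pod
    image: ubuntu
    command:
    - "expr"
    - "3"
    - "+"
    - "2"
  # Never: the container is not restarted after it exits, whatever the exit code.
  # OnFailure would restart it only after a non-zero exit code.
  restartPolicy: Never
```

Applied with kubectl apply -f, this pod should run once and then stay in the Completed status with 0 restarts.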

We could run several containers in parallel within a single pod, and even use features like initContainers. But for parallel processing, a ReplicaSet already gives us multiple pods without having to define multiple containers in one pod.

While a ReplicaSet ensures that a given number of pods keep running, a Job ensures that a given number of pods complete a task.

A Job is also a controller for pods, just like a ReplicaSet. The difference is that it wants them to finish. So let's define a Job.

apiVersion: batch/v1 # Pay attention to the API
kind: Job
metadata:
  name: math-add
spec:
  template:
    spec:
      containers:
      - image: ubuntu
        name: math-pod
        resources: {}
        command:
        - "expr"
        - "3"
        - "+"
        - "2"
      dnsPolicy: ClusterFirst
      restartPolicy: Never

Let's test:

❯ kubectl apply -f math-job.yaml

❯ k get jobs
NAME COMPLETIONS DURATION AGE
math-add 1/1 4s 46s

# We can observe that the pod didn't keep restarting
❯ k get pods
NAME READY STATUS RESTARTS AGE
math-add-skttf 0/1 Completed 0 49s

❯ k logs pods/math-add-skttf
5

# Deleting the job also deletes the pod created by it
❯ k delete job math-add
job.batch "math-add" deleted

❯ k get pods
No resources found in default namespace.

Of course, this is a simple simulation that's not used in the real world. Generally, a job will persist data in some volume, or send it somewhere, or send an email, etc.

Continuing with the example, we can increase the number of tasks we want it to execute. Instead of using replicas, we use completions.

apiVersion: batch/v1
kind: Job
metadata:
  name: math-add
spec:
  completions: 3
  template:
    spec:
      containers:
      - image: ubuntu
        name: math-pod
        resources: {}
        command:
        - "expr"
        - "3"
        - "+"
        - "2"
      dnsPolicy: ClusterFirst
      restartPolicy: Never

If we look at what it generated, we have:

❯ k apply -f math-job.yaml
job.batch/math-add created

❯ k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
math-add-2gcbl 0/1 Completed 0 6s 10.244.2.12 kind-cluster-worker2 <none> <none>
math-add-fp6tc 0/1 Completed 0 15s 10.244.2.10 kind-cluster-worker2 <none> <none>
math-add-lfv5v 0/1 Completed 0 11s 10.244.2.11 kind-cluster-worker2 <none> <none>

❯ k get job math-add
NAME COMPLETIONS DURATION AGE
math-add 3/3 13s 82s

If we observe the AGE column, we can see that these pods were executed one after another.

The Job only finishes when 3 pods have completed successfully. If a pod fails, the Job creates a replacement, until it reaches its retry limit.

To run them in parallel, we can define parallelism. It defaults to 1, which is why the pods ran one after another.

apiVersion: batch/v1
kind: Job
metadata:
  name: math-add
spec:
  completions: 9
  parallelism: 3 # Will run 3 at a time
  template:
    spec:
      containers:
      - image: ubuntu
        name: math-pod
        resources: {}
        command:
        - "expr"
        - "3"
        - "+"
        - "2"
      dnsPolicy: ClusterFirst
      restartPolicy: Never

If we want all of them to run together, we need to define completions and parallelism with the same values.

It's also possible to limit the number of failures with backoffLimit, which defaults to 6. Once that limit is exceeded, the Job is marked as failed; until then, it keeps creating pods to reach its number of completions.
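
The failure limit mentioned above is set with the backoffLimit field of the Job spec. The following is a sketch only: the name math-fail and the deliberately failing command are assumptions made for illustration, not part of the original example.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: math-fail # hypothetical name
spec:
  backoffLimit: 2 # give up after 2 retries (the default is 6)
  template:
    spec:
      containers:
      - name: math-pod
        image: ubuntu
        # Illustrative command that always exits with a non-zero code
        command:
        - "sh"
        - "-c"
        - "exit 1"
      restartPolicy: Never
```

With restartPolicy: Never, each retry is a new pod. Once the limit is exceeded, the Job stops creating pods and is marked as failed, which kubectl describe job math-fail should report as BackoffLimitExceeded.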

Jobs vs CronJobs

The difference is that a CronJob is a controller of Jobs: with a CronJob, we can schedule Jobs to run periodically. Since a CronJob controls Jobs, we would expect from the other controllers that it needs a template for a Job inside it, plus a few extra fields such as schedule, which defines when the Job will run.

apiVersion: batch/v1 # Pay attention to the API
kind: CronJob
metadata:
  name: cj-math-add
spec:
  schedule: "*/1 * * * *" # will run every 1 minute
  jobTemplate: # This is the job
    spec:
      completions: 3 # All together
      parallelism: 3
      template:
        spec:
          containers:
          - image: ubuntu
            name: math-pod
            resources: {}
            command:
            - "expr"
            - "3"
            - "+"
            - "2"
          dnsPolicy: ClusterFirst
          restartPolicy: Never
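
The schedule field uses the standard five-field cron format: minute, hour, day of month, month, and day of week. As a quick reference, the fragment below lists a few alternative values. Only one schedule can be set per CronJob, so the others are left commented out, and the specific times are arbitrary examples.

```yaml
# Field order: minute hour day-of-month month day-of-week
spec:
  schedule: "*/5 * * * *"   # every 5 minutes
  # schedule: "0 * * * *"   # at minute 0 of every hour
  # schedule: "0 2 * * *"   # every day at 02:00
  # schedule: "0 9 * * 1"   # every Monday at 09:00
```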

Let's create one to observe:

❯ k apply -f cronjob.yaml
cronjob.batch/cj-math-add created

❯ k get cronjobs
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
cj-math-add */1 * * * * False 0 <none> 27s

# It's not time yet
❯ k get jobs
No resources found in default namespace.

# It started
❯ k get jobs
NAME COMPLETIONS DURATION AGE
cj-math-add-28579344 0/3 0s 0s

# It completed
❯ k get jobs
NAME COMPLETIONS DURATION AGE
cj-math-add-28579344 3/3 4s 6s

# Observe the AGE and see that all were created together because of parallelism
❯ k get pods
NAME READY STATUS RESTARTS AGE
cj-math-add-28579344-cmtjz 0/1 Completed 0 17s
cj-math-add-28579344-m999t 0/1 Completed 0 17s
cj-math-add-28579344-s9kj6 0/1 Completed 0 17s

# giving it more time we'll see it ran again after one minute
❯ k get jobs
NAME COMPLETIONS DURATION AGE
cj-math-add-28579344 3/3 4s 119s
cj-math-add-28579345 3/3 5s 59s

❯ k get pods
NAME READY STATUS RESTARTS AGE
cj-math-add-28579344-cmtjz 0/1 Completed 0 2m4s
cj-math-add-28579344-m999t 0/1 Completed 0 2m4s
cj-math-add-28579344-s9kj6 0/1 Completed 0 2m4s
cj-math-add-28579345-89dd6 0/1 Completed 0 64s
cj-math-add-28579345-jfpzm 0/1 Completed 0 64s
cj-math-add-28579345-lmh8j 0/1 Completed 0 64s
cj-math-add-28579346-htpgx 0/1 Completed 0 4s
cj-math-add-28579346-mndcv 0/1 Completed 0 4s
cj-math-add-28579346-qkq6v 0/1 Completed 0 4s

# Deleting the CronJob deletes all of its Jobs, which in turn deletes all of their pods
❯ k delete cronjobs.batch cj-math-add
cronjob.batch "cj-math-add" deleted

❯ k get pods
No resources found in default namespace.