Jobs and CronJobs

A job is an application that lives for a short period of time: it runs a specific task and then exits.

Some examples:

Generate a report
Process a complex calculation or an image
Send emails

With Docker, we could run a simple mathematical operation in a container:

❯ docker run --name soma ubuntu expr 3 + 2
5
❯ docker ps -a | grep soma
d813cef7ec93   ubuntu                               "expr 3 + 2"             About a minute ago   Exited (0) About a minute ago                                                                         soma

The container is no longer running: it executed and exited with code 0, meaning it did what it had to do.

Many pods have this same lifecycle, and Kubernetes handles this kind of workload separately, as Jobs.

If we were to do this using a pod, what would happen?

❯ kubectl run math-pod --image=ubuntu --dry-run=client -o yaml --command -- expr 3 + 2 > math-pod.yaml

❯ cat math-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: math-pod
  name: math-pod
spec:
  containers:
  - command:
    - "expr"
    - "3"
    - "+"
    - "2"
    image: ubuntu
    name: math-pod
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

# It executes but keeps restarting all the time, because Kubernetes keeps trying to bring it back
❯ k get pods math-pod
NAME       READY   STATUS      RESTARTS      AGE
math-pod   0/1     Completed   3 (28s ago)   47s

Kubernetes wants its applications to live forever. By default, when a pod's container stops, Kubernetes restarts it to keep the pod running, even if the container exited successfully. Where can we confirm this?

❯ kubectl get pod math-pod -o yaml | grep restartPolicy
  restartPolicy: Always

If we had set restartPolicy to Never or OnFailure, this wouldn't happen, because the container exits with code 0.
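
As a minimal sketch, the same pod with only the restartPolicy changed might look like the manifest below. The fields that kubectl generated but that are not needed here were dropped for brevity. With this policy the pod should run once and stay in Completed status with 0 restarts.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: math-pod
spec:
  containers:
  - name: math-pod
    image: ubuntu
    command: ["expr", "3", "+", "2"]
  # Never: do not restart the container after it exits.
  # OnFailure would restart it only on a non-zero exit code.
  restartPolicy: Never
```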

We could run several containers in parallel within a single pod, and even use features such as initContainers (which run one after another before the main containers start). For parallel processing, though, a ReplicaSet can already run multiple copies of a pod without our having to define multiple containers in it.

While a ReplicaSet ensures that a given number of pods keep running, a Job ensures that a given number of pods run a task to completion.

A Job is a controller for pods, just like a ReplicaSet. The difference is that it wants them to finish. So let's define a Job.

apiVersion: batch/v1 # Pay attention to the API
kind: Job
metadata:
  name: math-add
spec:
  template:
    spec:
      containers:
      - image: ubuntu
        name: math-pod
        resources: {}
        command:
        - "expr"
        - "3"
        - "+"
        - "2"
      dnsPolicy: ClusterFirst
      restartPolicy: Never

Let's test:

❯ kubectl apply -f math-job.yaml

❯ k get jobs
NAME       COMPLETIONS   DURATION   AGE
math-add   1/1           4s         46s

# We can observe that the pod didn't keep restarting
❯ k get pods
NAME             READY   STATUS      RESTARTS   AGE
math-add-skttf   0/1     Completed   0          49s

❯ k logs pods/math-add-skttf
5

# Deleting the job also deletes the pod created by it
❯ k delete job math-add
job.batch "math-add" deleted

❯ k get pods
No resources found in default namespace.

Of course, this is a toy example that you wouldn't see in the real world. A real job usually persists its result to a volume, sends it to another service, sends an email, and so on.
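
To illustrate the volume case, here is a hedged sketch of a Job that writes its result to a file instead of only printing it. The Job name math-add-to-file, the claim name math-results, and the path /data/result.txt are hypothetical, and the PersistentVolumeClaim is assumed to already exist in the namespace.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: math-add-to-file # hypothetical name
spec:
  template:
    spec:
      containers:
      - name: math-pod
        image: ubuntu
        # Run through a shell so the result can be redirected to a file
        command: ["sh", "-c", "expr 3 + 2 > /data/result.txt"]
        volumeMounts:
        - name: results
          mountPath: /data
      volumes:
      - name: results
        persistentVolumeClaim:
          claimName: math-results # assumed to already exist
      restartPolicy: Never
```

Because the file lives on the claimed volume rather than in the container's filesystem, it should outlive the pod and even the Job itself.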

Continuing with the math-add example, we can increase the number of tasks we want it to execute. Instead of using replicas, as a ReplicaSet does, we use completions.

apiVersion: batch/v1
kind: Job
metadata:
  name: math-add
spec:
  completions: 3
  template:
    spec:
      containers:
      - image: ubuntu
        name: math-pod
        resources: {}
        command:
        - "expr"
        - "3"
        - "+"
        - "2"
      dnsPolicy: ClusterFirst
      restartPolicy: Never

If we look at what it generated, we have:

❯ k apply -f math-job.yaml
job.batch/math-add created

❯ k get pods -o wide
NAME             READY   STATUS      RESTARTS   AGE   IP            NODE                   NOMINATED NODE   READINESS GATES
math-add-2gcbl   0/1     Completed   0          6s    10.244.2.12   kind-cluster-worker2   <none>           <none>
math-add-fp6tc   0/1     Completed   0          15s   10.244.2.10   kind-cluster-worker2   <none>           <none>
math-add-lfv5v   0/1     Completed   0          11s   10.244.2.11   kind-cluster-worker2   <none>           <none>

❯ k get job math-add
NAME       COMPLETIONS   DURATION   AGE
math-add   3/3           13s        82s

If we look at the AGE column, we can see that these pods were executed one after another.

The Job only finishes once 3 pods have completed successfully. If a pod fails, the Job creates a new one to replace it.

To run the pods in parallel, we can define parallelism. Its default value is 1, which is why the pods ran one after another.

apiVersion: batch/v1
kind: Job
metadata:
  name: math-add
spec:
  completions: 9
  parallelism: 3 # Will run 3 at a time
  template:
    spec:
      containers:
      - image: ubuntu
        name: math-pod
        resources: {}
        command:
        - "expr"
        - "3"
        - "+"
        - "2"
      dnsPolicy: ClusterFirst
      restartPolicy: Never

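If we want all of them to run at the same time, we need to set completions and parallelism to the same value.

It's also possible to limit the number of failures with backoffLimit. Its default value is 6: once that many retries have failed, the Job is marked as failed and stops creating pods. Until then, it keeps retrying until it reaches its number of completions.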
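
The failure limit is set with the backoffLimit field of the Job spec. The sketch below uses a hypothetical Job named math-fail whose command always fails, only to make the limit visible. With a low backoffLimit, the Job should give up after a few failed pods instead of retrying the default 6 times.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: math-fail # hypothetical name
spec:
  completions: 3
  backoffLimit: 2 # stop retrying after 2 failed retries (default is 6)
  template:
    spec:
      containers:
      - name: math-pod
        image: ubuntu
        # "false" always exits with a non-zero code, simulating a failing task
        command: ["false"]
      restartPolicy: Never
```

Running kubectl describe job math-fail afterwards should show the failed pods and the reason the Job stopped.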

Jobs vs CronJobs

The difference is that a CronJob is a controller for Jobs: it lets us schedule Jobs to run periodically. Just as a Job contains a template for its pods, a CronJob contains a template for its Jobs (jobTemplate), plus a few extra fields such as schedule, which defines when the Job runs.

apiVersion: batch/v1 # Pay attention to the API
kind: CronJob
metadata:
  name: cj-math-add
spec:
  schedule: "*/1 * * * *" # will run every 1 minute
  jobTemplate: # This is the job
    spec:
      completions: 3 # All together
      parallelism: 3
      template:
        spec:
          containers:
          - image: ubuntu
            name: math-pod
            resources: {}
            command:
            - "expr"
            - "3"
            - "+"
            - "2"
          dnsPolicy: ClusterFirst
          restartPolicy: Never

Let's create one to observe:

❯ k apply -f cronjob.yaml
cronjob.batch/cj-math-add created

❯ k get cronjobs
NAME          SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cj-math-add   */1 * * * *   False     0        <none>          27s

# It's not time yet
❯ k get jobs
No resources found in default namespace.

# It started
❯ k get jobs
NAME                   COMPLETIONS   DURATION   AGE
cj-math-add-28579344   0/3           0s         0s

# It completed
❯ k get jobs
NAME                   COMPLETIONS   DURATION   AGE
cj-math-add-28579344   3/3           4s         6s

# Observe the AGE and see that all were created together because of parallelism
❯ k get pods
NAME                         READY   STATUS      RESTARTS   AGE
cj-math-add-28579344-cmtjz   0/1     Completed   0          17s
cj-math-add-28579344-m999t   0/1     Completed   0          17s
cj-math-add-28579344-s9kj6   0/1     Completed   0          17s

# Giving it more time, we'll see that it ran again after one minute
❯ k get jobs
NAME                   COMPLETIONS   DURATION   AGE
cj-math-add-28579344   3/3           4s         119s
cj-math-add-28579345   3/3           5s         59s

❯ k get pods
NAME                         READY   STATUS      RESTARTS   AGE
cj-math-add-28579344-cmtjz   0/1     Completed   0          2m4s
cj-math-add-28579344-m999t   0/1     Completed   0          2m4s
cj-math-add-28579344-s9kj6   0/1     Completed   0          2m4s
cj-math-add-28579345-89dd6   0/1     Completed   0          64s
cj-math-add-28579345-jfpzm   0/1     Completed   0          64s
cj-math-add-28579345-lmh8j   0/1     Completed   0          64s
cj-math-add-28579346-htpgx   0/1     Completed   0          4s
cj-math-add-28579346-mndcv   0/1     Completed   0          4s
cj-math-add-28579346-qkq6v   0/1     Completed   0          4s

# Deleting the CronJob also deletes all of its Jobs, which in turn deletes their pods
❯ k delete cronjobs.batch cj-math-add
cronjob.batch "cj-math-add" deleted

❯ k get pods
No resources found in default namespace.
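
The schedule field used in the CronJob above follows the standard five-field cron format. As a quick reference, the fragment below shows a few alternative values. Only the first one comes from this example, and the commented-out ones are illustrative.

```yaml
# Fields: minute hour day-of-month month day-of-week
spec:
  schedule: "*/1 * * * *"   # every minute (the value used above)
  # schedule: "0 * * * *"   # at minute 0 of every hour
  # schedule: "30 2 * * *"  # every day at 02:30
  # schedule: "0 8 * * 1"   # every Monday at 08:00
```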
