Volumes

We know that the container's function is to process data and die, and what keeps it alive is the process with ID 1. If that process dies, the container stops. Data is only destroyed when the container is destroyed.

datalost

In the container world, to persist data, we use volumes.

retain

Containers within a pod run in their respective read-write layers as we discussed earlier. The pod controls the containers inside it, and when a pod is terminated, it destroys its respective containers, taking the data with it.

poddata

To persist data, it's also necessary to use volumes.

poddata

If we create a pod normally, data is lost when the pod dies

alt text

To create a volume using a Local driver, we need to declare a volume. It would be the same process when a container automatically creates the volume. In the scenario below, we create a volume inside the host, but the volume is not yet being used.

alt text

It's necessary to map that the container's /opt corresponds to the volume at /data.

alt text

If this pod is destroyed and created again and comes up on a different node, will the data be there? No.

Two points we must consider here.

This volume is passing a path on the node, so this directory must exist previously with appropriate permission.
This would work for a single-node cluster.
In a multi-node cluster, the pod can come up on any node that passes scheduler filtering. Data would not be available if it came up on a different node than the one it came up on before.

Having replicas for the same pod requires all pods to see the same directory for the data to make sense.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: "ramdon-number-generator"
spec:
  containers:
  - name: alpine
    image: "alpine"

    command: ["/bin/sh", "-c"]
    args: ["shuf -i 0-100 -n 1 >> /opt/number.out;"]
    volumeMounts:
    - name: data-volume
      mountPath: /opt
  volumes:
    - name: data-volume
      hostPath:
        path: /data
        type: Directory
EOF

# Observe that the pod is not ready.
kubectl get pods
NAME                      READY   STATUS              RESTARTS   AGE
ramdon-number-generator   0/1     ContainerCreating   0          22s

kubectl describe pod ramdon-number-generator
## Removed for easier reading
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    28s                default-scheduler  Successfully assigned default/ramdon-number-generator to kind-cluster-worker2
  # Failed to mount as expected
  Warning  FailedMount  13s (x6 over 29s)  kubelet            MountVolume.SetUp failed for volume "data-volume" : hostPath type check failed: /data is not a directory

If you changed it to /tmp, it would work.

But this is a specific solution for some and not for the vast majority. Ideally, don't store application data inside nodes, but have an external solution for this purpose.

Kubernetes supports different types of storage solutions.

NFS
GlusterFS
Flocker
Ceph
Scaleio
vSphere
Others

And also many cloud solutions like:

AWS
Azure
GCP
others

For example, on AWS, we could use EBS.

  volumes:
    - name: data-volume
      awsElasticBlockStore:
        volumeID: <volume-id>
        fsType: ext4

Let's create an nginx container and a volume shared between it and another. The second container should put the time in nginx's index.html.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pod-com-dois-containers
spec:
  containers:
    - name: nginx-container
      image: nginx
      ports:
        - containerPort: 80
      volumeMounts:
        - name: html-volume
          mountPath: /usr/share/nginx/html
    - name: hora-container
      image: alpine
      command: ["/bin/sh", "-c"]
      args:
        - while true; do echo "$(date '+%Y-%m-%d %H:%M:%S')" > /mnt/html/index.html; sleep 1; done
      volumeMounts:
        - name: html-volume
          mountPath: /mnt/html
  volumes:
    - name: html-volume
      emptyDir: {}
EOF