Node Failure Verification Sequence
If a node fails, there are several situations to take into account.

- Check the state of the nodes with the command kubectl get nodes.
- Look at the events for any node that is NotReady by running the describe command against that node, and check its Conditions.
kubectl describe nodes kind-cluster-worker
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
# If MemoryPressure is True, the node lacks memory to run its pods. The pods are probably crashing or being evicted.
MemoryPressure False Mon, 26 Feb 2024 08:57:01 -0300 Thu, 08 Feb 2024 20:02:46 -0300 KubeletHasSufficientMemory kubelet has sufficient memory available
# If DiskPressure is True, the node is running out of disk capacity.
DiskPressure False Mon, 26 Feb 2024 08:57:01 -0300 Thu, 08 Feb 2024 20:02:46 -0300 KubeletHasNoDiskPressure kubelet has no disk pressure
# PIDPressure is set to True when too many processes are running on the node (process ID exhaustion).
PIDPressure False Mon, 26 Feb 2024 08:57:01 -0300 Thu, 08 Feb 2024 20:02:46 -0300 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 26 Feb 2024 08:57:01 -0300 Thu, 08 Feb 2024 20:02:49 -0300 KubeletReady kubelet is posting ready status

If any of these pressure conditions is set to True, you already know the node is short on that resource. If a condition shows Unknown, something has most likely gone wrong and the node has stopped reporting its state.
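The check above can be scripted when the cluster has many nodes. A minimal sketch, assuming kubectl is configured against the affected cluster (the not_ready helper is hypothetical, not part of kubectl):

```shell
# not_ready filters "<node> <ReadyStatus>" pairs down to the failing nodes.
not_ready() { awk '$2 != "True" { print $1 }'; }

# Emit each node name followed by its Ready condition status, one per line,
# then keep only the nodes whose Ready status is not "True".
if command -v kubectl >/dev/null 2>&1; then
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
    | not_ready
fi
```

Any node printed here (status False or Unknown) is the one to describe next.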
- Check the processes and resource consumption on the node with the commands top and df -h.

top - 12:11:32 up 22:41, 0 user, load average: 3.25, 2.79, 2.56
Tasks: 17 total, 1 running, 16 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.0 us, 0.5 sy, 0.0 ni, 91.2 id, 0.1 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 64001.3 total, 41471.9 free, 11956.6 used, 13210.8 buff/cache
MiB Swap: 1952.0 total, 1952.0 free, 0.0 used. 52044.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
116 root 20 0 2470452 66056 36480 S 0.7 0.1 13:23.44 containerd
223 root 20 0 2999176 86524 53376 S 0.3 0.1 8:36.64 kubelet
1 root 20 0 20392 11648 8704 S 0.0 0.0 0:01.13 systemd
97 root 20 0 24792 11008 10240 S 0.0 0.0 0:00.08 systemd-journal
271 root 20 0 722648 13824 9856 S 0.0 0.0 0:07.79 containerd-shim
287 root 20 0 722648 13852 9856 S 0.0 0.0 0:07.95 containerd-shim
317 65535 20 0 996 512 512 S 0.0 0.0 0:00.00 pause
324 65535 20 0 996 512 512 S 0.0 0.0 0:00.01 pause
358 root 20 0 1284848 49360 36608 S 0.0 0.1 0:07.92 kube-proxy
446 root 20 0 743928 27448 19072 S 0.0 0.0 0:15.96 kindnetd
14316 root 20 0 722392 13184 9600 S 0.0 0.0 0:00.01 containerd-shim
14336 65535 20 0 996 512 512 S 0.0 0.0 0:00.00 pause
14373 root 20 0 2484 1280 1280 S 0.0 0.0 0:00.01 sleep
14400 root 20 0 2576 1408 1408 S 0.0 0.0 0:00.00 sh
14406 root 20 0 2576 128 128 S 0.0 0.0 0:00.00 sh
14407 root 20 0 4192 3328 2816 S 0.0 0.0 0:00.00 bash
14412 root 20 0 8568 4736 2688 R 0.0 0.0 0:00.00 top
root@kind-cluster-worker:/# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 1.8T 571G 1.2T 33% /
tmpfs 64M 0 64M 0% /dev
shm 64M 0 64M 0% /dev/shm
/dev/mapper/vgubuntu-root 1.8T 571G 1.2T 33% /var
tmpfs 32G 8.6M 32G 1% /run
tmpfs 32G 0 32G 0% /tmp
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 63G 12K 63G 1% /var/lib/kubelet/pods/5a2bf15d-36fa-4c73-94a3-b491f4774e72/volumes/kubernetes.io~projected/kube-api-access-tpjjt
tmpfs 50M 12K 50M 1% /var/lib/kubelet/pods/92c3fe67-ccb9-437c-8d18-c16008dfa93b/volumes/kubernetes.io~projected/kube-api-access-cxt56
shm 64M 0 64M 0% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/b6087483482422390d1fad0ec6726dfd98aba0d990b3f7f5a6d8224c15c4a4a3/shm
shm 64M 0 64M 0% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/c13daa616a7ee7a7144b2acf39476a6e36fd454c1ebf345c26d3834703d11756/shm
overlay 1.8T 571G 1.2T 33% /run/containerd/io.containerd.runtime.v2.task/k8s.io/b6087483482422390d1fad0ec6726dfd98aba0d990b3f7f5a6d8224c15c4a4a3/rootfs
overlay 1.8T 571G 1.2T 33% /run/containerd/io.containerd.runtime.v2.task/k8s.io/c13daa616a7ee7a7144b2acf39476a6e36fd454c1ebf345c26d3834703d11756/rootfs
overlay 1.8T 571G 1.2T 33% /run/containerd/io.containerd.runtime.v2.task/k8s.io/0e6c2021f2b349bb0a16e5e5ecedb44a364566413ddfac25d09dd0538bf1de3b/rootfs
overlay 1.8T 571G 1.2T 33% /run/containerd/io.containerd.runtime.v2.task/k8s.io/970340bd3152b21a503b9e8fbc0b6af4948bed0bc9581f03f7140cbad18b8015/rootfs
tmpfs 63G 12K 63G 1% /var/lib/kubelet/pods/1af287b4-b519-4956-995a-5cf7403e0699/volumes/kubernetes.io~projected/kube-api-access-h9vz9
overlay 1.8T 571G 1.2T 33% /run/containerd/io.containerd.runtime.v2.task/k8s.io/36a2786c5693be823e1cd178341a794744583fe1d67548132f4a364933d54967/rootfs
overlay 1.8T 571G 1.2T 33% /run/containerd/io.containerd.runtime.v2.task/k8s.io/27c42f370c029cf965538366fd6f310cb9408ef6b305442ce21f32ec8947e2a6/rootfs
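Besides the block usage that df -h shows, DiskPressure can also be triggered by inode exhaustion, which df -h hides. A sketch that checks both on the filesystems the kubelet typically depends on (the directory list and the check_fs helper are illustrative assumptions):

```shell
# Show block usage (df -h) and inode usage (df -i) for one path.
# return 0 keeps the loop going even when a directory does not exist.
check_fs() { df -h "$1" 2>/dev/null; df -i "$1" 2>/dev/null; return 0; }

for dir in /var/lib/kubelet /var/lib/containerd /var/log; do
  check_fs "$dir"
done
```

If IUse% is near 100% while Use% looks healthy, the node is out of inodes, not bytes.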
- Check the kubelet's state and its logs with systemctl status kubelet.service and journalctl -xeu kubelet.

root@kind-cluster-worker:/# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf, 11-kind.conf
Active: active (running) since Sun 2024-02-25 13:30:16 UTC; 22h ago
Docs: http://kubernetes.io/docs/
Process: 214 ExecStartPre=/bin/sh -euc if [ -f /sys/fs/cgroup/cgroup.controllers ]; then /kind/bin/create-kubelet-cgroup-v2.sh; fi (code=exited, status=0/SUCCESS)
Process: 222 ExecStartPre=/bin/sh -euc if [ ! -f /sys/fs/cgroup/cgroup.controllers ] && [ ! -d /sys/fs/cgroup/systemd/kubelet ]; then mkdir -p /sys/fs/cgroup/systemd/kubelet; fi (code=exited, status=0/SUCCESS)
Main PID: 223 (kubelet)
Tasks: 24 (limit: 11496)
Memory: 35.9M
CPU: 8min 38.699s
CGroup: /kubelet.slice/kubelet.service
└─223 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=172.18.0.4 --node-labels= --pod-infra-container-image=registry.k8s.io/pause:3.9 --provider-id=kind://docker/kind-cluster/kind-cluster-worker --runtime-cgroups=/system.slice/containerd.service
Feb 25 13:30:19 kind-cluster-worker kubelet[223]: I0225 13:30:19.211689 223 topology_manager.go:215] "Topology Admit Handler" podUID="5a2bf15d-36fa-4c73-94a3-b491f4774e72" podNamespace="kube-system" podName="kube-proxy-9zhh2"
Feb 25 13:30:19 kind-cluster-worker kubelet[223]: I0225 13:30:19.307285 223 desired_state_of_world_populator.go:159] "Finished populating initial desired state of world"
Feb 25 13:30:19 kind-cluster-worker kubelet[223]: I0225 13:30:19.340272 223 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/5a2bf15d-36fa-4c73-94a3-b491f4774e72-xtables-lock\") pod \"kube-proxy-9zhh2\" (UID: \"5a2bf15d-36fa-4c73-94a3-b491f4774e72\") " pod="kube-system/kube-proxy-9zhh2"
Feb 25 13:30:19 kind-cluster-worker kubelet[223]: I0225 13:30:19.340288 223 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/5a2bf15d-36fa-4c73-94a3-b491f4774e72-lib-modules\") pod \"kube-proxy-9zhh2\" (UID: \"5a2bf15d-36fa-4c73-94a3-b491f4774e72\") " pod="kube-system/kube-proxy-9zhh2"
Feb 25 13:30:19 kind-cluster-worker kubelet[223]: I0225 13:30:19.340304 223 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni-cfg\" (UniqueName: \"kubernetes.io/host-path/92c3fe67-ccb9-437c-8d18-c16008dfa93b-cni-cfg\") pod \"kindnet-wnzds\" (UID: \"92c3fe67-ccb9-437c-8d18-c16008dfa93b\") " pod="kube-system/kindnet-wnzds"
Feb 25 13:30:19 kind-cluster-worker kubelet[223]: I0225 13:30:19.340316 223 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/92c3fe67-ccb9-437c-8d18-c16008dfa93b-xtables-lock\") pod \"kindnet-wnzds\" (UID: \"92c3fe67-ccb9-437c-8d18-c16008dfa93b\") " pod="kube-system/kindnet-wnzds"
Feb 25 13:30:19 kind-cluster-worker kubelet[223]: I0225 13:30:19.340476 223 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/92c3fe67-ccb9-437c-8d18-c16008dfa93b-lib-modules\") pod \"kindnet-wnzds\" (UID: \"92c3fe67-ccb9-437c-8d18-c16008dfa93b\") " pod="kube-system/kindnet-wnzds"
Feb 26 12:11:12 kind-cluster-worker kubelet[223]: I0226 12:11:12.480201 223 topology_manager.go:215] "Topology Admit Handler" podUID="1af287b4-b519-4956-995a-5cf7403e0699" podNamespace="kube-system" podName="node-shell-2a728d18-d3d7-4c59-ad22-3a763f34b1c9"
Feb 26 12:11:12 kind-cluster-worker kubelet[223]: I0226 12:11:12.567123 223 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-h9vz9\" (UniqueName: \"kubernetes.io/projected/1af287b4-b519-4956-995a-5cf7403e0699-kube-api-access-h9vz9\") pod \"node-shell-2a728d18-d3d7-4c59-ad22-3a763f34b1c9\" (UID: \"1af287b4-b519-4956-995a-5cf7403e0699\") " pod="kube-system/node-shell-2a728d18-d3d7-4c59-ad22-3a763f34b1c9"
Feb 26 12:11:16 kind-cluster-worker kubelet[223]: I0226 12:11:16.957965 223 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/node-shell-2a728d18-d3d7-4c59-ad22-3a763f34b1c9" podStartSLOduration=1.652541497 podStartE2EDuration="4.957928607s" podCreationTimestamp="2024-02-26 12:11:12 +0000 UTC" firstStartedPulling="2024-02-26 12:11:12.853842258 +0000 UTC m=+81656.677152456" lastFinishedPulling="2024-02-26 12:11:16.159229367 +0000 UTC m=+81659.982539566" observedRunningTime="2024-02-26 12:11:16.957817629 +0000 UTC m=+81660.781127839" watchObservedRunningTime="2024-02-26 12:11:16.957928607 +0000 UTC m=+81660.781238814"
# And if the previous command is not enough to diagnose the problem, look at the logs in more detail
root@kind-cluster-worker:/# journalctl -u kubelet
# (Similar journalctl output omitted for brevity)
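On a node that has been up for a while the journal gets long, so it helps to narrow it down by time and by keyword. A sketch (the keyword list in kubelet_errors is illustrative, not exhaustive):

```shell
# kubelet_errors keeps only the log lines that usually matter when
# diagnosing node failures: errors, failures, evictions, pressure events.
kubelet_errors() { grep -iE 'error|fail|evict|pressure'; }

if command -v journalctl >/dev/null 2>&1; then
  journalctl -u kubelet --no-pager --since "1 hour ago" | kubelet_errors | tail -n 20
fi
```

Eviction and pressure messages here line up with the node conditions seen earlier in kubectl describe.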
- Also check the certificates and verify that they have not expired.
root@kind-cluster-worker:/# openssl x509 -in /var/lib/kubelet/pki/kubelet.crt
-----BEGIN CERTIFICATE-----
MIIDTTCCAjWgAwIBAgIIdyIAO9Z5gVAwDQYJKoZIhvcNAQELBQAwLDEqMCgGA1UE
Awwha2luZC1jbHVzdGVyLXdvcmtlci1jYUAxNzA3NDMzMzY1MB4XDTI0MDIwODIy
MDI0NVoXDTI1MDIwNzIyMDI0NVowKTEnMCUGA1UEAwwea2luZC1jbHVzdGVyLXdv
cmtlckAxNzA3NDMzMzY1MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA
nXOMHMXoQiRePWMKnI6NN0VI7lhy6Te2Ia2y+QZ+qeDfMM9mi62kwbHcnCnFsptJ
8CBqv1mYpzNJaCDDiOrtB9Fv6gs6k0xARF+Tdw+CC2Mo7UJEVh4S5A1BnYTJUctm
tWA9jzUqbh3cxaubmN2AmzlmTk2+A6FZX+fR/bdNs9Gh+zrrkhF2irfs8Sxbp68f
KMB6HsgZOSdt014Dz9J5xB37Hh0R3KS0FYLcJ4TVaPGJrCypL26GezfZWjCRFm7q
wB/t7vbSNV/gFNt533Vdr6AxF8IZEVzdB2fxJ6/ofNDbsioFQ1iDhv4wQECu6jCH
6NkbzCZrPDF4KJrLXGjkNwIDAQABo3YwdDAOBgNVHQ8BAf8EBAMCBaAwEwYDVR0l
BAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAfBgNVHSMEGDAWgBT53Jnk+X3i
R7lLud6Q3HnbydB0azAeBgNVHREEFzAVghNraW5kLWNsdXN0ZXItd29ya2VyMA0G
CSqGSIb3DQEBCwUAA4IBAQAnNioBu6agqKH/kDgjGfut865x8ufWw2wlmyunx5CS
njAdP/csErsSrVXlzlYhdNaXHvCYZcwXCjUpL8wNYHJqT5aRhuMr4w6ZYACWY50o
jyepzZFA8BNxA7FH5SnQbr+JZP1y+bXlF3JbfYPNAEHZBRSuayw3WdU9iSuGghnG
pQA0OjOjZ7MwYXF3NKPuS/rPi6NERjykT8VYW6G2kIJDgPf4EaJ5lEKM3ifxjW+n
vu7XpnjG+Ff48Gq47BBwxhE9p/YTFLzyGZnbArx+u6V2yui3Q3agi7f0oJT1fqkp
RfbxkFBrCCuiVbswcaf4eBFwyMNqyg9mhn8r4Wo4N2z8
-----END CERTIFICATE-----
root@kind-cluster-worker:/# openssl x509 -in /var/lib/kubelet/pki/kubelet.crt --text
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 8584424096722944336 (0x7722003bd6798150)
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN = kind-cluster-worker-ca@1707433365
Validity
Not Before: Feb 8 22:02:45 2024 GMT
Not After : Feb 7 22:02:45 2025 GMT #OK
Subject: CN = kind-cluster-worker@1707433365
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:9d:73:8c:1c:c5:e8:42:24:5e:3d:63:0a:9c:8e:
8d:37:45:48:ee:58:72:e9:37:b6:21:ad:b2:f9:06:
7e:a9:e0:df:30:cf:66:8b:ad:a4:c1:b1:dc:9c:29:
c5:b2:9b:49:f0:20:6a:bf:59:98:a7:33:49:68:20:
c3:88:ea:ed:07:d1:6f:ea:0b:3a:93:4c:40:44:5f:
93:77:0f:82:0b:63:28:ed:42:44:56:1e:12:e4:0d:
41:9d:84:c9:51:cb:66:b5:60:3d:8f:35:2a:6e:1d:
dc:c5:ab:9b:98:dd:80:9b:39:66:4e:4d:be:03:a1:
59:5f:e7:d1:fd:b7:4d:b3:d1:a1:fb:3a:eb:92:11:
76:8a:b7:ec:f1:2c:5b:a7:af:1f:28:c0:7a:1e:c8:
19:39:27:6d:d3:5e:03:cf:d2:79:c4:1d:fb:1e:1d:
11:dc:a4:b4:15:82:dc:27:84:d5:68:f1:89:ac:2c:
a9:2f:6e:86:7b:37:d9:5a:30:91:16:6e:ea:c0:1f:
ed:ee:f6:d2:35:5f:e0:14:db:79:df:75:5d:af:a0:
31:17:c2:19:11:5c:dd:07:67:f1:27:af:e8:7c:d0:
db:b2:2a:05:43:58:83:86:fe:30:40:40:ae:ea:30:
87:e8:d9:1b:cc:26:6b:3c:31:78:28:9a:cb:5c:68:
e4:37
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Server Authentication
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Authority Key Identifier:
F9:DC:99:E4:F9:7D:E2:47:B9:4B:B9:DE:90:DC:79:DB:C9:D0:74:6B
X509v3 Subject Alternative Name:
DNS:kind-cluster-worker
Signature Algorithm: sha256WithRSAEncryption
Signature Value:
27:36:2a:01:bb:a6:a0:a8:a1:ff:90:38:23:19:fb:ad:f3:ae:
71:f2:e7:d6:c3:6c:25:9b:2b:a7:c7:90:92:9e:30:1d:3f:f7:
2c:12:bb:12:ad:55:e5:ce:56:21:74:d6:97:1e:f0:98:65:cc:
17:0a:35:29:2f:cc:0d:60:72:6a:4f:96:91:86:e3:2b:e3:0e:
99:60:00:96:63:9d:28:8f:27:a9:cd:91:40:f0:13:71:03:b1:
47:e5:29:d0:6e:bf:89:64:fd:72:f9:b5:e5:17:72:5b:7d:83:
cd:00:41:d9:05:14:ae:6b:2c:37:59:d5:3d:89:2b:86:82:19:
c6:a5:00:34:3a:33:a3:67:b3:30:61:71:77:34:a3:ee:4b:fa:
cf:8b:a3:44:46:3c:a4:4f:c5:58:5b:a1:b6:90:82:43:80:f7:
f8:11:a2:79:94:42:8c:de:27:f1:8d:6f:a7:be:ee:d7:a6:78:
c6:f8:57:f8:f0:6a:b8:ec:10:70:c6:11:3d:a7:f6:13:14:bc:
f2:19:99:db:02:bc:7e:bb:a5:76:ca:e8:b7:43:76:a0:8b:b7:
f4:a0:94:f5:7e:a9:29:45:f6:f1:90:50:6b:08:2b:a2:55:bb:
30:71:a7:f8:78:11:70:c8:c3:6a:ca:0f:66:86:7f:2b:e1:6a:
38:37:6c:fc
-----BEGIN CERTIFICATE-----
MIIDTTCCAjWgAwIBAgIIdyIAO9Z5gVAwDQYJKoZIhvcNAQELBQAwLDEqMCgGA1UE
Awwha2luZC1jbHVzdGVyLXdvcmtlci1jYUAxNzA3NDMzMzY1MB4XDTI0MDIwODIy
MDI0NVoXDTI1MDIwNzIyMDI0NVowKTEnMCUGA1UEAwwea2luZC1jbHVzdGVyLXdv
cmtlckAxNzA3NDMzMzY1MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA
nXOMHMXoQiRePWMKnI6NN0VI7lhy6Te2Ia2y+QZ+qeDfMM9mi62kwbHcnCnFsptJ
8CBqv1mYpzNJaCDDiOrtB9Fv6gs6k0xARF+Tdw+CC2Mo7UJEVh4S5A1BnYTJUctm
tWA9jzUqbh3cxaubmN2AmzlmTk2+A6FZX+fR/bdNs9Gh+zrrkhF2irfs8Sxbp68f
KMB6HsgZOSdt014Dz9J5xB37Hh0R3KS0FYLcJ4TVaPGJrCypL26GezfZWjCRFm7q
wB/t7vbSNV/gFNt533Vdr6AxF8IZEVzdB2fxJ6/ofNDbsioFQ1iDhv4wQECu6jCH
6NkbzCZrPDF4KJrLXGjkNwIDAQABo3YwdDAOBgNVHQ8BAf8EBAMCBaAwEwYDVR0l
BAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAfBgNVHSMEGDAWgBT53Jnk+X3i
R7lLud6Q3HnbydB0azAeBgNVHREEFzAVghNraW5kLWNsdXN0ZXItd29ya2VyMA0G
CSqGSIb3DQEBCwUAA4IBAQAnNioBu6agqKH/kDgjGfut865x8ufWw2wlmyunx5CS
njAdP/csErsSrVXlzlYhdNaXHvCYZcwXCjUpL8wNYHJqT5aRhuMr4w6ZYACWY50o
jyepzZFA8BNxA7FH5SnQbr+JZP1y+bXlF3JbfYPNAEHZBRSuayw3WdU9iSuGghnG
pQA0OjOjZ7MwYXF3NKPuS/rPi6NERjykT8VYW6G2kIJDgPf4EaJ5lEKM3ifxjW+n
vu7XpnjG+Ff48Gq47BBwxhE9p/YTFLzyGZnbArx+u6V2yui3Q3agi7f0oJT1fqkp
RfbxkFBrCCuiVbswcaf4eBFwyMNqyg9mhn8r4Wo4N2z8
-----END CERTIFICATE-----

Also check the endpoint that the kubelet is pointing at for the kube-apiserver (the kubeconfig it was started with, /etc/kubernetes/kubelet.conf in this kind cluster, as seen in the systemctl output above).
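Rather than reading the Not After date by eye, openssl can test expiry directly with -checkend, and the API server endpoint can be read from the kubelet's kubeconfig. A sketch using the kubeadm/kind default paths (the cert_valid_for helper is an illustrative wrapper, not an openssl subcommand):

```shell
# cert_valid_for returns 0 if the certificate in $1 will still be valid
# $2 seconds from now (this is exactly openssl's -checkend semantics).
cert_valid_for() { openssl x509 -in "$1" -noout -checkend "$2" >/dev/null 2>&1; }

CERT=/var/lib/kubelet/pki/kubelet.crt
if [ -f "$CERT" ]; then
  openssl x509 -in "$CERT" -noout -enddate      # prints notAfter=...
  if cert_valid_for "$CERT" $((7 * 24 * 3600)); then
    echo "kubelet serving certificate is valid for at least 7 more days"
  else
    echo "kubelet serving certificate expires within 7 days (or already expired)"
  fi
fi

# The kube-apiserver endpoint the kubelet is configured to talk to:
grep 'server:' /etc/kubernetes/kubelet.conf 2>/dev/null || true
```

If the server: address is unreachable from the node (DNS change, load balancer down, firewall), the node will go NotReady even though the kubelet process itself is healthy.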