Kubernetes (K8s): after a server restart, the nodes report NotReady and stop working
Log in to the server and check whether the container-deployed K8s component pods are healthy:
[root@k8s-master01 ~]# kubectl get pod -n kube-system
NAME                                   READY   STATUS        RESTARTS   AGE
coredns-5c98db65d4-28krg               1/1     Terminating   0          47h
coredns-5c98db65d4-7f526               1/1     Running       0          5m18s
coredns-5c98db65d4-dmxnm               1/1     Terminating   0          47h
coredns-5c98db65d4-rx4zk               1/1     Running       0          5m8s
etcd-k8s-master01                      1/1     Running       2          47h
kube-apiserver-k8s-master01            1/1     Running       2          47h
kube-controller-manager-k8s-master01   1/1     Running       2          47h
kube-flannel-ds-amd64-25zmd            1/1     Running       0          22h
kube-flannel-ds-amd64-4b74f            1/1     Running       1          22h
kube-flannel-ds-amd64-s5p55            1/1     Running       0          22h
kube-proxy-2j97j                       1/1     Running       2          47h
kube-proxy-7cvzq                       1/1     Running       0          23h
kube-proxy-cd2fz                       1/1     Running       0          23h
kube-scheduler-k8s-master01            1/1     Running       2          47h
Two of the coredns pods are stuck Terminating, but their two replacements are Running, so the K8s components themselves were set up correctly.
Next, look at the master and worker nodes. The command below shows the master as Ready, while both worker nodes report NotReady. That narrows the problem to somewhere between host boot and K8s component startup:
[root@k8s-master01 ~]# kubectl get nodes
NAME           STATUS     ROLES    AGE   VERSION
k8s-master01   Ready      master   47h   v1.15.1
k8s-node01     NotReady   <none>   23h   v1.15.1
k8s-node02     NotReady   <none>   23h   v1.15.1
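As a side note, the NotReady nodes can be picked out of this listing mechanically. A minimal sketch, using a hard-coded sample that mirrors the output above (in practice you would pipe the live `kubectl get nodes` output into the same awk filter):

```shell
# Sample text copied from the listing above; on a real cluster:
#   kubectl get nodes | awk 'NR>1 && $2=="NotReady" {print $1}'
sample='NAME           STATUS     ROLES    AGE   VERSION
k8s-master01   Ready      master   47h   v1.15.1
k8s-node01     NotReady   <none>   23h   v1.15.1
k8s-node02     NotReady   <none>   23h   v1.15.1'

# Skip the header row, then print the NAME of every row whose
# STATUS column is NotReady.
printf '%s\n' "$sample" | awk 'NR>1 && $2=="NotReady" {print $1}'
```

This prints one node name per line, which is convenient for feeding into a remediation loop later.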
Pick one node first and inspect its kubelet startup log. The node returns the following:
[root@k8s-node01 ~]# journalctl -f -u kubelet
-- Logs begin at 二 2022-04-19 09:00:31 CST. --
4月 20 15:17:20 k8s-node01 kubelet[4049]: W0420 15:17:20.399757 4049 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
4月 20 15:17:23 k8s-node01 kubelet[4049]: I0420 15:17:23.787316 4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "run" (UniqueName: "kubernetes.io/host-path/55c5127c-bc19-41f7-9a2c-9d6d76a9000d-run") pod "kube-flannel-ds-amd64-25zmd" (UID: "55c5127c-bc19-41f7-9a2c-9d6d76a9000d")
4月 20 15:17:23 k8s-node01 kubelet[4049]: I0420 15:17:23.787360 4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "cni" (UniqueName: "kubernetes.io/host-path/55c5127c-bc19-41f7-9a2c-9d6d76a9000d-cni") pod "kube-flannel-ds-amd64-25zmd" (UID: "55c5127c-bc19-41f7-9a2c-9d6d76a9000d")
4月 20 15:17:23 k8s-node01 kubelet[4049]: I0420 15:17:23.787391 4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "flannel-cfg" (UniqueName: "kubernetes.io/configmap/55c5127c-bc19-41f7-9a2c-9d6d76a9000d-flannel-cfg") pod "kube-flannel-ds-amd64-25zmd" (UID: "55c5127c-bc19-41f7-9a2c-9d6d76a9000d")
4月 20 15:17:23 k8s-node01 kubelet[4049]: I0420 15:17:23.787413 4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "flannel-token-xwzkz" (UniqueName: "kubernetes.io/secret/55c5127c-bc19-41f7-9a2c-9d6d76a9000d-flannel-token-xwzkz") pod "kube-flannel-ds-amd64-25zmd" (UID: "55c5127c-bc19-41f7-9a2c-9d6d76a9000d")
4月 20 15:17:24 k8s-node01 kubelet[4049]: E0420 15:17:24.544451 4049 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
4月 20 15:17:35 k8s-node01 kubelet[4049]: I0420 15:17:35.744285 4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "config-volume" (UniqueName: "kubernetes.io/configmap/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-config-volume") pod "coredns-5c98db65d4-28krg" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e")
4月 20 15:17:35 k8s-node01 kubelet[4049]: I0420 15:17:35.744346 4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "coredns-token-j46q8" (UniqueName: "kubernetes.io/secret/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-coredns-token-j46q8") pod "coredns-5c98db65d4-28krg" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e")
4月 20 15:21:26 k8s-node01 systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
4月 20 15:21:26 k8s-node01 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
The last two log lines show that systemd stopped the kubelet, i.e. the node's agent itself is down, so kubelet is where to dig. The startup chain on each host is systemd > kubelet > container components > Kubernetes, and we saw above that the container components are all Running, so the fault sits in the systemd-to-kubelet step. Since kubelet is an ordinary system service, check its status:
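The triage rule being applied here — if kubelet's systemd state is anything but active, restart it — can be sketched as a tiny helper (the function name is hypothetical; on a real node you would feed it `$(systemctl is-active kubelet)`):

```shell
# Map a `systemctl is-active kubelet` result to the next action.
decide() {
  case "$1" in
    active)          echo "kubelet ok, look elsewhere" ;;
    inactive|failed) echo "restart kubelet" ;;
    *)               echo "unknown state: $1" ;;
  esac
}

decide inactive
```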
[root@k8s-node01 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; disabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: inactive (dead)
Docs: https://kubernetes.io/docs/
The output shows that kubelet is not running at all (inactive/dead), so restart the service:
[root@k8s-node01 ~]# systemctl restart kubelet
[root@k8s-node01 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; disabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 四 2022-04-21 14:15:52 CST; 2s ago
Docs: https://kubernetes.io/docs/
Main PID: 1620 (kubelet)
Tasks: 15
Memory: 149.4M
CGroup: /system.slice/kubelet.service
└─1620 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kube...
4月 21 14:15:54 k8s-node01 kubelet[1620]: E0421 14:15:54.108699 1620 remote_runtime.go:295] ContainerStatus "0...bd418
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702549 1620 reconciler.go:177] operationExecutor.Unmo...b5e")
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702603 1620 reconciler.go:177] operationExecutor.UnmountVo...
4月 21 14:15:54 k8s-node01 kubelet[1620]: W0421 14:15:54.702826 1620 empty_dir.go:421] Warning: Failed to clea...abled
4月 21 14:15:54 k8s-node01 kubelet[1620]: W0421 14:15:54.702893 1620 empty_dir.go:421] Warning: Failed to clea...abled
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702944 1620 operation_generator.go:860] UnmountVolume.Tear...
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702969 1620 operation_generator.go:860] UnmountVolume.Tear...
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.703035 1620 reconciler.go:297] Volume detached for vo...th ""
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.703043 1620 reconciler.go:297] Volume detached for vo...th ""
4月 21 14:15:55 k8s-node01 kubelet[1620]: W0421 14:15:55.003981 1620 kuberuntime_container.go:691] No ref for ...0e4"}
Hint: Some lines were ellipsized, use -l to show in full.
After the restart, kubelet reports active (running). Check the node's kubelet log again:
[root@k8s-node01 ~]# journalctl -f -u kubelet
-- Logs begin at 二 2022-04-19 09:00:31 CST. --
4月 21 14:15:54 k8s-node01 kubelet[1620]: E0421 14:15:54.108699 1620 remote_runtime.go:295] ContainerStatus "0e2e13612e055551f0c33bea49979d5ee2f198e99ca8005618853250397bd418" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: 0e2e13612e055551f0c33bea49979d5ee2f198e99ca8005618853250397bd418
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702549 1620 reconciler.go:177] operationExecutor.UnmountVolume started for volume "config-volume" (UniqueName: "kubernetes.io/configmap/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-config-volume") pod "52817037-d7b1-4d7f-8e21-7b6a8743fb5e" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e")
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702603 1620 reconciler.go:177] operationExecutor.UnmountVolume started for volume "coredns-token-j46q8" (UniqueName: "kubernetes.io/secret/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-coredns-token-j46q8") pod "52817037-d7b1-4d7f-8e21-7b6a8743fb5e" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e")
4月 21 14:15:54 k8s-node01 kubelet[1620]: W0421 14:15:54.702826 1620 empty_dir.go:421] Warning: Failed to clear quota on /var/lib/kubelet/pods/52817037-d7b1-4d7f-8e21-7b6a8743fb5e/volumes/kubernetes.io~configmap/config-volume: ClearQuota called, but quotas disabled
4月 21 14:15:54 k8s-node01 kubelet[1620]: W0421 14:15:54.702893 1620 empty_dir.go:421] Warning: Failed to clear quota on /var/lib/kubelet/pods/52817037-d7b1-4d7f-8e21-7b6a8743fb5e/volumes/kubernetes.io~secret/coredns-token-j46q8: ClearQuota called, but quotas disabled
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702944 1620 operation_generator.go:860] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-coredns-token-j46q8" (OuterVolumeSpecName: "coredns-token-j46q8") pod "52817037-d7b1-4d7f-8e21-7b6a8743fb5e" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e"). InnerVolumeSpecName "coredns-token-j46q8". PluginName "kubernetes.io/secret", VolumeGidValue ""
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702969 1620 operation_generator.go:860] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-config-volume" (OuterVolumeSpecName: "config-volume") pod "52817037-d7b1-4d7f-8e21-7b6a8743fb5e" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e"). InnerVolumeSpecName "config-volume". PluginName "kubernetes.io/configmap", VolumeGidValue ""
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.703035 1620 reconciler.go:297] Volume detached for volume "config-volume" (UniqueName: "kubernetes.io/configmap/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-config-volume") on node "k8s-node01" DevicePath ""
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.703043 1620 reconciler.go:297] Volume detached for volume "coredns-token-j46q8" (UniqueName: "kubernetes.io/secret/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-coredns-token-j46q8") on node "k8s-node01" DevicePath ""
4月 21 14:15:55 k8s-node01 kubelet[1620]: W0421 14:15:55.003981 1620 kuberuntime_container.go:691] No ref for container {"docker" "92e81d86339b20f5558d2ebe341e46df56fdb411b31cef164bba2b6591a790e4"}
The log shows node01 back in a normal working state.
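A quick way to triage such a log is to use klog's severity prefixes: kubelet lines start with `E` (error), `W` (warning), or `I` (info), so counting `E` lines in a captured snippet gives a rough health signal. A small sketch with sample lines abridged from the log above (on a real node you might run `journalctl -u kubelet --since "5 min ago" -o cat | grep -c '^E'`):

```shell
# Three abridged klog lines: one error, one info, one warning.
log='E0421 14:15:54.108699 1620 remote_runtime.go:295] ContainerStatus failed
I0421 14:15:54.702549 1620 reconciler.go:177] UnmountVolume started
W0421 14:15:55.003981 1620 kuberuntime_container.go:691] No ref for container'

# Count lines whose severity prefix is E (error).
printf '%s\n' "$log" | grep -c '^E'
```

Here the one remaining error is a stale `No such container` lookup from before the restart, which is harmless.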
Back on the master, check the K8s cluster state again:
[root@k8s-master01 ~]# kubectl get nodes
NAME           STATUS     ROLES    AGE   VERSION
k8s-master01   Ready      master   47h   v1.15.1
k8s-node01     Ready      <none>   23h   v1.15.1
k8s-node02     NotReady   <none>   23h   v1.15.1
node01 has come back to Ready after its kubelet restart; all that remains is to apply the same fix to node02.
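Since the fix is identical on every NotReady node, the per-node commands can be generated in one pass. A hypothetical sketch that only prints the remediation commands rather than executing them over ssh (the hard-coded sample mirrors the listing above):

```shell
sample='NAME           STATUS     ROLES    AGE   VERSION
k8s-master01   Ready      master   47h   v1.15.1
k8s-node01     Ready      <none>   23h   v1.15.1
k8s-node02     NotReady   <none>   23h   v1.15.1'

# For each NotReady node, emit the command you would run there.
printf '%s\n' "$sample" | awk 'NR>1 && $2=="NotReady" {
  printf "ssh %s systemctl restart kubelet\n", $1
}'
```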
[root@k8s-node02 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; disabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 四 2022-04-21 14:19:08 CST; 2s ago
Docs: https://kubernetes.io/docs/
Main PID: 1637 (kubelet)
Tasks: 16
Memory: 147.1M
CGroup: /system.slice/kubelet.service
└─1637 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kube...
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686045 1637 reconciler.go:203] operationExecutor.VerifyCon...
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686058 1637 reconciler.go:203] operationExecutor.VerifyCon...
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686070 1637 reconciler.go:203] operationExecutor.Veri...5bb")
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686082 1637 reconciler.go:203] operationExecutor.VerifyCon...
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686093 1637 reconciler.go:203] operationExecutor.Veri...b10")
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686104 1637 reconciler.go:203] operationExecutor.Veri...5bb")
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686210 1637 reconciler.go:203] operationExecutor.Veri...5bb")
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686236 1637 reconciler.go:150] Reconciler: start to s...state
4月 21 14:19:10 k8s-node02 kubelet[1637]: W0421 14:19:10.218389 1637 kuberuntime_container.go:691] No ref for ...28b"}
4月 21 14:19:11 k8s-node02 kubelet[1637]: W0421 14:19:11.336287 1637 pod_container_deletor.go:75] Container "1...iners
Hint: Some lines were ellipsized, use -l to show in full.
Finally, check the nodes from the master once more:
[root@k8s-master01 ~]# kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
k8s-master01   Ready    master   47h   v1.15.1
k8s-node01     Ready    <none>   23h   v1.15.1
k8s-node02     Ready    <none>   23h   v1.15.1
Both nodes are up, and our Kubernetes cluster is working normally again.
The likely root cause of this incident: when installing Kubernetes as containers, it is easy to overlook the kubelet service, which then does not start with the host. The remedy is to enable the service at boot time during deployment:
[root@k8s-node01 ~]# systemctl enable kubelet
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.
Finally, keep the K8s startup chain in mind; a little extra attention at deployment time avoids this problem entirely.