Log on to the server and check whether the containerized k8s component pods are healthy.

[root@k8s-master01 ~]# kubectl get pod -n kube-system
NAME                                   READY   STATUS        RESTARTS   AGE
coredns-5c98db65d4-28krg               1/1     Terminating   0          47h
coredns-5c98db65d4-7f526               1/1     Running       0          5m18s
coredns-5c98db65d4-dmxnm               1/1     Terminating   0          47h
coredns-5c98db65d4-rx4zk               1/1     Running       0          5m8s
etcd-k8s-master01                      1/1     Running       2          47h
kube-apiserver-k8s-master01            1/1     Running       2          47h
kube-controller-manager-k8s-master01   1/1     Running       2          47h
kube-flannel-ds-amd64-25zmd            1/1     Running       0          22h
kube-flannel-ds-amd64-4b74f            1/1     Running       1          22h
kube-flannel-ds-amd64-s5p55            1/1     Running       0          22h
kube-proxy-2j97j                       1/1     Running       2          47h
kube-proxy-7cvzq                       1/1     Running       0          23h
kube-proxy-cd2fz                       1/1     Running       0          23h
kube-scheduler-k8s-master01            1/1     Running       2          47h

Looking at the status, two of the coredns pods are Terminating, but another two coredns pods are Running. From this we can conclude that the k8s components themselves were deployed correctly.
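If you want to see why the two old coredns pods are stuck in Terminating, describing one of them usually tells you (the pod name below is taken from the output above):

[root@k8s-master01 ~]# kubectl describe pod coredns-5c98db65d4-28krg -n kube-system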

Next, let's look at the master and worker nodes.

The command below shows that the master node is Ready, while both worker nodes report a STATUS of NotReady. That narrows the problem down to somewhere between host startup and the startup of the k8s components.

[root@k8s-master01 ~]# kubectl get nodes
NAME           STATUS     ROLES    AGE   VERSION
k8s-master01   Ready      master   47h   v1.15.1
k8s-node01     NotReady   <none>   23h   v1.15.1
k8s-node02     NotReady   <none>   23h   v1.15.1
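For more detail on why a node is NotReady, describing it and reading the Conditions and Events sections is usually the quickest route:

[root@k8s-master01 ~]# kubectl describe node k8s-node01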

Let's pick one of the worker nodes and follow its kubelet logs with journalctl. On k8s-node01 the output looks like this:

[root@k8s-node01 ~]# journalctl -f -u kubelet
-- Logs begin at 二 2022-04-19 09:00:31 CST. --
4月 20 15:17:20 k8s-node01 kubelet[4049]: W0420 15:17:20.399757    4049 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
4月 20 15:17:23 k8s-node01 kubelet[4049]: I0420 15:17:23.787316    4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "run" (UniqueName: "kubernetes.io/host-path/55c5127c-bc19-41f7-9a2c-9d6d76a9000d-run") pod "kube-flannel-ds-amd64-25zmd" (UID: "55c5127c-bc19-41f7-9a2c-9d6d76a9000d")
4月 20 15:17:23 k8s-node01 kubelet[4049]: I0420 15:17:23.787360    4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "cni" (UniqueName: "kubernetes.io/host-path/55c5127c-bc19-41f7-9a2c-9d6d76a9000d-cni") pod "kube-flannel-ds-amd64-25zmd" (UID: "55c5127c-bc19-41f7-9a2c-9d6d76a9000d")
4月 20 15:17:23 k8s-node01 kubelet[4049]: I0420 15:17:23.787391    4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "flannel-cfg" (UniqueName: "kubernetes.io/configmap/55c5127c-bc19-41f7-9a2c-9d6d76a9000d-flannel-cfg") pod "kube-flannel-ds-amd64-25zmd" (UID: "55c5127c-bc19-41f7-9a2c-9d6d76a9000d")
4月 20 15:17:23 k8s-node01 kubelet[4049]: I0420 15:17:23.787413    4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "flannel-token-xwzkz" (UniqueName: "kubernetes.io/secret/55c5127c-bc19-41f7-9a2c-9d6d76a9000d-flannel-token-xwzkz") pod "kube-flannel-ds-amd64-25zmd" (UID: "55c5127c-bc19-41f7-9a2c-9d6d76a9000d")
4月 20 15:17:24 k8s-node01 kubelet[4049]: E0420 15:17:24.544451    4049 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
4月 20 15:17:35 k8s-node01 kubelet[4049]: I0420 15:17:35.744285    4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "config-volume" (UniqueName: "kubernetes.io/configmap/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-config-volume") pod "coredns-5c98db65d4-28krg" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e")
4月 20 15:17:35 k8s-node01 kubelet[4049]: I0420 15:17:35.744346    4049 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "coredns-token-j46q8" (UniqueName: "kubernetes.io/secret/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-coredns-token-j46q8") pod "coredns-5c98db65d4-28krg" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e")
4月 20 15:21:26 k8s-node01 systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
4月 20 15:21:26 k8s-node01 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
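As an aside, the "cni config uninitialized" warning above usually means the flannel CNI config has not been written to the node yet; a quick check is to list the CNI config directory (in this case the root cause turns out to be kubelet itself, not CNI):

[root@k8s-node01 ~]# ls /etc/cni/net.d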

From the last two lines of the journal output we can tell that systemd has stopped the kubelet service, i.e. the node agent is no longer running, so kubelet is the place to start. On each host, k8s comes up in the order systemd > kubelet > container components > kubernetes, and we saw above that the container components are all in the Running state, so the problem most likely lies between systemd and kubelet. Since kubelet is an ordinary system service, let's check its status.
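As a quick sanity check one layer down the chain, you can first confirm the container runtime itself is alive (this cluster uses Docker; adjust the unit name if yours differs):

[root@k8s-node01 ~]# systemctl is-active docker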

[root@k8s-node01 ~]# systemctl status  kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; disabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: inactive (dead)
     Docs: https://kubernetes.io/docs/

The output shows that kubelet is not running: Active: inactive (dead).

So let's bring it up with the service restart command.

[root@k8s-node01 ~]# systemctl restart kubelet
[root@k8s-node01 ~]# systemctl status  kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; disabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 四 2022-04-21 14:15:52 CST; 2s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 1620 (kubelet)
    Tasks: 15
   Memory: 149.4M
   CGroup: /system.slice/kubelet.service
           └─1620 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kube...

4月 21 14:15:54 k8s-node01 kubelet[1620]: E0421 14:15:54.108699    1620 remote_runtime.go:295] ContainerStatus "0...bd418
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702549    1620 reconciler.go:177] operationExecutor.Unmo...b5e")
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702603    1620 reconciler.go:177] operationExecutor.UnmountVo...
4月 21 14:15:54 k8s-node01 kubelet[1620]: W0421 14:15:54.702826    1620 empty_dir.go:421] Warning: Failed to clea...abled
4月 21 14:15:54 k8s-node01 kubelet[1620]: W0421 14:15:54.702893    1620 empty_dir.go:421] Warning: Failed to clea...abled
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702944    1620 operation_generator.go:860] UnmountVolume.Tear...
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702969    1620 operation_generator.go:860] UnmountVolume.Tear...
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.703035    1620 reconciler.go:297] Volume detached for vo...th ""
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.703043    1620 reconciler.go:297] Volume detached for vo...th ""
4月 21 14:15:55 k8s-node01 kubelet[1620]: W0421 14:15:55.003981    1620 kuberuntime_container.go:691] No ref for ...0e4"}
Hint: Some lines were ellipsized, use -l to show in full.

After the restart, kubelet is back in the running state. Let's look at the node's logs again.

[root@k8s-node01 ~]# journalctl -f -u kubelet
-- Logs begin at 二 2022-04-19 09:00:31 CST. --
4月 21 14:15:54 k8s-node01 kubelet[1620]: E0421 14:15:54.108699    1620 remote_runtime.go:295] ContainerStatus "0e2e13612e055551f0c33bea49979d5ee2f198e99ca8005618853250397bd418" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: 0e2e13612e055551f0c33bea49979d5ee2f198e99ca8005618853250397bd418
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702549    1620 reconciler.go:177] operationExecutor.UnmountVolume started for volume "config-volume" (UniqueName: "kubernetes.io/configmap/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-config-volume") pod "52817037-d7b1-4d7f-8e21-7b6a8743fb5e" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e")
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702603    1620 reconciler.go:177] operationExecutor.UnmountVolume started for volume "coredns-token-j46q8" (UniqueName: "kubernetes.io/secret/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-coredns-token-j46q8") pod "52817037-d7b1-4d7f-8e21-7b6a8743fb5e" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e")
4月 21 14:15:54 k8s-node01 kubelet[1620]: W0421 14:15:54.702826    1620 empty_dir.go:421] Warning: Failed to clear quota on /var/lib/kubelet/pods/52817037-d7b1-4d7f-8e21-7b6a8743fb5e/volumes/kubernetes.io~configmap/config-volume: ClearQuota called, but quotas disabled
4月 21 14:15:54 k8s-node01 kubelet[1620]: W0421 14:15:54.702893    1620 empty_dir.go:421] Warning: Failed to clear quota on /var/lib/kubelet/pods/52817037-d7b1-4d7f-8e21-7b6a8743fb5e/volumes/kubernetes.io~secret/coredns-token-j46q8: ClearQuota called, but quotas disabled
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702944    1620 operation_generator.go:860] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-coredns-token-j46q8" (OuterVolumeSpecName: "coredns-token-j46q8") pod "52817037-d7b1-4d7f-8e21-7b6a8743fb5e" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e"). InnerVolumeSpecName "coredns-token-j46q8". PluginName "kubernetes.io/secret", VolumeGidValue ""
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.702969    1620 operation_generator.go:860] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-config-volume" (OuterVolumeSpecName: "config-volume") pod "52817037-d7b1-4d7f-8e21-7b6a8743fb5e" (UID: "52817037-d7b1-4d7f-8e21-7b6a8743fb5e"). InnerVolumeSpecName "config-volume". PluginName "kubernetes.io/configmap", VolumeGidValue ""
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.703035    1620 reconciler.go:297] Volume detached for volume "config-volume" (UniqueName: "kubernetes.io/configmap/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-config-volume") on node "k8s-node01" DevicePath ""
4月 21 14:15:54 k8s-node01 kubelet[1620]: I0421 14:15:54.703043    1620 reconciler.go:297] Volume detached for volume "coredns-token-j46q8" (UniqueName: "kubernetes.io/secret/52817037-d7b1-4d7f-8e21-7b6a8743fb5e-coredns-token-j46q8") on node "k8s-node01" DevicePath ""
4月 21 14:15:55 k8s-node01 kubelet[1620]: W0421 14:15:55.003981    1620 kuberuntime_container.go:691] No ref for container {"docker" "92e81d86339b20f5558d2ebe341e46df56fdb411b31cef164bba2b6591a790e4"}

The logs show that kubelet on node01 is running and reconciling normally.

Next, back on the master node, let's check the k8s cluster state again.

[root@k8s-master01 ~]# kubectl  get nodes
NAME           STATUS     ROLES    AGE   VERSION
k8s-master01   Ready      master   47h   v1.15.1
k8s-node01     Ready      <none>   23h   v1.15.1
k8s-node02     NotReady   <none>   23h   v1.15.1
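You can also cross-check that the pods scheduled to node01 are coming back by listing the kube-system pods together with the nodes they run on:

[root@k8s-master01 ~]# kubectl get pod -n kube-system -o wide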

As we can see, restarting kubelet has brought node01 into the Ready state. All that is left is to repeat the same operation on node02.
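That is, run the same restart on k8s-node02 and then check its status:

[root@k8s-node02 ~]# systemctl restart kubelet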

[root@k8s-node02 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; disabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 四 2022-04-21 14:19:08 CST; 2s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 1637 (kubelet)
    Tasks: 16
   Memory: 147.1M
   CGroup: /system.slice/kubelet.service
           └─1637 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kube...

4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686045    1637 reconciler.go:203] operationExecutor.VerifyCon...
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686058    1637 reconciler.go:203] operationExecutor.VerifyCon...
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686070    1637 reconciler.go:203] operationExecutor.Veri...5bb")
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686082    1637 reconciler.go:203] operationExecutor.VerifyCon...
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686093    1637 reconciler.go:203] operationExecutor.Veri...b10")
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686104    1637 reconciler.go:203] operationExecutor.Veri...5bb")
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686210    1637 reconciler.go:203] operationExecutor.Veri...5bb")
4月 21 14:19:09 k8s-node02 kubelet[1637]: I0421 14:19:09.686236    1637 reconciler.go:150] Reconciler: start to s...state
4月 21 14:19:10 k8s-node02 kubelet[1637]: W0421 14:19:10.218389    1637 kuberuntime_container.go:691] No ref for ...28b"}
4月 21 14:19:11 k8s-node02 kubelet[1637]: W0421 14:19:11.336287    1637 pod_container_deletor.go:75] Container "1...iners
Hint: Some lines were ellipsized, use -l to show in full.

Finally, check the nodes from the master one more time.

[root@k8s-master01 ~]# kubectl  get nodes
NAME           STATUS   ROLES    AGE   VERSION
k8s-master01   Ready    master   47h   v1.15.1
k8s-node01     Ready    <none>   23h   v1.15.1
k8s-node02     Ready    <none>   23h   v1.15.1

Everything is up, and our kubernetes cluster can work normally again.

The takeaway from this incident: when installing kubernetes by hand it is easy to forget to enable the kubelet service, so it is best to set the service to start on boot at deployment time.

[root@k8s-node01 ~]# systemctl enable  kubelet
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.
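The same should be done on every node (k8s-node02 here, and the master if it is not already enabled):

[root@k8s-node02 ~]# systemctl enable kubelet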

Finally, it pays to understand the k8s startup sequence; with a little extra care during deployment, this kind of problem can be avoided entirely.
