Troubleshooting


1. systemctl start docker ----------------> failed

2. systemctl status docker to check the logs

Start dockerd directly with Docker's original start command:

/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

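It failed, and at this point the cause was still unclear.

The cause: insufficient disk space.

The likely explanation: a log file of a long-running Docker container had grown too large.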
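Before hunting for the file, it helps to confirm which filesystem is actually full (`df -h`, used in the reference section below, shows the same figures). A minimal sketch, assuming Linux `df -P` output; `full_filesystems` and its 90% default threshold are illustrative choices, not part of the original procedure.

```shell
# full_filesystems [THRESHOLD]
# Hypothetical helper: list mounted filesystems whose usage is at or above
# THRESHOLD percent (default 90), printed as "USE% MOUNTPOINT".
# Mount points containing spaces are truncated at the first space.
full_filesystems() {
    local threshold
    threshold=${1:-90}
    # -P prints one line per filesystem in a stable layout:
    # Filesystem 1024-blocks Used Available Capacity Mounted-on
    df -P | awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)                 # "100%" -> "100"
        if (use != "-" && use + 0 >= t + 0) print $5, $6
    }'
}
```

On the server in this article, running `full_filesystems` would be expected to flag the root filesystem.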

3. Find the files/directories that are using the space

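du -sh /* | sort -hr to see which directory uses the most space (because -h prints sizes such as 57G and 800M, sort them with sort -h; sort -n compares only the leading numbers and would rank 800M above 57G)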

The output showed /var using 57G; cd into it and narrow down one level at a time.
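The cd-and-repeat drill-down can be automated. This is a sketch assuming GNU `find` and `du` (for `-mindepth`/`-maxdepth` and `-x`); `drill_down` is a hypothetical helper that starts at a directory, repeatedly steps into its largest subdirectory, and prints each step. It only follows directories, so the large file itself still has to be spotted in the last directory printed, for example with `ls -lS`.

```shell
# drill_down DIR
# Hypothetical helper: print DIR, then its largest subdirectory, then that
# directory's largest subdirectory, and so on until one has no subdirectories.
# du -sk reports sizes in 1 KiB blocks, so a plain numeric sort is correct.
drill_down() {
    local dir next
    dir=${1:?usage: drill_down DIR}
    while :; do
        printf '%s\n' "$dir"
        # Immediate subdirectories only, staying on the same filesystem;
        # keep the path (field 2 of du's tab-separated output) of the biggest.
        next=$(find "$dir" -xdev -mindepth 1 -maxdepth 1 -type d \
                   -exec du -skx {} + 2>/dev/null \
               | sort -nr | head -n 1 | cut -f 2-)
        [ -n "$next" ] || break
        dir=$next
    done
}
```

For the case above, `drill_down /var` would be expected to end somewhere under /var/lib/docker.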

Drilling down eventually led to the file 0a21b2602257c2844385da2e55123b2198acae92793d6ba362dea8620f1f2f25-json.log

This single file had grown to 57G.
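Oversized container logs can also be found directly instead of by drilling down. This sketch assumes the default data root `/var/lib/docker` and the default `json-file` log driver, which stores each container's log as `containers/<container-id>/<container-id>-json.log`; `big_container_logs` and its 1G default are illustrative. The directory name is the full container ID, which `docker ps -a --no-trunc` also shows, so the owning container can be identified without opening the file.

```shell
# big_container_logs [ROOT] [MIN_SIZE]
# Hypothetical helper: print every json-file container log under ROOT
# (default /var/lib/docker/containers) larger than MIN_SIZE
# (default 1G; accepts find's size suffixes k, M, G).
big_container_logs() {
    local root min
    root=${1:-/var/lib/docker/containers}
    min=${2:-1G}
    # find rounds each file size up to whole units of the suffix before
    # comparing, so "+1G" matches files larger than 1 GiB.
    find "$root" -type f -name '*-json.log' -size "+$min" -print 2>/dev/null
}
```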

Viewing it with more 0a21b2602257c2844385da2e55123b2198acae92793d6ba362dea8620f1f2f25-json.log showed that it was smartping's log file, so delete it:

rm -f 0a21b2602257c2844385da2e55123b2198acae92793d6ba362dea8620f1f2f25-json.log
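Deleting the log with `rm` worked here because dockerd was not running. If the container is still running, `rm` only removes the directory entry: the process keeps the file open and the space is not freed (the `lsof | grep deleted` check in the reference section below covers exactly that case). Emptying the file in place avoids the problem. A minimal sketch; `empty_file` is a hypothetical name, and it uses the shell's `: >` truncation, which does the same as coreutils `truncate -s 0`.

```shell
# empty_file FILE...
# Hypothetical helper: truncate each FILE to zero bytes in place. Unlike rm,
# the inode is kept, so a process holding the file open keeps its handle
# and the space used by the old contents is released.
empty_file() {
    local f
    for f in "$@"; do
        [ -f "$f" ] || { printf 'not a regular file: %s\n' "$f" >&2; return 1; }
        : > "$f"
    done
}
```

Usage: `empty_file /var/lib/docker/containers/<container-id>/<container-id>-json.log`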

4. Restart

systemctl start docker still failed.

Checked systemctl status docker, then started dockerd with the original command again; it reported:

unable to configure the Docker daemon with file

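The error pointed to /etc/docker/daemon.json. Inspecting it showed that the file was empty. Back up this file, replace it with a working daemon.json uploaded from the local machine, and start Docker again.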
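Because the failure above came from a broken /etc/docker/daemon.json, checking the file before restarting can save a round trip. This sketch assumes `python3` is installed and uses its standard `json.tool` module as the JSON validator; `check_daemon_json` is a hypothetical helper.

```shell
# check_daemon_json [FILE]
# Hypothetical helper: report whether FILE (default /etc/docker/daemon.json)
# is missing/empty or fails to parse as JSON. Returns 0 only if it parses.
# Assumes python3; "python3 -m json.tool FILE" exits non-zero on invalid JSON.
check_daemon_json() {
    local f
    f=${1:-/etc/docker/daemon.json}
    if [ ! -s "$f" ]; then
        printf '%s: missing or empty\n' "$f" >&2
        return 1
    fi
    if ! python3 -m json.tool "$f" > /dev/null 2>&1; then
        printf '%s: invalid JSON\n' "$f" >&2
        return 1
    fi
    printf '%s: OK\n' "$f"
}
```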

Success!
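To keep a container's json log from growing to 57G again, Docker's `json-file` driver can rotate logs through `log-opts` in daemon.json. `max-size` and `max-file` are real `json-file` options, but the values here (100m, 3 files) are arbitrary examples, and `write_log_rotation_config` is a hypothetical helper. It refuses to overwrite a non-empty file so that other settings are not clobbered; in that case merge the keys by hand. These defaults apply only to containers created after dockerd restarts; existing containers keep their old log settings until they are recreated.

```shell
# write_log_rotation_config [FILE]
# Hypothetical helper: write a daemon.json that caps each container's
# json-file log at 3 files of 100 MB. Refuses to touch a non-empty FILE.
write_log_rotation_config() {
    local f
    f=${1:-/etc/docker/daemon.json}
    if [ -s "$f" ]; then
        printf '%s is not empty; merge log-opts into it by hand\n' "$f" >&2
        return 1
    fi
    # log-opts values must be strings in daemon.json, hence "3" not 3.
    cat > "$f" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF
}
```

After writing the file, restart the daemon with `systemctl restart docker`.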

Extras:

Docker debug command:

  1. sudo dockerd --debug

sudo: unable to resolve host wuhanDX2_smartping: Resource temporarily unavailable
INFO[2022-06-16T03:43:15.356296672Z] Starting up
failed to start daemon: pid file found, ensure docker is not running or delete /var/run/docker.pid
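The `pid file found` error means dockerd found /var/run/docker.pid left over from an earlier run (or another dockerd really is running). A minimal sketch of the check the message asks for: remove the pid file only if no process with that PID exists. `clear_stale_pidfile` is a hypothetical helper; it tests for `/proc/<pid>` instead of using `kill -0`, because `kill -0` also fails for processes you lack permission to signal. It does not verify that a live PID actually belongs to dockerd.

```shell
# clear_stale_pidfile [PIDFILE]
# Hypothetical helper: delete PIDFILE (default /var/run/docker.pid) only when
# the PID it contains is not a running process. Linux-only (uses /proc).
clear_stale_pidfile() {
    local pidfile pid
    pidfile=${1:-/var/run/docker.pid}
    [ -e "$pidfile" ] || return 0              # nothing to clean up
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) ;;                        # empty or garbage: treat as stale
        *)
            if [ -d "/proc/$pid" ]; then
                printf 'PID %s is still running; not removing %s\n' \
                    "$pid" "$pidfile" >&2
                return 1
            fi
            ;;
    esac
    rm -f "$pidfile"
}
```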

  2. Command to periodically delete old backups:

find <full path> -type f -mtime +5 | xargs rm -f
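A safer variant of the cleanup command: `xargs` splits its input on blanks and newlines, so a file name containing a space is broken into separate arguments. Letting `find` run `rm` itself avoids that. The helper name `prune_old_files` is hypothetical.

```shell
# prune_old_files DIR [DAYS]
# Hypothetical helper: delete regular files under DIR whose age, counted in
# whole 24-hour periods (fractions discarded), is greater than DAYS (default 5).
# Same selection as "find DIR -type f -mtime +5", but find calls rm itself,
# so names with spaces or newlines are passed through intact.
prune_old_files() {
    local dir days
    dir=${1:?usage: prune_old_files DIR [DAYS]}
    days=${2:-5}
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}
```

For the "periodic" part, a crontab entry such as `0 3 * * * find /data/backup -type f -mtime +5 -exec rm -f {} +` runs the same cleanup daily at 03:00 (the /data/backup path is hypothetical).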

Reference:

https://blog.csdn.net/a854517900/article/details/80824966
  • Use df -h to check disk usage:
    Filesystem      Size  Used Avail Use% Mounted on
    udev            3.9G     0  3.9G   0% /dev
    tmpfs           799M  3.1M  796M   1% /run
    /dev/vda1        99G   99G   0G  100% /
    tmpfs           3.9G     0  3.9G   0% /dev/shm
    tmpfs           5.0M  4.0K  5.0M   1% /run/lock
    tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
    tmpfs           799M     0  799M   0% /run/user/0
    

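  • Use du -s /* | sort -nr to see which directory uses the most space (without -h, du -s reports sizes in 1K blocks by default, so a plain numeric sort is correct):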
    9999500 /root
    2711464 /usr
    794104  /var
    633800  /lib
    263164  /home
    96780   /boot
    75988   /tmp
    12728   /bin
    7308    /sbin
    4868    /etc
    3132    /run
    16      /lost+found
    12      /media
    4       /srv
    4       /opt
    4       /mnt
    4       /lib64
    0       /vmlinuz.old
    0       /vmlinuz
    0       /sys
    0       /proc
    0       /initrd.img.old
    0       /initrd.img
    0       /dev
    

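    Then, for whichever directory is largest, run du -s /root/* | sort -nr (and so on), drilling down level by level until you find where the space is going. The problem I found today was Tomcat log files that had not been deleted in two years, 40G in total; I located the log directory and deleted it.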

    Use du -h --max-depth=1 to see the size of each directory under the current one:

    1.2M    ./work
    203M    ./webapps
    16K     ./temp
    7.4M    ./lib
    804K    ./bin
    236K    ./conf
    11M     ./logs
    224M    .
    

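    If the steps above do not reveal the problem, run lsof | grep deleted to check whether deleted files are still held open by a process, in which case their space has not actually been freed.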

    systemd-j   198             root  txt       REG              253,1     326224    1185339 /lib/systemd/systemd-journald (deleted)
    systemd-l   399             root  txt       REG              253,1     618520    1185354 /lib/systemd/systemd-logind (deleted)
    agetty      781             root  txt       REG              253,1      44104     664044 /sbin/agetty (deleted)
    agetty      786             root  txt       REG              253,1      44104     664044 /sbin/agetty (deleted)
    mysqld    13409            mysql    4u      REG              253,1          0     918684 /tmp/ib0wMnKJ (deleted)
    mysqld    13409            mysql    5u      REG              253,1        100     918685 /tmp/ibQoVqHN (deleted)
    mysqld    13409            mysql    6u      REG              253,1          0     918686 /tmp/ib0IhuER (deleted)
    mysqld    13409            mysql    7u      REG              253,1          0     918687 /tmp/ibYj6KAZ (deleted)
    mysqld    13409            mysql   16u      REG              253,1          0     918688 /tmp/ibJb6HC3 (deleted)
    mysqld    13409 12709      mysql    4u      REG              253,1          0     918684 /tmp/ib0wMnKJ (deleted)
    mysqld    13409 12709      mysql    5u      REG              253,1        100     918685 /tmp/ibQoVqHN (deleted)
    mysqld    13409 12709      mysql    6u      REG              253,1          0     918686 /tmp/ib0IhuER (deleted)
    mysqld    13409 12709      mysql    7u      REG              253,1          0     918687 /tmp/ibYj6KAZ (deleted)
    mysqld    13409 12709      mysql   16u      REG              253,1          0     918688 /tmp/ibJb6HC3 (deleted)
    mysqld    13409 13410      mysql    4u      REG              253,1          0     918684 /tmp/ib0wMnKJ (deleted)
    mysqld    13409 13410      mysql    5u      REG              253,1        100     918685 /tmp/ibQoVqHN (deleted)
    mysqld    13409 13410      mysql    6u      REG              253,1          0     918686 /tmp/ib0IhuER (deleted)
    mysqld    13409 13410      mysql    7u      REG              253,1          0     918687 /tmp/ibYj6KAZ (deleted)
    mysqld    13409 13410      mysql   16u      REG              253,1          0     918688 /tmp/ibJb6HC3 (deleted)
    mysqld    13409 13411      mysql    4u      REG              253,1          0     918684 /tmp/ib0wMnKJ (deleted)
    mysqld    13409 13411      mysql    5u      REG              253,1        100     918685 /tmp/ibQoVqHN (deleted)
    mysqld    13409 13411      mysql    6u      REG              253,1          0     918686 /tmp/ib0IhuER (deleted)
    mysqld    13409 13411      mysql    7u      REG              253,1          0     918687 /tmp/ibYj6KAZ (deleted)
    mysqld    13409 13411      mysql   16u      REG              253,1          0     918688 /tmp/ibJb6HC3 (deleted)
    mysqld    13409 13412      mysql    4u      REG              253,1          0     918684 /tmp/ib0wMnKJ (deleted)
    mysqld    13409 13412      mysql    5u      REG              253,1        100     918685 /tmp/ibQoVqHN (deleted)
    mysqld    13409 13412      mysql    6u      REG              253,1          0     918686 /tmp/ib0IhuER (deleted)
    

    Find the process holding the large file and stop it. After restarting it, everything was back to normal.

  • Extending a disk partition and mounting it:

https://blog.csdn.net/woailyoo0000/article/details/86485666

    About mounting:

https://blog.csdn.net/weixin_40035337/article/details/107861833
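The `lsof | grep deleted` check from the reference above can also be done without `lsof` by reading the /proc/<pid>/fd links directly: on Linux, a link whose target ends in " (deleted)" is an open file that has already been unlinked. This is a Linux-only sketch; `deleted_open_files` is a hypothetical name, it uses GNU `stat -L` for the size column, and without root it only sees your own processes. Within lsof itself, `lsof +L1` (open files whose link count is below one) gives the same list.

```shell
# deleted_open_files
# Hypothetical helper: print "SIZE_BYTES PID FD_LINK TARGET" for every open
# file descriptor whose file has been deleted, largest first.
# Linux-only: relies on /proc/<pid>/fd and GNU stat.
deleted_open_files() {
    local fd target pid size
    for fd in /proc/[0-9]*/fd/*; do
        # Entries we cannot read (other users' processes) or that vanished
        # between the glob and readlink are skipped.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#/proc/}
                pid=${pid%%/*}
                # -L follows the fd link to the still-open inode.
                size=$(stat -L -c %s "$fd" 2>/dev/null) || size=0
                printf '%s %s %s %s\n' "$size" "$pid" "$fd" "$target"
                ;;
        esac
    done | sort -nr
}
```

If stopping the process is not an option, truncating the file through its descriptor (for example `: > /proc/<pid>/fd/<n>`, using a PID and FD from this output) typically frees the space while the process keeps running.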
