YOLOv5 Error Compendium

我的眼中只有学习 · Published 2021-07-19 10:33:31


1. Setting up the GPU training environment

1.1 GPU driver error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

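# Check the version of the preinstalled NVIDIA driver; it can be preinstalled via Software & Updates > Additional Drivers
cd /usr/src
ls
# Replace xxx with the version string that follows "nvidia-" in the ls output
sudo dkms install -m nvidia -v xxx
sudo apt install nvidia-cuda-toolkit
reboot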

nvcc --version
nvidia-smi

Output like the following means the NVIDIA driver is working:

Mon Jul 19 09:01:53 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   39C    P8     5W /  N/A |    185MiB /  5944MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1028      G   /usr/lib/xorg/Xorg                 82MiB |
|    0   N/A  N/A      1343      G   /usr/bin/gnome-shell              100MiB |
+-----------------------------------------------------------------------------+

# Finally, check in Python that torch is the GPU build (that is, whether CUDA is usable)
import torch as t
print(t.cuda.is_available())  # prints True when torch can use CUDA
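When the check prints False, a little more detail helps narrow down the cause. The following is a minimal sketch, not part of the original post: the function name cuda_report is hypothetical, and it assumes only that torch may or may not be installed. It reports whether torch was built against CUDA (torch.version.cuda is None on CPU-only builds) and, if a GPU is usable, its name.

```python
def cuda_report():
    """Collect basic facts about the local torch/CUDA setup in a dict."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        # None here means a CPU-only torch build was installed
        "torch_cuda_build": torch.version.cuda,
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If torch_cuda_build is None, the driver may be fine and the problem is the installed torch package rather than the NVIDIA setup.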

Note: if everything was installed correctly but nvidia-smi still reports: Command 'nvidia-smi' not found

Running sudo apt-get install again at this point shows:
Module nvidia/srv-450.142.00 already installed on kernel 5.11.0-25-generic/x86_64

Check the driver: nvidia-settings opens normally.

Reload the NVIDIA module: sudo rmmod nvidia

rmmod: ERROR: Module nvidia is in use by: nvidia_uvm nvidia_modeset

Then run it again: sudo nvidia-smi

Mon Aug  9 10:28:21 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0  On |                  N/A |
| 40%   38C    P8     8W /  90W |    301MiB /  3902MiB |     36%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       886      G   /usr/lib/xorg/Xorg                 85MiB |
|    0   N/A  N/A      1257      G   /usr/bin/gnome-shell              128MiB |
|    0   N/A  N/A      1673      G   ...AAAAAAAAA= --shared-files       85MiB |
+-----------------------------------------------------------------------------+

2. During torch training

2.1 Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

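Explanation:
Cause: the convolution algorithm could not be obtained, probably because cuDNN failed to initialize; check whether a warning was logged above the error.
This message is emitted by TensorFlow. Too much GPU memory is already in use, so memory cannot be allocated properly. The tf.config API is needed to control how GPU memory is allocated.

Solutions:
1 Close other running programs.
2 Open the task manager and check how much GPU memory is in use.
3 Add the tf.config settings (tf.config is a TensorFlow module rather than a function), such as enabling GPU memory growth.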
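Checking GPU memory usage can also be scripted. The sketch below pulls the used and total memory figures out of nvidia-smi table text like the output shown in section 1.1. The names parse_gpu_memory and usage_ratio and the regular expression are illustrative assumptions about that table layout, not part of any NVIDIA tooling.

```python
import re

# Matches the "185MiB /  5944MiB" memory column of the nvidia-smi table.
_MEM_PATTERN = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def parse_gpu_memory(smi_text):
    """Return a list of (used_mib, total_mib) pairs, one per GPU row found."""
    return [(int(used), int(total)) for used, total in _MEM_PATTERN.findall(smi_text)]


def usage_ratio(used_mib, total_mib):
    """Fraction of GPU memory currently in use."""
    return used_mib / total_mib


if __name__ == "__main__":
    sample = "| N/A   39C    P8     5W /  N/A |    185MiB /  5944MiB |      3%      Default |"
    for used, total in parse_gpu_memory(sample):
        print(f"{used} MiB of {total} MiB in use ({usage_ratio(used, total):.1%})")
```

On a machine with the driver installed, the text could come from subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout instead of the hard-coded sample row.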

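2.2 Running detect.py maxes out the GPU and then kills the kernel

This one is worse than the previous error. When I first hit it I had no idea what was going on; I was a bit alarmed, and even searching Baidu did not help. Thinking about it afterwards, a kernel that dies without any message probably means the hardware is not up to the load, so the fix is fairly simple:

Get more compute: switch to a 3080 Ti
Trade time for compute (slow and steady)

In most cases, choose the second option:

# Reduce batch-size so that each step processes fewer images; reaching the same accuracy then takes more iterations, and therefore more time
python3 train.py --img 640 --data ../datasets/coco/my.yaml --cfg ./models/yolov5s.yaml --weights ./weights/yolov5s.pt --batch-size 8 --epochs 60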
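To make the time cost concrete, here is a small arithmetic sketch. It assumes a hypothetical dataset of 1,000 training images (a made-up figure for illustration) and counts training steps as ceil(images / batch_size) per epoch; the function names are likewise illustrative.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


def total_steps(num_images, batch_size, epochs):
    """Total batches processed over the whole run."""
    return steps_per_epoch(num_images, batch_size) * epochs


if __name__ == "__main__":
    num_images = 1000  # hypothetical dataset size, for illustration only
    for batch_size in (16, 8):
        print(batch_size,
              steps_per_epoch(num_images, batch_size),
              total_steps(num_images, batch_size, epochs=60))
```

Halving the batch size roughly doubles the number of steps per epoch, while each step needs less GPU memory.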

2.3 Exception: train: …/datasets/coco/images/train/xxx does not exist

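Strange: I had just labeled the data with labelImg, so how could such a large batch of data be reported as missing?

After looking around, I found that YOLOv5 expects the dataset to be laid out in a specific way (the screenshot of the expected layout is missing from this copy).

My labelImg output looked different (screenshot also missing).
I later learned that YOLOv5, being torch-based, does not read XML annotation files, so they have to be converted to .txt files.
The code is as follows: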

import os
import xml.etree.ElementTree as ET

# Adjust to match your own class labels
classes = ["0", "1", "2"]


def clear_hidden_files(path):
    # Recursively delete hidden "._" files under path
    dir_list = os.listdir(path)
    for i in dir_list:
        abspath = os.path.join(os.path.abspath(path), i)
        if os.path.isfile(abspath):
            if i.startswith("._"):
                os.remove(abspath)
        else:
            clear_hidden_files(abspath)


def convert(size, box):
    # size is (width, height) in pixels; box is (xmin, xmax, ymin, ymax) in pixels.
    # Returns the normalized (x_center, y_center, width, height).
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)


def convert_annotation(image_id):
    # Read one labelImg XML file and write the matching YOLO .txt label file
    tree = ET.parse('../datasets/coco/labels/train/%s.xml' % image_id)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    with open('../datasets/coco/labels/tran_train/%s.txt' % image_id, 'w') as out_file:
        for obj in root.iter('object'):
            # Treat a missing <difficult> tag as 0 instead of crashing
            difficult = obj.findtext('difficult', default='0')
            cls = obj.find('name').text
            if cls not in classes or int(difficult) == 1:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


# Read the file names according to my own directory layout and call convert_annotation on each XML file
os.makedirs('../datasets/coco/labels/tran_train/', exist_ok=True)
for file in os.listdir("../datasets/coco/labels/train/"):
    if file.endswith('.xml'):
        convert_annotation(os.path.splitext(file)[0])
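As a quick check of the arithmetic in convert, the sketch below reimplements the same formula under a different, hypothetical name (voc_box_to_yolo) and applies it to a made-up 640x480 image with a box spanning xmin=100, xmax=300, ymin=200, ymax=400. Each line in a YOLO label file has the form "class_id x_center y_center width height", with the four box values normalized to the range 0 to 1.

```python
def voc_box_to_yolo(image_size, box):
    """Convert (xmin, xmax, ymin, ymax) in pixels to normalized (x_center, y_center, w, h).

    image_size is (width, height) in pixels.
    """
    img_w, img_h = image_size
    xmin, xmax, ymin, ymax = box
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


if __name__ == "__main__":
    values = voc_box_to_yolo((640, 480), (100.0, 300.0, 200.0, 400.0))
    # Class id 0 followed by the four normalized values, as one label line.
    print("0 " + " ".join(f"{v:.6f}" for v in values))
```

For this box the center lands at (0.3125, 0.625) and the normalized size is 0.3125 wide by roughly 0.4167 high.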

2.4 ValueError: not enough values to unpack (expected 3, got 0)

I originally wanted to change the labels, so I replaced them by iterating over the label files and splitting each line:

import os, glob

# NOTE: this is the script that caused the errors below.
# It overwrites the numeric class id at the start of every label line with a string.
if __name__ == '__main__':
    for file in os.listdir("../datasets/coco/labels/train/"):
        txt_list = glob.glob("../datasets/coco/labels/train/"+file)
        for txt_item in txt_list:
            with open(txt_item) as f:
                lines = f.readlines()
            with open(txt_item, 'w') as f:
                for line in lines:
                    line_split = line.strip().split()
                    # Replace the class id (first field) with a class name
                    if line_split[0] == '0':
                        line_split[0] = 'leaving'
                    elif line_split[0] == '1':
                        line_split[0] = 'working'
                    elif line_split[0] == '2':
                        line_split[0] = 'sleeping'
                    f.write(
                        line_split[0] + ' ' +
                        line_split[1] + ' ' +
                        line_split[2] + ' ' +
                        line_split[3] + ' ' +
                        line_split[4]+'\n')

But things only got worse:

ValueError: zero-size array to reduction operation maximum which has no identity
ValueError: could not convert string to float
ValueError: not enough values to unpack (expected 3, got 0)

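The chain of cause and effect is roughly this:

In the rewritten label files (the ".txt" files) the class label is a string, which cannot be converted to float for the array computations. The label array therefore ends up empty (zero-size), the maximum reduction (one of a series of transformations) cannot run on it, and so 3 values were expected (nc=3) but 0 were received.

Solution:

The errors simply cascaded. I later learned that the class labels in the label files really must not be changed: they have to be numbers so that the str -> float conversion works.

If you need to change the labels shown in the output (the labels displayed at detect time), you only need to change the "names" entry in my.yaml, keeping the same order as the class ids used when labeling. I recommend labeling with ['0', '1', '2']; decoding follows that order, so set "names" according to what each id meant when you labeled the data.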
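A small check before training can catch this class of mistake. The sketch below rests on an assumption about what a valid label line looks like (an integer class id followed by four floats between 0 and 1); the function name is_valid_label_line is hypothetical and not part of YOLOv5.

```python
def is_valid_label_line(line, num_classes):
    """Return True if line looks like 'class_id x_center y_center width height'."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        class_id = int(parts[0])
        box = [float(p) for p in parts[1:]]
    except ValueError:
        # Non-numeric tokens such as 'leaving' end up here.
        return False
    return 0 <= class_id < num_classes and all(0.0 <= v <= 1.0 for v in box)


if __name__ == "__main__":
    print(is_valid_label_line("0 0.3125 0.625 0.3125 0.416667", num_classes=3))
    print(is_valid_label_line("leaving 0.3125 0.625 0.3125 0.416667", num_classes=3))
```

Running such a check over every line of every .txt file would have flagged the string labels before train.py ever loaded them.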

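3. Displaying Chinese labels (YOLOv5 v5.0)

The basic procedure follows the post "YoloV5实现中文标签目标检测" (object detection with Chinese labels in YoloV5).

However, v5.0 contains some small changes (renamed parameters) that triggered a string of puzzling errors; for example, the image argument of the box-plotting function used to be named img and is now im. It took me a frustrating afternoon to work it all out.

Step 6: modify utils/plots.py

Error: name 'img' is not defined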

import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont


def plot_one_box(x, im, color=(128, 128, 128), label=None, line_thickness=3):
    # Plots one bounding box on image 'im' using OpenCV and returns the image so the caller can reassign it
    assert im.data.contiguous, 'Image not contiguous. Apply np.ascontiguousarray(im) to plot_on_box() input image.'
    tl = line_thickness or round(0.002 * (im.shape[0] + im.shape[1]) / 2) + 1  # line/font thickness
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(im, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        font_size = t_size[1]
        # 'wryh.ttf' must be a font file with Chinese glyphs that Pillow can locate
        font = ImageFont.truetype('wryh.ttf', font_size)
        # font.getsize() exists in Pillow < 10; later releases provide font.getbbox() instead
        t_size = font.getsize(label)
        c2 = c1[0] + t_size[0], c1[1] - t_size[1]
        cv2.rectangle(im, c1, c2, color, -1, cv2.LINE_AA)  # filled
        # Draw the text with PIL, because cv2.putText cannot render Chinese characters
        img_PIL = Image.fromarray(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))
        draw = ImageDraw.Draw(img_PIL)
        draw.text((c1[0], c2[1] - 2), label, fill=(255, 255, 255), font=font)
        return cv2.cvtColor(np.array(img_PIL), cv2.COLOR_RGB2BGR)
    return im  # no label: return the image unchanged so the caller never receives None

Step 8: modify detect.py

Error: Colors is not subscriptable

im0 = plot_one_box(xyxy, im0, label=label, color=colors(c, True), line_thickness=line_thickness)
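The message means that code indexed colors with square brackets, whereas the line above calls it like a function. The sketch below uses a hypothetical stand-in class, Palette, to show the difference; it is not the YOLOv5 Colors class, and its two color values are arbitrary.

```python
class Palette:
    """Stand-in for a color helper that is callable but defines no __getitem__."""

    def __init__(self):
        # Arbitrary RGB tuples, for illustration only.
        self._rgb = [(255, 56, 56), (72, 249, 10)]

    def __call__(self, index, bgr=False):
        r, g, b = self._rgb[index % len(self._rgb)]
        return (b, g, r) if bgr else (r, g, b)


if __name__ == "__main__":
    palette = Palette()
    print(palette(0, True))  # calling works
    try:
        palette[0]           # indexing is not supported
    except TypeError as exc:
        print("TypeError:", exc)
```

Any leftover colors[...] expression copied from an older tutorial needs to become a call with parentheses, as in the detect.py line above.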

4. Displaying Chinese text with Python cv2 putText

See the blog post "python cv2 putText 显示中文的方法" (how to display Chinese text with Python cv2 putText).
