A Compendium of YOLOv5 Errors


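1. Setting up the GPU training environment

1.1 GPU driver error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

# Check which nvidia driver version is pre-installed; it can be installed under Software & Updates > Additional Drivers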
cd /usr/src
ls
sudo dkms install -m nvidia -v xxx  # xxx is the version string that follows "nvidia-" in the ls output
sudo apt install nvidia-cuda-toolkit
reboot
nvcc --version
nvidia-smi

Output like the following means the nvidia driver is working:

Mon Jul 19 09:01:53 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   39C    P8     5W /  N/A |    185MiB /  5944MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1028      G   /usr/lib/xorg/Xorg                 82MiB |
|    0   N/A  N/A      1343      G   /usr/bin/gnome-shell              100MiB |
+-----------------------------------------------------------------------------+
# Finally, check in Python that torch is the GPU build (i.e. that it can use CUDA)
>>> import torch as t
>>> t.cuda.is_available()
True

Note: if everything was installed correctly but nvidia-smi still reports: Command 'nvidia-smi' not found

Running sudo apt-get install again at this point prints:
Module nvidia/srv-450.142.00 already installed on kernel 5.11.0-25-generic/x86_64

Check the driver: nvidia-settings opens normally.

Reconfigure nvidia: sudo rmmod nvidia

rmmod: ERROR: Module nvidia is in use by: nvidia_uvm nvidia_modeset

Then run: sudo nvidia-smi

Mon Aug  9 10:28:21 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0  On |                  N/A |
| 40%   38C    P8     8W /  90W |    301MiB /  3902MiB |     36%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       886      G   /usr/lib/xorg/Xorg                 85MiB |
|    0   N/A  N/A      1257      G   /usr/bin/gnome-shell              128MiB |
|    0   N/A  N/A      1673      G   ...AAAAAAAAA= --shared-files       85MiB |
+-----------------------------------------------------------------------------+
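The manual checks in this section can be gathered into a small script. This is only a sketch using the Python standard library: the function names find_gpu_tools and run_nvidia_smi are made up for illustration, and the script only reports whether the tools are on PATH and whether nvidia-smi exits cleanly. It does not repair anything.

```python
import shutil
import subprocess


def find_gpu_tools(names=("nvidia-smi", "nvcc", "nvidia-settings")):
    """Map each tool name to its full path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def run_nvidia_smi():
    """Return (ok, output); ok is False when nvidia-smi is missing or exits with an error."""
    path = shutil.which("nvidia-smi")
    if path is None:
        return False, "nvidia-smi not found on PATH"
    result = subprocess.run([path], capture_output=True, text=True)
    return result.returncode == 0, result.stdout or result.stderr


if __name__ == "__main__":
    for tool, path in find_gpu_tools().items():
        print(f"{tool}: {path or 'not found'}")
    ok, output = run_nvidia_smi()
    print("driver reachable" if ok else "driver problem")
    print(output)
```

A "not found" line points at a missing package or PATH problem, while a found nvidia-smi that exits with an error points at the driver itself.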

2. During torch training

2.1 Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

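Cause: the convolution algorithm could not be obtained, most likely because cuDNN failed to initialize. Typically too much GPU memory is already in use, so memory cannot be allocated properly. This message is raised by TensorFlow, and the fix is to configure GPU memory through the tf.config API.

Solutions:
1 Close other running programs.
2 Check GPU memory usage in the task manager (or with nvidia-smi).
3 Configure GPU memory allocation through tf.config.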
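A common way to apply the tf.config fix is to enable memory growth, so that TensorFlow allocates GPU memory on demand instead of reserving nearly all of it at startup. The sketch below assumes that this is the setting meant here; the helper name enable_gpu_memory_growth is hypothetical, and it must be called before any GPU work has started.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand.

    Returns the number of GPUs configured (0 when TensorFlow is not
    installed or no GPU is visible).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow is not installed in this environment

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before the GPU is initialized, otherwise TensorFlow raises RuntimeError
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print("GPUs configured for memory growth:", enable_gpu_memory_growth())
```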

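2.2 Running detect.py maxes out the GPU and then the kernel dies

This one is even nastier than the previous one. I had no idea what was going on when I first hit it, and searching Baidu did not help either. Thinking it over later, a kernel that dies without a word probably means the hardware cannot keep up, so the fix is fairly simple:

  1. Get more compute, e.g. switch to a 3080 Ti
  2. Trade time for compute (slow and steady)

In most cases, go with the second option:

# Use a smaller batch size so each step needs less GPU memory; reaching the same accuracy then takes more iterations, and therefore more time
python3 train.py --img 640 --data ../datasets/coco/my.yaml --cfg ./models/yolov5s.yaml --weights ./weights/yolov5s.pt --batch-size 8 --epochs 60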
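To make the time-for-compute trade-off concrete, the sketch below counts optimizer steps for a few batch sizes. The dataset size of 1200 images is a made-up figure for illustration, and steps_per_epoch is a hypothetical helper, not part of YOLOv5.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)


if __name__ == "__main__":
    num_images = 1200  # hypothetical dataset size
    epochs = 60        # same value as in the train.py command
    for batch_size in (32, 16, 8):
        steps = steps_per_epoch(num_images, batch_size)
        print(f"batch-size {batch_size}: {steps} steps per epoch, {steps * epochs} steps in total")
```

Halving the batch size roughly halves the memory needed per step and doubles the number of steps per epoch.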

2.3 Exception: train: …/datasets/coco/images/train/xxx does not exist

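Strange. I had just labeled all of this with labelImg, so how could such a big pile of data suddenly "not exist"?

After digging around, I found that YOLOv5 expects one .txt label file per image, in which each line holds a class index followed by a normalized box (x_center, y_center, width, height), whereas labelImg had written its annotations as .xml files.

It turns out YOLOv5 does not read .xml annotation files, so they have to be converted to .txt files.
The code is as follows: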

import os
import xml.etree.ElementTree as ET

# Class names as they appear in the XML <name> tags; change these to match your own labels
classes = ["0", "1", "2"]


def clear_hidden_files(path):
    dir_list = os.listdir(path)
    for i in dir_list:
        abspath = os.path.join(os.path.abspath(path), i)
        if os.path.isfile(abspath):
            if i.startswith("._"):
                os.remove(abspath)
        else:
            clear_hidden_files(abspath)

def convert(size, box):
    dw = 1./size[0]
    dh = 1./size[1]
    x = (box[0] + box[1])/2.0
    y = (box[2] + box[3])/2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)

def convert_annotation(image_id):
    # Read one XML annotation and write the matching YOLO-format .txt file
    tree = ET.parse('../datasets/coco/labels/train/%s.xml' % image_id)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    with open('../datasets/coco/labels/tran_train/%s.txt' % image_id, 'w') as out_file:
        for obj in root.iter('object'):
            difficult = obj.find('difficult')
            cls = obj.find('name').text
            # Skip unknown classes and objects flagged as difficult
            if cls not in classes or (difficult is not None and int(difficult.text) == 1):
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            # convert() expects the box as (xmin, xmax, ymin, ymax)
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

# Read the file names from my directory layout and call convert_annotation on each XML file
for file in os.listdir("../datasets/coco/labels/train/"):
    if file.endswith(".xml"):
        convert_annotation(file[:-4])
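The coordinate conversion can be checked by hand on one box. The sketch below restates the same arithmetic with named arguments and adds the inverse mapping as a sanity check; voc_to_yolo and yolo_to_voc are illustrative names, and the 640x480 image and its box are made-up values.

```python
def voc_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Pixel corner box -> normalized (x_center, y_center, width, height)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_to_voc(img_w, img_h, x_center, y_center, width, height):
    """Normalized center box -> pixel corners (xmin, ymin, xmax, ymax)."""
    xmin = (x_center - width / 2.0) * img_w
    ymin = (y_center - height / 2.0) * img_h
    xmax = (x_center + width / 2.0) * img_w
    ymax = (y_center + height / 2.0) * img_h
    return xmin, ymin, xmax, ymax


if __name__ == "__main__":
    # Hypothetical 640x480 image with a box from (100, 120) to (300, 360)
    box = voc_to_yolo(640, 480, 100, 120, 300, 360)
    print(box)                         # (0.3125, 0.5, 0.3125, 0.5)
    print(yolo_to_voc(640, 480, *box))  # (100.0, 120.0, 300.0, 360.0)
```

All four output values fall between 0 and 1, which is what the training code expects in each label line after the class index.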

2.4 ValueError: not enough values to unpack (expected 3, got 0)

I originally wanted to rename the labels, so I looped over the label files, split each line, and replaced the class field:

import os, glob

if __name__ == '__main__':
    for file in os.listdir("../datasets/coco/labels/train/"):
        txt_list = glob.glob("../datasets/coco/labels/train/"+file)
        for txt_item in txt_list:
            with open(txt_item) as f:
                lines = f.readlines()
            with open(txt_item, 'w') as f:
                for line in lines:
                    line_split = line.strip().split()
                    if line_split[0] == '0':
                        line_split[0] = 'leaving'
                    elif line_split[0] == '1':
                        line_split[0] = 'working'
                    elif line_split[0] == '2':
                        line_split[0] = 'sleeping'
                    f.write(
                        line_split[0] + ' ' +
                        line_split[1] + ' ' +
                        line_split[2] + ' ' +
                        line_split[3] + ' ' +
                        line_split[4]+'\n')
            pass

But that only made things worse:

ValueError: zero-size array to reduction operation maximum which has no identity
ValueError: could not convert string to float
ValueError: not enough values to unpack (expected 3, got 0)

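The chain of cause and effect is roughly:

In the rewritten label files (the ".txt" files) the class field is now a string, which cannot be converted to float for the matrix computations. The label matrix therefore ends up as a zero-size (empty) array, the maximum operation (and the transformations that follow) cannot run on it, and so although I passed nc=3, the code expected 3 values and got 0.

Solution:

It was one failure after another. I later learned that the class field in the label files really must not be changed: it has to be a number so that the str -> float conversion works.

If you want to change the label shown in the output (the label drawn at detect time), you only need to edit the "names" entry in my.yaml, keeping the same order as the class indices. I recommend labeling with ['0', '1', '2'], since that makes the order explicit, and then setting "names" to the state each index stood for at labeling time.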
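A quick check before training can catch label files that were edited incorrectly. The sketch below is a hypothetical validator, not part of YOLOv5: check_label_line accepts a line only if it has five fields, an integer class index below nc, and four numbers normalized to [0, 1].

```python
def check_label_line(line, nc):
    """Return None when the line is a valid label line, otherwise a short reason."""
    parts = line.split()
    if len(parts) != 5:
        return "expected 5 fields, got %d" % len(parts)
    try:
        cls_id = int(parts[0])
    except ValueError:
        return "class must be an integer, got %r" % parts[0]
    if not 0 <= cls_id < nc:
        return "class index %d is outside [0, %d)" % (cls_id, nc)
    try:
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return "box values must be numbers"
    if any(not (0.0 <= c <= 1.0) for c in coords):
        return "box values must be normalized to [0, 1]"
    return None


if __name__ == "__main__":
    for line in ("0 0.5 0.5 0.2 0.3", "leaving 0.5 0.5 0.2 0.3", "3 0.5 0.5 0.2 0.3"):
        problem = check_label_line(line, nc=3)
        print(repr(line), "->", problem or "ok")
```

A line such as "leaving 0.5 0.5 0.2 0.3" is rejected at the class check, which is the mistake described in this section.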

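3. Displaying Chinese labels (YOLOv5 v5.0)

The basic steps are the same as in the post "YoloV5实现中文标签目标检测" (YoloV5 object detection with Chinese labels).

However, v5.0 contains a few small changes (renamed parameters) that produced a string of baffling errors. For example, the argument that used to be named img is now im. It took a whole afternoon of headaches to work it all out.

Step 6: modify utils/plots.py

Error: name 'img' is not defined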
# Imports needed at the top of the file
import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def plot_one_box(x, im, color=(128, 128, 128), label=None, line_thickness=3):
    # Plots one bounding box on image 'im' using OpenCV; the label is drawn with PIL so Chinese text renders
    assert im.data.contiguous, 'Image not contiguous. Apply np.ascontiguousarray(im) to plot_on_box() input image.'
    tl = line_thickness or round(0.002 * (im.shape[0] + im.shape[1]) / 2) + 1  # line/font thickness
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(im, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        font_size = t_size[1]
        font = ImageFont.truetype('wryh.ttf', font_size)  # a font file that contains Chinese glyphs
        # font.getsize() was removed in Pillow 10; on newer Pillow use font.getbbox(label) instead
        t_size = font.getsize(label)
        c2 = c1[0] + t_size[0], c1[1] - t_size[1]
        cv2.rectangle(im, c1, c2, color, -1, cv2.LINE_AA)  # filled
        img_PIL = Image.fromarray(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))
        draw = ImageDraw.Draw(img_PIL)
        draw.text((c1[0], c2[1] - 2), label, fill=(255, 255, 255), font=font)
        return cv2.cvtColor(np.array(img_PIL), cv2.COLOR_RGB2BGR)
    return im  # no label: return the image with only the box, so the caller never receives None

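Step 8: modify detect.py

Error: Colors is not subscriptable. In v5.0, colors is an instance of the Colors class, so it has to be called rather than indexed: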
im0 = plot_one_box(xyxy, im0, label=label, color=colors(c, True), line_thickness=line_thickness)
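The "not subscriptable" error is what Python raises when an object that only defines __call__ is indexed with square brackets. The sketch below reproduces this with a small stand-in class. Palette is a hypothetical name and this is not the YOLOv5 source, only an illustration of the call-versus-index difference.

```python
class Palette:
    """Minimal callable color palette (illustrative stand-in, not the YOLOv5 Colors class)."""

    def __init__(self, hex_codes):
        self.colors = [self.hex_to_rgb(h) for h in hex_codes]

    def __call__(self, i, bgr=False):
        # Wrap around so any class index maps to some color
        r, g, b = self.colors[int(i) % len(self.colors)]
        return (b, g, r) if bgr else (r, g, b)

    @staticmethod
    def hex_to_rgb(h):
        h = h.lstrip("#")
        return tuple(int(h[k:k + 2], 16) for k in (0, 2, 4))


colors = Palette(["FF3838", "00C2FF"])

if __name__ == "__main__":
    print(colors(0, True))  # (56, 56, 255): calling works, BGR order for OpenCV
    try:
        colors[0]           # indexing fails: the class defines no __getitem__
    except TypeError as err:
        print("TypeError:", err)
```

Code written for an older version that used a plain list of colors has to switch from colors[c] to colors(c, True).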

4. Displaying Chinese text with python cv2 putText

See the blog post "python cv2 putText 显示中文的方法" (How to display Chinese text with python cv2 putText).
