YOLOv5 Error Compendium
- YOLOv5 Error Compendium
- 1. Setting up the GPU training environment
- 2. During torch training
- 2.1 Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
- 2.2 Running detect.py maxes out the GPU and then kills the kernel
- 2.3 Exception: train: ../datasets/coco/images/train/xxx does not exist
- 2.4 ValueError: not enough values to unpack (expected 3, got 0)
- 3. Displaying Chinese labels (YOLOv5 v5.0)
- 4. Displaying Chinese text with python cv2 putText
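1. Setting up the GPU training environment
1.1 GPU driver error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver
# Check the version of the preinstalled NVIDIA driver (it can be installed from Software & Updates > Additional Drivers)
cd /usr/src
ls
sudo dkms install -m nvidia -v xxx  # xxx: the version string that follows "nvidia-" in the ls output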
sudo apt install nvidia-cuda-toolkit
reboot
nvcc --version
nvidia-smi
If the output looks like the following, the NVIDIA driver is working correctly:
Mon Jul 19 09:01:53 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04 Driver Version: 450.119.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 166... Off | 00000000:01:00.0 Off | N/A |
| N/A 39C P8 5W / N/A | 185MiB / 5944MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1028 G /usr/lib/xorg/Xorg 82MiB |
| 0 N/A N/A 1343 G /usr/bin/gnome-shell 100MiB |
+-----------------------------------------------------------------------------+
# Finally, check in Python that torch is the GPU build (i.e. that it can use CUDA)
>>> import torch as t
>>> t.cuda.is_available()
True
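When the check above returns False, it helps to see which build of torch is installed. The following is a minimal sketch, not part of the original post; the helper name describe_torch_cuda is hypothetical, and it only uses torch.__version__, torch.version.cuda, torch.cuda.is_available() and torch.cuda.get_device_name().

```python
def describe_torch_cuda():
    """Return a one-line summary of the local PyTorch/CUDA setup."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        # Typical for a CPU-only build, or when the NVIDIA driver is not loaded.
        # torch.version.cuda is None on CPU-only builds.
        return f"torch {torch.__version__}: CUDA not available (built for CUDA {torch.version.cuda})"
    name = torch.cuda.get_device_name(0)
    return f"torch {torch.__version__}: CUDA {torch.version.cuda} on {name}"


print(describe_torch_cuda())
```

If the summary reports a build for CUDA None, the installed wheel is CPU-only and reinstalling the driver will not change the result.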
Note: if the driver was already installed but nvidia-smi
still reports: Command 'nvidia-sim' not found
then running sudo apt-get install again at this point reports:
Module nvidia/srv-450.142.00 already installed on kernel 5.11.0-25-generic/x86_64
Check the driver: nvidia-settings
It opens normally.
Reload the NVIDIA module: sudo rmmod nvidia
rmmod: ERROR: Module nvidia is in use by: nvidia_uvm nvidia_modeset
Then run it again: sudo nvidia-smi
Mon Aug 9 10:28:21 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00 Driver Version: 450.142.00 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1650 Off | 00000000:01:00.0 On | N/A |
| 40% 38C P8 8W / 90W | 301MiB / 3902MiB | 36% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 886 G /usr/lib/xorg/Xorg 85MiB |
| 0 N/A N/A 1257 G /usr/bin/gnome-shell 128MiB |
| 0 N/A N/A 1673 G ...AAAAAAAAA= --shared-files 85MiB |
+-----------------------------------------------------------------------------+
2. During torch training
2.1 Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
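Cause:
The convolution algorithm could not be obtained, most likely because cuDNN failed to initialize.
Too much GPU memory is already in use, so the framework cannot allocate what it needs. The fix goes through TensorFlow's tf.config module (a module, not a function), for example tf.config.experimental.set_memory_growth.
Solutions:
1 Close other running programs.
2 Open the task manager (or run nvidia-smi) and check how much GPU memory is in use.
3 Configure GPU memory through tf.config, for example by enabling memory growth.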
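As a rough illustration of step 3, the sketch below enables on-demand GPU memory allocation. It assumes TensorFlow 2.x is the framework raising the error; the helper name enable_gpu_memory_growth is hypothetical, and the import is guarded so the snippet also runs where TensorFlow is absent.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand; return how many GPUs were configured."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so there is nothing to configure
        return 0
    configured = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError:
            # Memory growth has to be set before the GPU is initialized
            pass
    return configured


print(enable_gpu_memory_growth())
```

This has to run at the very start of the program, before any model or tensor touches the GPU.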
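2.2 Running detect.py maxes out the GPU and then kills the kernel
This one is even worse than the previous error. When I first hit it I had no idea what was going on; it was a little alarming, and searching Baidu did not help. Thinking about it afterwards, a kernel that dies without a word most likely means the hardware cannot keep up, so the solutions are fairly simple:
- Get more compute, e.g. switch to a 3080 Ti
- Trade time for compute (slow and steady)
Usually you end up with the second option:
# Reduce batch-size. Each step then processes fewer images, so covering the same data takes more iterations and therefore more time
python3 train.py --img 640 --data ../datasets/coco/my.yaml --cfg ./models/yolov5s.yaml --weights ./weights/yolov5s.pt --batch-size 8 --epochs 60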
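To make the time-for-compute trade-off concrete, here is a small arithmetic sketch. The dataset size of 1000 images is a made-up figure, and iterations_per_epoch is a hypothetical helper, not a YOLOv5 function.

```python
import math


def iterations_per_epoch(num_images, batch_size):
    """Number of optimizer steps needed to see every image once."""
    return math.ceil(num_images / batch_size)


num_images = 1000  # hypothetical dataset size
for batch_size in (64, 16, 8):
    steps = iterations_per_epoch(num_images, batch_size)
    print(f"batch-size {batch_size}: {steps} iterations per epoch")
```

Going from a batch size of 64 to 8 raises the step count per epoch from 16 to 125, while the memory needed per step drops roughly in proportion to the batch size.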
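2.3 Exception: train: ../datasets/coco/images/train/xxx does not exist
Strange: I had just annotated the data with labelImg, so how could such a large dataset have gone missing?
After digging around, I found that the dataset layout YOLOv5 expects was not what my labelImg annotations looked like: labelImg had written .xml files.
It turns out that YOLOv5 reads its labels from .txt files and does not recognize .xml files, so the annotations have to be converted to .txt.
The code is as follows: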
import os
import xml.etree.ElementTree as ET

# Adjust to match your own dataset's labels
classes = ["0", "1", "2"]


def clear_hidden_files(path):
    # Recursively delete "._*" hidden files under path
    for name in os.listdir(path):
        abspath = os.path.join(os.path.abspath(path), name)
        if os.path.isfile(abspath):
            if name.startswith("._"):
                os.remove(abspath)
        else:
            clear_hidden_files(abspath)


def convert(size, box):
    # size = (width, height); box = (xmin, xmax, ymin, ymax), all in pixels
    # Returns the normalized YOLO box (x_center, y_center, width, height)
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)


def convert_annotation(image_id):
    # Read one labelImg .xml file and write the matching YOLO .txt file
    tree = ET.parse('../datasets/coco/labels/train/%s.xml' % image_id)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    with open('../datasets/coco/labels/tran_train/%s.txt' % image_id, 'w') as out_file:
        for obj in root.iter('object'):
            difficult = obj.find('difficult').text
            cls = obj.find('name').text
            # Skip unknown classes and objects marked as difficult
            if cls not in classes or int(difficult) == 1:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


# Make sure the output directory exists, then convert every .xml file
# (the file names are read according to my own directory layout)
os.makedirs("../datasets/coco/labels/tran_train/", exist_ok=True)
for file in os.listdir("../datasets/coco/labels/train/"):
    if file.endswith(".xml"):
        convert_annotation(file[:-4])
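The box conversion is easier to follow with numbers. The sketch below restates the same arithmetic as the convert step on a made-up 640x480 image; voc_to_yolo and yolo_to_voc are hypothetical names, and the inverse function is only there to check the round trip.

```python
def voc_to_yolo(size, box):
    """(xmin, xmax, ymin, ymax) in pixels -> normalized (x_center, y_center, w, h)."""
    width, height = size
    xmin, xmax, ymin, ymax = box
    x_center = (xmin + xmax) / 2.0 / width
    y_center = (ymin + ymax) / 2.0 / height
    return (x_center, y_center, (xmax - xmin) / width, (ymax - ymin) / height)


def yolo_to_voc(size, yolo_box):
    """Inverse mapping, useful for checking a converted label file."""
    width, height = size
    x, y, w, h = yolo_box
    return ((x - w / 2) * width, (x + w / 2) * width,
            (y - h / 2) * height, (y + h / 2) * height)


size = (640, 480)            # hypothetical image size
box = (160, 480, 120, 360)   # hypothetical pixel box
yolo_box = voc_to_yolo(size, box)
print("0 " + " ".join(str(v) for v in yolo_box))  # one line of a YOLO .txt label file
print(yolo_to_voc(size, yolo_box))
```

Every value after the class ID is a fraction of the image width or height, which is why a converted file should only contain numbers between 0 and 1 in those four columns.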
2.4 ValueError: not enough values to unpack (expected 3, got 0)
I originally wanted to rename the labels, so I looped over the label files, split each line, and replaced the label:
import os, glob

if __name__ == '__main__':
    for file in os.listdir("../datasets/coco/labels/train/"):
        txt_list = glob.glob("../datasets/coco/labels/train/" + file)
        for txt_item in txt_list:
            with open(txt_item) as f:
                lines = f.readlines()
            with open(txt_item, 'w') as f:
                for line in lines:
                    line_split = line.strip().split()
                    # Replaces the numeric class ID with a string label
                    if line_split[0] == '0':
                        line_split[0] = 'leaving'
                    elif line_split[0] == '1':
                        line_split[0] = 'working'
                    elif line_split[0] == '2':
                        line_split[0] = 'sleeping'
                    f.write(
                        line_split[0] + ' ' +
                        line_split[1] + ' ' +
                        line_split[2] + ' ' +
                        line_split[3] + ' ' +
                        line_split[4] + '\n')
But this only made things worse:
ValueError: zero-size array to reduction operation maximum which has no identity
ValueError: could not convert string to float
ValueError: not enough values to unpack (expected 3, got 0)
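The chain of cause and effect is roughly this:
The rewritten label files (the ".txt" files) now contain string labels, which cannot be converted to float for the matrix computation. The label matrix therefore ends up empty (zero-size), the maximum operation cannot be applied to it, and although nc=3 was specified, 3 values were expected but 0 were received.
Solution:
The errors cascaded one after another. I later learned that the labels in the label files really must not be changed: they have to be numeric so that the str -> float conversion works.
To change the label shown in the output (the label printed at detect time), you only need to edit "names" in my.yaml, keeping the same order as the class IDs used during annotation. I recommend annotating with ['0', '1', '2'], since the indices give a natural order, and then setting the training "names" according to what each label meant at annotation time.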
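A quick check of the label files before training can catch this kind of damage early. The following is a minimal sketch under the assumption that each label line has the form "class_id x_center y_center width height" with normalized coordinates; check_label_line is a hypothetical helper, not part of YOLOv5.

```python
def check_label_line(line, num_classes):
    """Return None if the line is a valid YOLO label line, else an error message."""
    fields = line.split()
    if len(fields) != 5:
        return f"expected 5 fields, got {len(fields)}"
    if not fields[0].isdigit():
        return f"class id {fields[0]!r} is not an integer"
    if int(fields[0]) >= num_classes:
        return f"class id {fields[0]} is out of range for nc={num_classes}"
    try:
        coords = [float(v) for v in fields[1:]]
    except ValueError:
        return "coordinates are not numeric"
    if any(c < 0.0 or c > 1.0 for c in coords):
        return "coordinates are not normalized to [0, 1]"
    return None


print(check_label_line("0 0.5 0.5 0.2 0.3", 3))        # a valid line
print(check_label_line("working 0.5 0.5 0.2 0.3", 3))  # a string label like the ones written above

# Display names live in the "names" list and are looked up by class ID
names = ["leaving", "working", "sleeping"]  # order follows class IDs 0, 1, 2
print(names[int("1")])
```

The label files keep the numeric IDs; only the names list decides what text appears on the detections.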
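3. Displaying Chinese labels (YOLOv5 v5.0)
The basic procedure is the same as in the article "YoloV5实现中文标签目标检测" (Chinese-label object detection with YOLOv5).
Version 5.0, however, has a few small changes (renamed parameters) that triggered a lot of puzzling errors; for example, what used to be named box became img and is now im. It took me a whole frustrating afternoon to work it out.
Step 6: modify utils/plots.py
Error: name 'img' is not defined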
# Requires at the top of the file: from PIL import Image, ImageDraw, ImageFont
def plot_one_box(x, im, color=(128, 128, 128), label=None, line_thickness=3):
    # Plots one bounding box on image 'im' using OpenCV
    assert im.data.contiguous, 'Image not contiguous. Apply np.ascontiguousarray(im) to plot_on_box() input image.'
    tl = line_thickness or round(0.002 * (im.shape[0] + im.shape[1]) / 2) + 1  # line/font thickness
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(im, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        font_size = t_size[1]
        font = ImageFont.truetype('wryh.ttf', font_size)
        # Note: font.getsize was removed in Pillow 10; on newer Pillow, measure with font.getbbox(label) instead
        t_size = font.getsize(label)
        c2 = c1[0] + t_size[0], c1[1] - t_size[1]
        cv2.rectangle(im, c1, c2, color, -1, cv2.LINE_AA)  # filled
        # Draw the text with PIL, because cv2.putText cannot render Chinese characters
        img_PIL = Image.fromarray(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))
        draw = ImageDraw.Draw(img_PIL)
        draw.text((c1[0], c2[1] - 2), label, fill=(255, 255, 255), font=font)
        # The labelled image is a new array, so the caller must use the return value
        return cv2.cvtColor(np.array(img_PIL), cv2.COLOR_RGB2BGR)
Step 8: modify detect.py
Error: Colors is not subscriptable
im0 = plot_one_box(xyxy, im0, label=label, color=colors(c, True), line_thickness=line_thickness)
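The error comes from indexing an object that is meant to be called. The sketch below reproduces it with a simplified, hypothetical stand-in class named Palette; it is not the YOLOv5 implementation, only an object with the same calling convention as colors(c, True).

```python
class Palette:
    """Simplified, hypothetical stand-in for a callable colour palette."""

    def __init__(self):
        self.colors = [(255, 56, 56), (255, 157, 151), (255, 112, 31)]  # RGB tuples

    def __call__(self, i, bgr=False):
        # Wrap around so any class index maps to a colour
        r, g, b = self.colors[int(i) % len(self.colors)]
        return (b, g, r) if bgr else (r, g, b)


colors = Palette()
print(colors(1, True))   # calling works and returns a BGR tuple for OpenCV

try:
    colors[1]            # indexing fails: the class defines __call__, not __getitem__
except TypeError as err:
    print("TypeError:", err)
```

Passing True as the second argument asks for BGR order, which is what the OpenCV drawing calls expect.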
4. Displaying Chinese text with python cv2 putText