yolov5问题解决记录贴

随手记，以后有时间再整理吧问题一：环境配置后cuda与pytorch不符detect: weights=yolov5s.pt, source=data/images, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=

致力于变成碰不到BUG的幸运儿

21975人浏览 · 2021-11-12 11:17:14

致力于变成碰不到BUG的幸运儿 · 2021-11-12 11:17:14 发布

随手记，以后有时间再整理吧

问题一：环境配置后cuda与pytorch不符

detect: weights=yolov5s.pt, source=data/images, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
requirements: opencv-python>=4.1.2 not found and is required by YOLOv5, attempting auto-update...
requirements: 'pip install opencv-python>=4.1.2' skipped (offline)
/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/cuda/__init__.py:143: UserWarning: 
NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
YOLOv5 🚀 2021-11-11 torch 1.10.0+cu102 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 5938MiB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.pt to yolov5s.pt...
100%|██████████| 14.0M/14.0M [00:23<00:00, 613kB/s]

Traceback (most recent call last):
  File "/home/milk/yolo/yolov5/detect.py", line 244, in <module>
    main(opt)
  File "/home/milk/yolo/yolov5/detect.py", line 239, in main
    run(**vars(opt))
  File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/milk/yolo/yolov5/detect.py", line 79, in run
    model = DetectMultiBackend(weights, device=device, dnn=dnn)
  File "/home/milk/yolo/yolov5/models/common.py", line 305, in __init__
    model = torch.jit.load(w) if 'torchscript' in w else attempt_load(weights, map_location=device)
  File "/home/milk/yolo/yolov5/models/experimental.py", line 98, in attempt_load
    model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())  # FP32 model
  File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 735, in float
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
  File "/home/milk/yolo/yolov5/models/yolo.py", line 240, in _apply
    self = super()._apply(fn)
  File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 593, in _apply
    param_applied = fn(param)
  File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 735, in <lambda>
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Process finished with exit code 1

解决方法一：

在该环境下重新安装torch，以下命令从pytorch官网下载----Start Locally | PyTorch

成功解决！

yolov5开始训练记录

/home/milk/anaconda3/envs/milk/bin/python3 /home/milk/yolo/yolov5/train.py
train: weights=yolov5s.pt, cfg=, data=data/Underwater.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=100, batch_size=16, imgsz=1280, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, patience=100, freeze=0, save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: skipping check (not a git repository), for updates see https://github.com/ultralytics/yolov5
YOLOv5 🚀 2021-11-11 torch 1.10.0+cu113 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 5938MiB)

hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
Overriding model.yaml nc=80 with nc=4

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]              
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  2    115712  models.common.C3                        [128, 128, 2]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  3    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]                 
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]                 
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 24      [17, 20, 23]  1     24273  models.yolo.Detect                      [4, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 270 layers, 7030417 parameters, 7030417 gradients

Transferred 343/349 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 57 weight, 60 weight (no decay), 60 bias
Traceback (most recent call last):
  File "/home/milk/yolo/yolov5/utils/datasets.py", line 406, in __init__
    raise Exception(f'{prefix}{p} does not exist')
Exception: train: /home/milk/yolo/yolov5/data/tain.txt does not exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/milk/yolo/yolov5/train.py", line 625, in <module>
    main(opt)
  File "/home/milk/yolo/yolov5/train.py", line 522, in main
    train(opt.hyp, opt, device, callbacks)
  File "/home/milk/yolo/yolov5/train.py", line 212, in train
    train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
  File "/home/milk/yolo/yolov5/utils/datasets.py", line 98, in create_dataloader
    dataset = LoadImagesAndLabels(path, imgsz, batch_size,
  File "/home/milk/yolo/yolov5/utils/datasets.py", line 411, in __init__
    raise Exception(f'{prefix}Error loading data from {path}: {e}\nSee {HELP_URL}')
Exception: train: Error loading data from ['/home/milk/yolo/yolov5/data/tain.txt']: train: /home/milk/yolo/yolov5/data/tain.txt does not exist
See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data

Process finished with exit code 1

修改yolov5s.yaml 中的nc=4

/home/milk/anaconda3/envs/milk/bin/python3 /home/milk/yolo/yolov5/train.py
train: weights=yolov5s.pt, cfg=, data=data/Underwater.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=100, batch_size=16, imgsz=1280, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, patience=100, freeze=0, save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: skipping check (not a git repository), for updates see https://github.com/ultralytics/yolov5
YOLOv5 🚀 2021-11-11 torch 1.10.0+cu113 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 5938MiB)

hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
Overriding model.yaml nc=80 with nc=4

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]              
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  2    115712  models.common.C3                        [128, 128, 2]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  3    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]                 
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]                 
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 24      [17, 20, 23]  1     24273  models.yolo.Detect                      [4, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 270 layers, 7030417 parameters, 7030417 gradients

Transferred 343/349 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 57 weight, 60 weight (no decay), 60 bias
Traceback (most recent call last):
  File "/home/milk/yolo/yolov5/utils/datasets.py", line 406, in __init__
    raise Exception(f'{prefix}{p} does not exist')
Exception: train: /home/milk/yolo/yolov5/data/tain.txt does not exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/milk/yolo/yolov5/train.py", line 625, in <module>
    main(opt)
  File "/home/milk/yolo/yolov5/train.py", line 522, in main
    train(opt.hyp, opt, device, callbacks)
  File "/home/milk/yolo/yolov5/train.py", line 212, in train
    train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
  File "/home/milk/yolo/yolov5/utils/datasets.py", line 98, in create_dataloader
    dataset = LoadImagesAndLabels(path, imgsz, batch_size,
  File "/home/milk/yolo/yolov5/utils/datasets.py", line 411, in __init__
    raise Exception(f'{prefix}Error loading data from {path}: {e}\nSee {HELP_URL}')
Exception: train: Error loading data from ['/home/milk/yolo/yolov5/data/tain.txt']: train: /home/milk/yolo/yolov5/data/tain.txt does not exist
See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data

Process finished with exit code 1

训练出错，修改

parser.add_argument('--cfg', type=str, default=ROOT / 'models/yolov5s.yaml', help='model.yaml path')

详细检查.yaml文件，发现路径打错一个字母，训练开始，预估10h

还有一种情况需要注意 YOLOv5中的datasets文件中使用的数据集文件名是images 如果你的文件名不是这个，训练时会出现找不到标签错误

解决：

修改 utils 下的 datasets.py 文件中下列位置的 ‘Images’ 处为你存放图片文件夹名称，我得是Images


def img2label_paths(img_paths):
    # Define label paths as a function of image paths
    sa, sb = os.sep + 'Images' + os.sep, os.sep + 'labels' + os.sep  # /images/, /labels/ substrings
    return [sb.join(x.rsplit(sa, 1)).rsplit('.', 1)[0] + '.txt' for x in img_paths]