CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some


小馆长布鲁克 · Published 2022-07-10 15:28:56

Problem description:

While modifying my code, I ran into the following error.

Exception raised: RuntimeError
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
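
The message itself recommends CUDA_LAUNCH_BLOCKING=1. A minimal sketch of setting it from inside a Python script follows; it assumes the variable has to be in place before CUDA is initialised, so it is set before torch is imported. This is only one way to do it; exporting the variable in the shell before launching the script should have the same effect.

```python
import os

# Assumption: set this before `import torch`, so the variable is already
# present when the CUDA runtime is initialised.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable is set
# With blocking launches, kernels run synchronously, so the stack trace
# should point at the call that actually triggered the assert.
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Another common way to narrow things down is to run the failing batch on the CPU, where the same problem usually surfaces as an ordinary Python exception with a more readable message.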

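Analysis:

At first I assumed the GPU itself had hit a bug, but a quick search on Baidu showed that this kind of error is very often caused by the data.

A typical case is a mismatch between the labels and the number of classes the model predicts, for example five classes on one side and six on the other, so that some label falls outside the valid range of class indices.

I am working on an image segmentation task, so the segmentation labels were the most likely culprit.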
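
A quick way to test this suspicion is to check which label values actually occur in a target mask. The sketch below is written in plain Python over nested lists so that it runs without a GPU; `find_invalid_labels` and the two sample masks are hypothetical names and data for illustration, not part of my project.

```python
def find_invalid_labels(mask, num_classes):
    """Return the sorted values in `mask` that are not valid class indices.

    `mask` is a 2-D nested list of numbers (a stand-in for a target tensor).
    A value is valid only if it is a whole number in [0, num_classes - 1].
    """
    invalid = set()
    for row in mask:
        for value in row:
            is_whole = float(value).is_integer()
            if not is_whole or not (0 <= value < num_classes):
                invalid.add(value)
    return sorted(invalid)


clean_mask = [[0, 4], [3, 2]]
broken_mask = [[0, 4], [5, 2]]  # 5 is out of range for five classes (0..4)

print(find_invalid_labels(clean_mask, num_classes=5))   # []
print(find_invalid_labels(broken_mask, num_classes=5))  # [5]
```

With real tensors, `torch.unique(target)` gives the same overview of the values present in a mask.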

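Before the error appeared I had applied resize() to the data. When I replaced it with centercrop(), the error disappeared! So I went to the official documentation to see what resize() actually does: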

torchvision.transforms.functional.resize(
                    img: torch.Tensor, 
                    size: List[int], 
                    interpolation: torchvision.transforms.functional.InterpolationMode = <InterpolationMode.BILINEAR: 'bilinear'>, 
                    max_size: Optional[int] = None, 
                    antialias: Optional[bool] = None) → torch.Tensor

The interpolation parameter caught my attention. Looking more closely:

interpolation (InterpolationMode) – Desired interpolation enum defined by torchvision.transforms.InterpolationMode. Default is InterpolationMode.BILINEAR. If input is Tensor, only InterpolationMode.NEAREST, InterpolationMode.BILINEAR and InterpolationMode.BICUBIC are supported. For backward compatibility integer values (e.g. PIL.Image[.Resampling].NEAREST) are still accepted, but deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.

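The default is bilinear interpolation, which computes each new value from the four nearest pixel values, as illustrated by the figure in the original post.

Then it all made sense. My ground truth should contain only the values 0 and 1, standing for foreground and background respectively, but bilinear interpolation produces values other than 0 and 1. The network cannot interpret those as labels, hence the error.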
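
To make the effect concrete, here is a minimal plain-Python sketch of bilinear interpolation inside a single cell of four neighbouring pixels. The function name `bilinear_sample` and its arguments are chosen for illustration; it shows the textbook formula, not torchvision's actual implementation.

```python
def bilinear_sample(top_left, top_right, bottom_left, bottom_right, dx, dy):
    """Bilinear interpolation inside one cell of four neighbouring pixels.

    dx and dy are offsets in [0, 1] measured from the top-left pixel,
    horizontally and vertically.
    """
    # Interpolate along x on the top and bottom edges, then along y.
    top = top_left * (1 - dx) + top_right * dx
    bottom = bottom_left * (1 - dx) + bottom_right * dx
    return top * (1 - dy) + bottom * dy


# Inside a uniform region the label is preserved.
print(bilinear_sample(1, 1, 1, 1, dx=0.25, dy=0.5))  # 1.0

# On a 0/1 boundary the result is a value that is neither 0 nor 1.
print(bilinear_sample(0, 1, 0, 1, dx=0.25, dy=0.5))  # 0.25
```

New values can therefore only appear where the four neighbours disagree, which is exactly along the edges between foreground and background.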

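Solution:

Change the interpolation argument of resize() to nearest-neighbor interpolation, which simply copies the value of the single nearest pixel, so no new pixel values are created.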

resize(target, self.size, interpolation=InterpolationMode.NEAREST)

That is all it takes.
