"RuntimeError: Distributed package doesn't have NCCL built in" on Windows
Problem description:
On Windows, a Python script fails at dist.init_process_group(backend, rank, world_size) with "RuntimeError: Distributed package doesn't have NCCL built in". The traceback is:
File "D:\Software\Anaconda\Anaconda3\envs\segmenter\lib\site-packages\torch\distributed\distributed_c10d.py", line 531, in init_process_group
timeout=timeout)
File "D:\Software\Anaconda\Anaconda3\envs\segmenter\lib\site-packages\torch\distributed\distributed_c10d.py", line 625, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in
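Cause analysis:
Windows does not support the NCCL backend. PyTorch on Windows is built without NCCL, so requesting backend='nccl' raises this error.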
Solution:
Set backend='gloo' before the dist.init_process_group call, that is, use Gloo instead of NCCL on Windows.
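
The fix can be sketched as a small helper that chooses the backend at runtime, so the same script runs on both Windows and Linux. This is a minimal sketch, not the original project's code: the names pick_backend and init_distributed are hypothetical, and it assumes the default env:// rendezvous, which expects MASTER_ADDR and MASTER_PORT to be set in the environment.

```python
import sys


def pick_backend(platform_name=sys.platform, nccl_available=None):
    """Return "nccl" when it is usable, otherwise fall back to "gloo".

    pick_backend is a hypothetical helper, not part of PyTorch.
    """
    if nccl_available is None:
        # Imported lazily so the helper itself does not require torch
        # when the caller already knows whether NCCL is available.
        import torch.distributed as dist
        nccl_available = dist.is_available() and dist.is_nccl_available()
    # NCCL is not supported on Windows, so always use Gloo there.
    if platform_name.startswith("win") or not nccl_available:
        return "gloo"
    return "nccl"


def init_distributed(rank, world_size):
    """Initialize the default process group with the selected backend."""
    import torch.distributed as dist
    backend = pick_backend()
    # Assumption: MASTER_ADDR and MASTER_PORT are already set, as required
    # by the default env:// initialization method.
    dist.init_process_group(backend=backend, rank=rank, world_size=world_size)
    return backend
```

Calling init_distributed(rank, world_size) in place of the original dist.init_process_group(backend, rank, world_size) line selects Gloo on Windows and keeps NCCL where it is available.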