Fixing several common errors when running on a GPU

Error 1

Failed to get convolution algorithm. This is probably because cuDNN failed to initialize

Case 1: insufficient GPU memory

Fix: add the following code at the top of the training script

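import tensorflow as tf
import numpy as np
import keras

# Fall back to another device when an op cannot be placed on the GPU
config = tf.compat.v1.ConfigProto(allow_soft_placement=True)
# Cap this process at 30% of GPU memory
config.gpu_options.per_process_gpu_memory_fraction = 0.3
# Make Keras use a session built with this configuration
tf.compat.v1.keras.backend.set_session(tf.compat.v1.Session(config=config))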
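
The snippet above uses the TF1-style session API. As a minimal sketch of an alternative, assuming TensorFlow 2.x, the allocator can instead be asked to grow GPU memory on demand. This does not cap usage at a fixed fraction the way the code above does; it only stops TensorFlow from reserving all GPU memory at startup. The helper name enable_memory_growth is hypothetical and chosen for illustration.

```python
import os

# Environment-variable route: must be set before TensorFlow first touches the GPU.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def enable_memory_growth():
    """Turn on on-demand GPU memory allocation (assumes TensorFlow 2.x).

    Call this before any GPU op runs; TensorFlow raises RuntimeError
    if the GPUs have already been initialized.
    """
    import tensorflow as tf  # imported lazily so the env var above is set first

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)  # number of GPUs that were configured
```

Either route should be enough on its own; calling enable_memory_growth() at the very top of the training script mirrors where the original snippet is placed.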

Case 2: cuDNN version mismatch

When the cuDNN versions do not match, an error like the following usually appears:

Loaded runtime CuDNN library: 7.5.0 but source was compiled with: 7.6.0
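
When the log is long, this line is easy to miss. The following is a small sketch, not part of the original post, that searches captured log text for the message and extracts both version numbers. The function name find_cudnn_mismatch is hypothetical, and the pattern assumes the message wording shown above.

```python
import re

# Matches the mismatch message and captures (loaded version, compiled-against version).
_MISMATCH = re.compile(
    r"Loaded runtime CuDNN library: (\d+(?:\.\d+)*) "
    r"but source was compiled with: (\d+(?:\.\d+)*)"
)


def find_cudnn_mismatch(log_text):
    """Return (loaded, compiled) if the mismatch message is present, else None."""
    match = _MISMATCH.search(log_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    line = "Loaded runtime CuDNN library: 7.5.0 but source was compiled with: 7.6.0"
    print(find_cudnn_mismatch(line))
```

The second value is the version TensorFlow was compiled against, which is the one to install.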

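If this message appears, you are dealing with case 2; if it does not, case 1 is the most likely cause. For case 2, run the following command: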

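conda install cudnn=7.6.0

This method applies when you work inside a virtual environment: to save time, cuDNN is configured for that environment directly through conda. It is still advisable to configure each CUDA and cuDNN version in your own bashrc file, so that they can be used from any environment afterwards, as follows: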

# CUDA 10.2
#export PATH="/usr/local/cuda-10.2/bin:$PATH"
#export LD_LIBRARY_PATH="/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH"
#export C_INCLUDE_PATH="/usr/local/cuda-10.2/include:$C_INCLUDE_PATH"
#export CPLUS_INCLUDE_PATH="/usr/local/cuda-10.2/include:$CPLUS_INCLUDE_PATH"

# CUDA 10.0
export PATH="/home/hpjc/local/cuda-10.0/bin:$PATH"
export LD_LIBRARY_PATH="/home/hpjc/local/cuda-10.0/lib64:$LD_LIBRARY_PATH"
export C_INCLUDE_PATH="/home/hpjc/local/cuda-10.0/include:$C_INCLUDE_PATH"
export CPLUS_INCLUDE_PATH="/home/hpjc/local/cuda-10.0/include:$CPLUS_INCLUDE_PATH"

# CUDA 9.0
#export PATH="/home/hpjc/local/cuda-9.0/bin:$PATH"
#export LD_LIBRARY_PATH="/home/hpjc/local/cuda-9.0/lib64:$LD_LIBRARY_PATH"
#export C_INCLUDE_PATH="/home/hpjc/local/cuda-9.0/include:$C_INCLUDE_PATH"
#export CPLUS_INCLUDE_PATH="/home/hpjc/local/cuda-9.0/include:$CPLUS_INCLUDE_PATH"

# CUDA 8.0
#export PATH="/home/hpjc/local/cuda-8.0/bin:$PATH"
#export LD_LIBRARY_PATH="/home/hpjc/local/cuda-8.0/lib64:$LD_LIBRARY_PATH"
#export C_INCLUDE_PATH="/home/hpjc/local/cuda-8.0/include:$C_INCLUDE_PATH"
#export CPLUS_INCLUDE_PATH="/home/hpjc/local/cuda-8.0/include:$CPLUS_INCLUDE_PATH"
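
After editing bashrc and opening a new shell, it can be worth confirming which CUDA directories actually ended up on the search paths. The following is a minimal sketch for that check; the function name cuda_entries is hypothetical, and it simply looks for the substring "cuda" in each path entry, which assumes installation directories named like the ones above.

```python
import os


def cuda_entries(path_value, sep=os.pathsep):
    """Return the entries of a PATH-style string that mention CUDA, in order."""
    return [entry for entry in path_value.split(sep) if "cuda" in entry.lower()]


if __name__ == "__main__":
    # The first entry printed for each variable is the one that takes precedence.
    for name in ("PATH", "LD_LIBRARY_PATH"):
        print(name, cuda_entries(os.environ.get(name, "")))
```

If more than one CUDA version shows up for the same variable, more than one export block is probably active at once.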

Error 2

tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.

Error 2 most likely appears together with one of the two cases of Error 1 above, so the fixes for Error 1 resolve it as well.

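Error 3

CUPTI error: CUPTI could not be loaded or symbol could not be found.

Many posts online recommend downgrading CUDA or TensorFlow to fix this error, but that is not necessary. Instead, go to the following directory: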

/local/cuda-10.0/extras/CUPTI/lib64 # cuda-10.0 is only an example; use the CUDA version you have configured

This directory usually (though not always) contains the following three files:

libcupti.so  libcupti.so.10.0  libcupti.so.10.0.130

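Then, from inside this directory, run the following commands in turn to copy the files into local/cuda-10.0/lib64/ (replace the destination with the lib64 directory of your own CUDA installation), after which they can be loaded: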

cp libcupti.so local/cuda-10.0/lib64/
cp libcupti.so.10.0 local/cuda-10.0/lib64/
cp libcupti.so.10.0.130 local/cuda-10.0/lib64/
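
Because the exact file names depend on the CUDA version, the same copy step can be sketched in Python with a wildcard instead of three fixed names. This is an illustrative helper, not the author's method: the name copy_cupti_libs and both directory arguments are assumptions to be replaced with the paths of your own installation.

```python
import glob
import os
import shutil


def copy_cupti_libs(cupti_dir, lib_dir):
    """Copy every libcupti.so* file from cupti_dir into lib_dir.

    Files already present in lib_dir are left untouched.
    Returns the names of the files that were copied.
    """
    copied = []
    for src in sorted(glob.glob(os.path.join(cupti_dir, "libcupti.so*"))):
        dst = os.path.join(lib_dir, os.path.basename(src))
        if not os.path.exists(dst):
            shutil.copy2(src, dst)  # follows symlinks, like plain cp
            copied.append(os.path.basename(src))
    return copied
```

Calling it with the extras/CUPTI/lib64 directory as cupti_dir and the CUDA lib64 directory as lib_dir does the same work as the three cp commands.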