Installing OpenCL and OpenACC
Background
In the earlier article on problems encountered while compiling and installing LitmusRT, we compiled and installed the real-time operating system LitmusRT and got it booting normally. Now we need to compile and install a GPU-acceleration library: OpenCL or OpenACC.
Once again, do not install the NVIDIA driver inside a virtual machine: a VM's graphics card is virtualized and cannot load NVIDIA's .ko kernel modules. I am therefore using the lab's Ubuntu 16.04 64-bit desktop, which already has the NVIDIA driver, CUDA 10.2 and 10.1, gcc 7, and g++ 7 installed.
OpenCL
If you have CUDA, NVIDIA's implementation is the first choice; NVIDIA is, after all, the dominant GPU vendor. But if you must work in a virtual machine, install the Intel version of OpenCL instead (covered further down).
NVIDIA version
1. Download the installer: http://developer.download.nvidia.com/compute/cuda/4_0/sdk/gpucomputingsdk_4.0.17_linux.run
2. Before running the .run file, make sure gcc and g++ are both version 3.4; if not, install that version and switch to it (on Ubuntu, installing the versioned packages and pointing the gcc/g++ alternatives at them with update-alternatives is the usual way).
3. Upload the .run file to the server and run it:
root@sundata:/data/szc# ./gpucomputingsdk_4.0.17_linux.run
It will ask for an installation directory and the CUDA directory; just press Enter to accept the defaults.
4. Then copy the required headers to /usr/local/include and the libraries to /usr/lib or /usr/local/lib:
root@sundata:/data/szc# cp -r /usr/local/cuda-10.0/extras/CUPTI/include/* /usr/local/include
root@sundata:/data/szc# cp -r /snap/gnome-3-34-1804/36/usr/include/* /usr/local/include
root@sundata:/data/szc# cp /usr/local/cuda-10.0/lib64/libOpenCL.so.1.1 /usr/local/lib/libOpenCL.so
root@sundata:/data/szc# cp /usr/lib/x86_64-linux-gnu/libGLU.so.1.3.1 /usr/lib/libGLU.so
root@sundata:/data/szc# cp /snap/gnome-3-34-1804/60/usr/lib/x86_64-linux-gnu/libGL.so.1.0.0 /usr/lib/libGL.so
root@sundata:/data/szc# cp /snap/gnome-3-34-1804/60/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 /usr/lib/libX11.so
root@sundata:/data/szc# cp /usr/lib/x86_64-linux-gnu/libXmu.so.6.2.0 /usr/lib/libXmu.so
Here is how to work out which files are needed and where they live: first cd into the OpenCL installation directory and run make:
root@sundata:~/NVIDIA_GPU_Computing_SDK/OpenCL# make
The build will fail complaining about a missing header or a missing library (cannot find -lxxx).
Then locate the header or library to get its path; copy headers into /usr/local/include and libraries into /usr/local/lib.
In my case glut.so could not be found, so I installed the dependency that provides it:
root@sundata:~/NVIDIA_GPU_Computing_SDK/OpenCL# apt-get install freeglut3-dev
If you hit this error:
ld cannot find crt1.o: No such file or directory
set the following environment variable; the path on the right comes from the output of locate crt1.o:
export LIBRARY_PATH=$LIBRARY_PATH:/snap/gnome-3-34-1804/36/usr/lib/x86_64-linux-gnu/
5. Finally, return to the OpenCL installation directory and run make:
root@sundata:~/NVIDIA_GPU_Computing_SDK/OpenCL# make
A pile of executables will appear under bin/linux/release/; run one of them.
Alternatively, check whether the OpenCL SDK can be detected at all (the SDK's oclDeviceQuery sample does exactly this).
If the SDK and a device that supports it are both detected, you are done.
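If you would rather not rely on the SDK samples, here is a minimal stand-alone sanity check I would use (a hedged sketch: it assumes the CL/cl.h header and libOpenCL set up above, and compiles the same way as the later example, gcc check_gpu.c -lOpenCL):
#include <stdio.h>
#include <CL/cl.h>

// Minimal sanity check: can we see an OpenCL platform and a GPU device?
int main(void) {
    cl_platform_id plat;
    cl_device_id dev;
    char name[256];

    if (clGetPlatformIDs(1, &plat, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL) != CL_SUCCESS) {
        printf("No OpenCL GPU platform/device found\n");
        return 1;
    }
    clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, NULL);
    printf("Found OpenCL GPU: %s\n", name);
    return 0;
}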
P.S. During compilation I also hit problems such as libstdc++.so.6 error adding symbols: DSO missing from command line and Nonrepresentable section on output. (The first one generally means the link uses symbols from a shared object that is only pulled in transitively, while newer ld wants it listed explicitly on the command line.) After much trial and error I solved them by downgrading gcc and g++ to 3.4 and re-running the .run installer. In fact all of these link errors appeared only after the downgrade; none of the fixes from the internet worked, no amount of editing common/common_opencl.mk helped either, yet re-running the .run file made them go away.
6. To write your own OpenCL programs, copy all of the CUDA headers into /usr/local/include:
root@sundata:/data/szc# cp -r /usr/local/cuda-10.0/include/* /usr/local/include/
Then write the sample code:
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

// OpenCL kernel source code
const char* OpenCLSource[] = {
    "__kernel void VectorAdd(__global int* c, __global int* a, __global int* b)",
    "{",
    "    // Index of the elements to add \n",
    "    unsigned int n = get_global_id(0);",
    "    // Sum the n-th element of vectors a and b and store in c \n",
    "    c[n] = a[n] + b[n];",
    "}"
};

// Some interesting data for the vectors
int InitialData1[20] = {37,50,54,50,56,0,43,43,74,71,32,36,16,43,56,100,50,25,15,17};
int InitialData2[20] = {35,51,54,58,55,32,36,69,27,39,35,40,16,44,55,14,58,75,18,15};

// Number of elements in the vectors to be added
#define SIZE 2048

int main(int argc, char **argv) {
    // Two integer source vectors in host memory
    int HostVector1[SIZE], HostVector2[SIZE];

    // Initialize with some interesting repeating data
    int c;
    for (c = 0; c < SIZE; c++) {
        HostVector1[c] = InitialData1[c % 20];
        HostVector2[c] = InitialData2[c % 20];
    }

    // Get an OpenCL platform
    cl_platform_id cpPlatform;
    clGetPlatformIDs(1, &cpPlatform, NULL);

    // Get a GPU device
    cl_device_id cdDevice;
    clGetDeviceIDs(cpPlatform, CL_DEVICE_TYPE_GPU, 1, &cdDevice, NULL);

    // Create a context to run OpenCL on our CUDA-enabled NVIDIA GPU
    cl_context GPUContext = clCreateContext(0, 1, &cdDevice, NULL, NULL, NULL);

    // Create a command queue on the GPU device
    cl_command_queue cqCommandQueue = clCreateCommandQueue(GPUContext, cdDevice, 0, NULL);

    // Allocate GPU memory for the source vectors AND initialize it from CPU memory
    cl_mem GPUVector1 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                       sizeof(int) * SIZE, HostVector1, NULL);
    cl_mem GPUVector2 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                       sizeof(int) * SIZE, HostVector2, NULL);

    // Allocate output memory on the GPU
    cl_mem GPUOutputVector = clCreateBuffer(GPUContext, CL_MEM_WRITE_ONLY,
                                            sizeof(int) * SIZE, NULL, NULL);

    // Create the OpenCL program from the source code (7 source strings)
    cl_program OpenCLProgram = clCreateProgramWithSource(GPUContext, 7, OpenCLSource, NULL, NULL);

    // Build the program (OpenCL JIT compilation)
    clBuildProgram(OpenCLProgram, 0, NULL, NULL, NULL, NULL);

    // Create a handle to the compiled OpenCL function (kernel)
    cl_kernel OpenCLVectorAdd = clCreateKernel(OpenCLProgram, "VectorAdd", NULL);

    // Associate the GPU buffers with the kernel arguments
    clSetKernelArg(OpenCLVectorAdd, 0, sizeof(cl_mem), (void*)&GPUOutputVector);
    clSetKernelArg(OpenCLVectorAdd, 1, sizeof(cl_mem), (void*)&GPUVector1);
    clSetKernelArg(OpenCLVectorAdd, 2, sizeof(cl_mem), (void*)&GPUVector2);

    // Launch the kernel on the GPU over a one-dimensional range
    size_t WorkSize[1] = {SIZE};
    clEnqueueNDRangeKernel(cqCommandQueue, OpenCLVectorAdd, 1, NULL, WorkSize, NULL, 0, NULL, NULL);

    // Copy the output in GPU memory back to CPU memory (blocking read)
    int HostOutputVector[SIZE];
    clEnqueueReadBuffer(cqCommandQueue, GPUOutputVector, CL_TRUE, 0,
                        SIZE * sizeof(int), HostOutputVector, 0, NULL, NULL);

    // Cleanup: release the memory objects before the context that owns them
    clReleaseKernel(OpenCLVectorAdd);
    clReleaseProgram(OpenCLProgram);
    clReleaseCommandQueue(cqCommandQueue);
    clReleaseMemObject(GPUVector1);
    clReleaseMemObject(GPUVector2);
    clReleaseMemObject(GPUOutputVector);
    clReleaseContext(GPUContext);

    // Print out the results as characters, 20 per row
    int Rows;
    for (Rows = 0; Rows < (SIZE / 20); Rows++, printf("\t")) {
        for (c = 0; c < 20; c++) {
            printf("%c", (char)HostOutputVector[Rows * 20 + c]);
        }
    }

    printf("\n\nThe End\n\n");
    return 0;
}
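One caveat about the sample: every cl* call above throws its error code away, so if the kernel source has a typo, clBuildProgram fails silently and the printed output is garbage. Here is a hedged sketch of the build-log dump I would add in that case; it reuses OpenCLProgram and cdDevice from the sample and replaces the bare clBuildProgram line:
// Replacement for the bare clBuildProgram(...) call above:
// on failure, fetch and print the JIT compiler's build log.
cl_int err = clBuildProgram(OpenCLProgram, 0, NULL, NULL, NULL, NULL);
if (err != CL_SUCCESS) {
    size_t LogSize = 0;
    clGetProgramBuildInfo(OpenCLProgram, cdDevice, CL_PROGRAM_BUILD_LOG,
                          0, NULL, &LogSize);
    char *Log = (char *)malloc(LogSize + 1);
    clGetProgramBuildInfo(OpenCLProgram, cdDevice, CL_PROGRAM_BUILD_LOG,
                          LogSize, Log, NULL);
    Log[LogSize] = '\0';
    printf("clBuildProgram failed (%d), build log:\n%s\n", err, Log);
    free(Log);
    return -1;
}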
Compile it (if you skipped the header copy in step 6, point gcc at the CUDA headers with -I/usr/local/cuda-10.0/include instead):
root@sundata:/data/szc# gcc test_opencl.c -o test_opencl -lOpenCL
Run it; each row of output should read "Hello OpenCL World!", since the element-wise sums of the two data arrays are ASCII codes.
P.S. The official manual creates the context with:
cl_context GPUContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);
but that call fails at run time. You can confirm this by passing a pointer to an error variable as the last argument:
...
cl_int ciErr1;
cl_context GPUContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, NULL, &ciErr1);
if (ciErr1 != CL_SUCCESS) {
    printf("Error in clCreateContext, error: %d\n", ciErr1);
    return -1;
}
At run time this prints a non-zero error code (most likely CL_INVALID_PLATFORM: with a NULL properties argument, the NVIDIA ICD has no platform to select). So the context has to be created with
cl_context GPUContext = clCreateContext(0, 1, &cdDevice, NULL, NULL, NULL);
instead.
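For completeness: clCreateContextFromType itself is not broken. Judging from the Khronos spec (a hedged sketch, I have not re-run the sample this way), the failure above goes away if the platform is passed explicitly through the context properties instead of NULL:
// Select the platform first, then hand it to clCreateContextFromType
// through the properties list instead of passing NULL.
cl_platform_id cpPlatform;
clGetPlatformIDs(1, &cpPlatform, NULL);
cl_context_properties props[] = {
    CL_CONTEXT_PLATFORM, (cl_context_properties)cpPlatform, 0
};
cl_context GPUContext = clCreateContextFromType(props, CL_DEVICE_TYPE_GPU,
                                                NULL, NULL, NULL);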
Intel version
1. Install the dependencies:
(base) root@ubuntu:/home/szc# apt-get install clinfo
(base) root@ubuntu:/home/szc# apt install dkms xz-utils openssl libnuma1 libpciaccess0 bc curl libssl-dev lsb-core libicu-dev
(base) root@ubuntu:/home/szc# echo "deb http://download.mono-project.com/repo/debian wheezy main" | sudo tee /etc/apt/sources.list.d/mono-xamarin.list
(base) root@ubuntu:/home/szc# apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF
(base) root@ubuntu:/home/szc# apt-get update
(base) root@ubuntu:/home/szc# apt-get install mono-complete
2. Download the Intel OpenCL SDK from http://registrationcenter-download.intel.com/akdlm/irc_nas/vcp/16284/intel_sdk_for_opencl_applications_2020.0.270.tar.gz, upload it to the Ubuntu machine, extract it, and enter the extracted directory:
(base) root@ubuntu:/home/szc# tar -zxvf intel_sdk_for_opencl_applications_2020.0.270.tar.gz
(base) root@ubuntu:/home/szc# cd intel_sdk_for_opencl_applications_2020.0.270
3. Then run the install script:
(base) root@ubuntu:/home/szc/intel_sdk_for_opencl_applications_2020.0.270# ./install.sh
4. Accept the defaults all the way through. When it finishes, verify the installation; seeing Intel(R) CPU Runtime for OpenCL(TM) Applications means it worked:
(base) root@ubuntu:/home/szc/intel_sdk_for_opencl_applications_2020.0.270# clinfo
Number of platforms 1
Platform Name Intel(R) CPU Runtime for OpenCL(TM) Applications
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 2.1 LINUX
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer cl_intel_vec_len_hint
Platform Host timer resolution 1ns
Platform Extensions function suffix INTEL
Platform Name Intel(R) CPU Runtime for OpenCL(TM) Applications
Number of devices 1
Device Name Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 2.1 (Build 0)
Driver Version 18.1.0.0920
Device OpenCL C Version OpenCL C 2.0
Device Type CPU
Device Profile FULL_PROFILE
Max compute units 4
Max clock frequency 2200MHz
Device Partition (core)
Max number of sub-devices 4
Supported partition types by counts, equally, by names (Intel)
Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Preferred work group size multiple 128
Max sub-groups per work group 1
Preferred / native vector sizes
char 1 / 32
short 1 / 16
int 1 / 8
long 1 / 4
half 0 / 0 (n/a)
float 1 / 8
double 1 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 6233903104 (5.806GiB)
Error Correction support No
Max memory allocation 1558475776 (1.451GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing Yes
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 0 bytes
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 65536 (64KiB)
Global Memory cache type Read/Write
Global Memory cache size 262144
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 480
Max size for 1D images from buffer 97404736 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 64 bytes
Pitch alignment for 2D image buffers 64 bytes
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 480
Max number of write image args 480
Max number of read/write image args 480
Max number of pipe args 16
Max active pipe reservations 65535
Max pipe packet size 1024
Local memory type Global
Local memory size 32768 (32KiB)
Max constant buffer size 131072 (128KiB)
Max number of constant args 480
Max size of kernel argument 3840 (3.75KiB)
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Local thread execution (Intel) Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 4294967295 (4GiB)
Max size 4294967295 (4GiB)
Max queues on device 4294967295
Max events on device 4294967295
Prefer user sync for interop No
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
Sub-group independent forward progress No
IL version SPIR-V_1.0
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer cl_intel_vec_len_hint
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [INTEL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
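Note that this runtime exposes only the CPU (Device Type CPU above, and the NULL-platform section shows CL_DEVICE_TYPE_GPU resolving to no platform). So to reuse the NVIDIA vector-add sample on this machine, the device query has to ask for a CPU device; a minimal sketch of the lines that change:
// Sketch: with the Intel CPU runtime, CL_DEVICE_TYPE_GPU finds nothing
// (see the "NULL platform behavior" lines above), so query the CPU instead.
cl_platform_id cpPlatform;
cl_device_id cdDevice;
clGetPlatformIDs(1, &cpPlatform, NULL);
clGetDeviceIDs(cpPlatform, CL_DEVICE_TYPE_CPU, 1, &cdDevice, NULL);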
OpenACC
Environment: an Ubuntu server with the NVIDIA driver and CUDA installed, and a wired network connection.
1. Download and extract the tarball:
root@sundata:/data/szc# wget https://developer.download.nvidia.com/hpc-sdk/20.7/nvhpc_2020_207_Linux_x86_64_cuda_multi.tar.gz
root@sundata:/data/szc# tar xpzf nvhpc_2020_207_Linux_x86_64_cuda_multi.tar.gz
2. Install it:
root@sundata:/data/szc# nvhpc_2020_207_Linux_x86_64_cuda_multi/install
The installer asks a few configuration questions; choose the single-system install and your own installation path.
3. Test it.
First set the PATH, replacing /root/NVIDIA_GPU_Computing_SDK/hpc_sdk with your own installation path:
root@sundata:/data/szc# export PATH=/root/NVIDIA_GPU_Computing_SDK/hpc_sdk/Linux_x86_64/2020/compilers/bin/:$PATH
Then switch to one of the example directories under the installation path, build a sample, and run it:
root@sundata:/data/szc# cd ~/NVIDIA_GPU_Computing_SDK/hpc_sdk/Linux_x86_64/2020/examples/OpenMP
root@sundata:~/NVIDIA_GPU_Computing_SDK/hpc_sdk/Linux_x86_64/2020/examples/OpenMP# make NTHREADS=4 matmul_test
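If you want a standalone OpenACC test rather than the bundled makefile examples, something like the following should work (a hedged sketch: saxpy.c is my own toy example, compiled with the SDK's nvc compiler and its -acc / -Minfo=accel flags, as documented in the manual linked below):
#include <stdio.h>

// Toy OpenACC SAXPY; compile with:  nvc -acc -Minfo=accel saxpy.c -o saxpy
#define N 1000000

int main(void) {
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    // Offload the loop; copy x in, copy y in and back out.
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f (expected 4.0)\n", y[0]);
    return 0;
}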
Finally, here is the official manual: https://docs.nvidia.com/hpc-sdk/archive/20.7/index.html
Conclusion
None of these files are small, so if downloads from the official sites are slow, you can fetch them from my Baidu Netdisk:
Intel OpenCL: https://pan.baidu.com/s/1a9_H5tbsfFjdMmPFlJbUzg (extraction code: 060s)
NVIDIA OpenCL: https://pan.baidu.com/s/1J_qrL-PREONvIYnz1F7DoQ (extraction code: c2ci)
NVIDIA OpenACC: https://pan.baidu.com/s/1hQKKtrq4c6TEfXMuE_RC5w (extraction code: 9u7o)