# NVIDIA CUDA Installation

Components covered: NVIDIA driver, CUDA Driver, cuDNN.

## NVIDIA Configuration

### Install the Driver

Find the driver for your GPU at https://www.nvidia.cn/Download/index.aspx?lang=cn and download it, for example:

`wget https://cn.download.nvidia.cn/XFree86/Linux-x86_64/530.41.03/NVIDIA-Linux-x86_64-530.41.03.run`

```bash
# 0. Install build dependencies
sudo apt install gcc g++ make cmake -y

# 1. Open the nouveau blacklist file
sudo gedit /etc/modprobe.d/blacklist-nouveau.conf

# 2. Write the following two lines into that file:
#      blacklist nouveau
#      options nouveau modeset=0

# 3. Regenerate the kernel initramfs
sudo update-initramfs -u

# 4. Reboot
sudo reboot

# 5. Verify that nouveau is disabled.
#    No output means it is disabled; any output means it is still loaded.
lsmod | grep nouveau
# Example output when nouveau is still loaded (blacklisting did not take effect):
#   nouveau              1949696  0
#   mxm_wmi                16384  1 nouveau
#   i2c_algo_bit           16384  1 nouveau
#   ttm                   106496  1 nouveau
#   drm_kms_helper        184320  1 nouveau
#   drm                   495616  3 drm_kms_helper,ttm,nouveau
#   wmi                    32768  4 asus_wmi,wmi_bmof,mxm_wmi,nouveau
#   video                  57344  2 asus_wmi,nouveau

# 6. Remove any existing driver (only if another version was installed before,
#    or the driver was installed by another method)
sudo apt-get remove 'nvidia-*'

# 7. Make the .run file executable
sudo chmod a+x NVIDIA-Linux-x86_64-530.41.03.run

# 8. Install
sudo ./NVIDIA-Linux-x86_64-530.41.03.run --no-opengl-files --no-x-check --no-nouveau-check
# --no-x-check:       do not abort if an X server is detected as running
# --no-nouveau-check: do not abort if the nouveau driver is detected as in use
# --no-opengl-files:  install only the driver files, not the OpenGL files

# Answers to the installer prompts:
# 1. The distribution-provided pre-install script failed! Are you sure you want to continue?
#    "Yes"
# 2. Would you like to register the kernel module sources with DKMS? This will allow DKMS
#    to automatically build a new module, if you install a different kernel later.
#    "No"
# 3. Install NVIDIA's 32-bit compatibility libraries?
#    "No"
# 4. Would you like to run the nvidia-xconfig utility to automatically update your X
#    configuration so that the NVIDIA X driver will be used when you restart X?
#    Any pre-existing X configuration file will be backed up.
#    "Yes"
```

#### Reference:

- https://blog.csdn.net/way7486chundan/article/details/120711834
- https://blog.csdn.net/wjinjie/article/details/108512153
- https://blog.51cto.com/u_7072753/3826928
- https://blog.csdn.net/xunan003/article/details/81665835

### Install CUDA

```bash
nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060         Off| 00000000:01:00.0 Off |                  N/A |
| 30%   32C    P0               37W / 170W|      0MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```

The `CUDA Version` field shows that the highest CUDA version this driver supports is 12.1. Download a toolkit at or below that version from the [cuda-toolkit-archive](https://developer.nvidia.com/cuda-toolkit-archive) page, choosing the runfile (local) installer.

```bash
wget https://developer.download.nvidia.com/compute/cuda/12.0.1/local_installers/cuda_12.0.1_525.85.12_linux.run
sudo sh cuda_12.0.1_525.85.12_linux.run
```

Type `accept` to accept the license. If the driver is already installed, press the space bar on the next screen to deselect the Driver item, then choose Install.

```bash
root@server:~/nvidia# sh cuda_12.0.1_525.85.12_linux.run
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-12.0/

Please make sure that
 -   PATH includes /usr/local/cuda-12.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.0/lib64, or, add /usr/local/cuda-12.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.0/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 525.00 is required for CUDA 12.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
```

Configure PATH:

```bash
export PATH=/usr/local/cuda-12.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64:$LD_LIBRARY_PATH
```

**Reference**:

- https://blog.csdn.net/CC977/article/details/122789394

### Install cuDNN

From the [cudnn-archive](https://developer.nvidia.com/rdp/cudnn-archive), download "Local Installer for Linux x86_64 (Tar)".

```bash
# The download URL is not given in these notes; copy it from the archive page.
curl -L -o cudnn-linux-x86_64-8.8.0.121_cuda12-archive.tar.xz "<DOWNLOAD_URL>"

# Extract the cuDNN archive; its files are then copied into the installed CUDA tree
tar xvJf cudnn-linux-x86_64-8.8.0.121_cuda12-archive.tar.xz

# Copy the cuDNN headers into the CUDA include directory
sudo cp -r cudnn-linux-x86_64-8.8.0.121_cuda12-archive/include/. /usr/local/cuda-12.0/include

# Copy the cuDNN libraries into the CUDA lib64 directory
sudo cp -r cudnn-linux-x86_64-8.8.0.121_cuda12-archive/lib/. /usr/local/cuda-12.0/lib64

# Grant read permission to all users
sudo chmod a+r /usr/local/cuda-12.0/include/cudnn*.h /usr/local/cuda-12.0/lib64/libcudnn*

# Check the cuDNN version
# Newer releases (version macros live in cudnn_version.h)
grep -A 2 CUDNN_MAJOR /usr/local/cuda-12.0/include/cudnn_version.h
# Older releases (version macros live in cudnn.h)
grep -A 2 CUDNN_MAJOR /usr/local/cuda-12.0/include/cudnn.h

export LD_LIBRARY_PATH=/usr/local/cuda-12.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
```

## Problem

### Ubuntu kernel auto-update makes the NVIDIA driver unavailable

- https://blog.csdn.net/nizhenshishuai/article/details/123873453
- https://blog.csdn.net/weixin_48319333/article/details/127904278

```bash
# List the installed kernel image packages
dpkg --get-selections | grep linux-image

# Disable kernel updates by holding the kernel meta-packages
sudo apt-mark hold linux-image-generic linux-headers-generic

# Release the hold later, when you want to allow kernel updates again
sudo apt-mark unhold linux-image-generic linux-headers-generic
```
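
To confirm that the hold is in place and that the driver survived the last reboot, something like the following sketch may help. It assumes a Debian/Ubuntu system where `apt-mark` and `lsmod` are available; the function names `filter_kernel_holds` and `nvidia_module_loaded` are made up for this example and are not part of any NVIDIA or apt tooling.

```shell
#!/usr/bin/env bash
# Sketch only: report held kernel packages and whether the nvidia module is loaded.
# The helper names below are hypothetical, chosen for this example.

# Read `apt-mark showhold`-style output (one package per line) on stdin and
# print only the kernel image/header packages.
filter_kernel_holds() {
    grep -E '^linux-(image|headers)-' || true
}

# Read `lsmod`-style output on stdin; succeed only if a module named exactly
# "nvidia" is listed (nvidia_drm or nvidia_uvm alone do not count).
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Only query the system when the tools exist, so the script is safe to run anywhere.
if command -v apt-mark >/dev/null 2>&1; then
    echo "Held kernel packages:"
    apt-mark showhold | filter_kernel_holds
fi

if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nvidia_module_loaded; then
        echo "nvidia module is loaded for kernel $(uname -r)"
    else
        echo "nvidia module is NOT loaded for kernel $(uname -r)"
    fi
fi
```

If the module is reported as not loaded after a kernel change, the driver most likely needs to be reinstalled against the new kernel, since DKMS registration was declined during installation.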