NVIDIA CUDA

Installing the NVIDIA driver, the CUDA Toolkit and cuDNN

NVIDIA setup

Installing the driver

https://www.nvidia.cn/Download/index.aspx?lang=cn

wget https://cn.download.nvidia.cn/XFree86/Linux-x86_64/530.41.03/NVIDIA-Linux-x86_64-530.41.03.run

# 0. Build dependencies
sudo apt install gcc g++ make cmake -y
# 1. Open (create) the nouveau blacklist file
sudo gedit /etc/modprobe.d/blacklist-nouveau.conf
# 2. Write the following two lines into the file
  blacklist nouveau
  options nouveau modeset=0
# 3. Update the system if possible, then regenerate the kernel initramfs

sudo update-initramfs -u
# 4. Reboot the computer
reboot
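Steps 1 and 2 can also be done without an editor, which is handy on a headless server. A minimal bash sketch; the function name write_nouveau_blacklist is hypothetical, and the default path is the one used above. Writing to /etc/modprobe.d requires a root shell.

```shell
# Hypothetical helper: write the two nouveau blacklist lines to a modprobe config file.
# Defaults to the path used above; pass another path to try it without root.
write_nouveau_blacklist() {
  local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# Real use (in a root shell):
#   write_nouveau_blacklist && update-initramfs -u && reboot
```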

# 5. Verify that nouveau is disabled. No output means it is disabled; any output means it is still loaded.

lsmod | grep nouveau

# If the output looks like this:
nouveau              1949696  0
mxm_wmi                16384  1 nouveau
i2c_algo_bit           16384  1 nouveau
ttm                   106496  1 nouveau
drm_kms_helper        184320  1 nouveau
drm                   495616  3 drm_kms_helper,ttm,nouveau
wmi                    32768  4 asus_wmi,wmi_bmof,mxm_wmi,nouveau
video                  57344  2 asus_wmi,nouveau

# then the blacklist did not take effect.
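The check in step 5 can be scripted. A small bash sketch, assuming lsmod's usual layout where the first column is the module name; the function name nouveau_loaded is hypothetical.

```shell
# Hypothetical helper: exit 0 if lsmod-style text on stdin lists a module named
# exactly "nouveau" in its first column, exit 1 otherwise.
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Real use:
#   if lsmod | nouveau_loaded; then echo "nouveau still loaded"; else echo "nouveau disabled"; fi
```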

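# 6. Remove any existing driver (only needed if another version was installed, or a driver was installed by another method)

sudo apt-get remove 'nvidia-*'

# 7. Make the driver .run file executable
sudo chmod a+x NVIDIA-Linux-x86_64-530.41.03.run

# 8. Install
sudo ./NVIDIA-Linux-x86_64-530.41.03.run --no-opengl-files --no-x-check --no-nouveau-check

# --no-x-check: do not abort the installation if an X server is running
# --no-nouveau-check: do not abort the installation if nouveau is in use (it does not disable nouveau)
# --no-opengl-files: install only the driver files, not the OpenGL files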

The installer then asks the following questions; answer as shown:

1. The distribution-provided pre-install script failed! Are you sure you want to continue?
   “Yes”
2. Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later.
   “No” (without DKMS the module is not rebuilt after a kernel upgrade; see Problem below)
3. Install NVIDIA's 32-bit compatibility libraries?
   “No”
4. Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.
   “Yes”
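To keep the file name consistent across the chmod and install steps, the driver version can live in one variable. A bash sketch under that assumption; DRIVER_VERSION and install_nvidia_driver are illustrative names, not part of NVIDIA's tooling, and the installer options are the ones from step 8.

```shell
# Illustrative: one place to set the driver version used in the .run file name.
DRIVER_VERSION="${DRIVER_VERSION:-530.41.03}"

# Hypothetical wrapper: make the installer executable and run it with the step 8 options.
install_nvidia_driver() {
  local run="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"
  if [ ! -f "$run" ]; then
    echo "installer not found: $run (download it first)" >&2
    return 1
  fi
  sudo chmod a+x "$run"
  sudo "./$run" --no-opengl-files --no-x-check --no-nouveau-check
}
```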

Reference:

  • https://blog.csdn.net/way7486chundan/article/details/120711834

  • https://blog.csdn.net/wjinjie/article/details/108512153

  • https://blog.51cto.com/u_7072753/3826928

  • https://blog.csdn.net/xunan003/article/details/81665835

Installing CUDA

nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060         Off| 00000000:01:00.0 Off |                  N/A |
| 30%   32C    P0               37W / 170W|      0MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

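The CUDA Version field shows the highest CUDA version this driver supports, here 12.1, so any toolkit up to 12.1 can be used; 12.0.1 is installed below.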
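Reading that value and comparing it with a toolkit version can be scripted. A bash sketch, assuming the nvidia-smi header layout shown above and GNU sort -V for version ordering; max_cuda_version and version_le are hypothetical names.

```shell
# Print the "CUDA Version" field from nvidia-smi output read on stdin.
# This is the newest CUDA version the driver supports, not an installed toolkit.
max_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 <= version $2, using GNU sort -V ordering.
version_le() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Real use:
#   max=$(nvidia-smi | max_cuda_version)
#   version_le 12.0.1 "$max" && echo "CUDA 12.0.1 is supported by this driver"
```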

Download the toolkit from the CUDA Toolkit Archive page and choose the runfile (local) installer.

wget https://developer.download.nvidia.com/compute/cuda/12.0.1/local_installers/cuda_12.0.1_525.85.12_linux.run
sudo sh cuda_12.0.1_525.85.12_linux.run

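Type accept to accept the license agreement. If a driver is already installed, press Space on the next screen to deselect the Driver entry, then choose Install.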

root@server:~/nvidia# sh cuda_12.0.1_525.85.12_linux.run
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-12.0/

Please make sure that
 -   PATH includes /usr/local/cuda-12.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.0/lib64, or, add /usr/local/cuda-12.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.0/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 525.00 is required for CUDA 12.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
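The same toolkit-only choice can be made without the interactive menu, since the runfile accepts the --silent and --toolkit options. A bash sketch; install_cuda_toolkit is a hypothetical wrapper name, and its default file name is the runfile downloaded above.

```shell
# Hypothetical wrapper: install only the CUDA Toolkit (no driver) from a runfile, non-interactively.
install_cuda_toolkit() {
  local runfile="${1:-cuda_12.0.1_525.85.12_linux.run}"
  if [ ! -f "$runfile" ]; then
    echo "runfile not found: $runfile" >&2
    return 1
  fi
  sudo sh "$runfile" --silent --toolkit
}
```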

Configuring PATH

export PATH=/usr/local/cuda-12.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64:$LD_LIBRARY_PATH
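Those two export lines only affect the current shell, and when LD_LIBRARY_PATH is empty they leave a trailing colon, which the dynamic loader treats as the current directory. A bash sketch of a duplicate-free prepend; prepend_path_var is a hypothetical name. Putting the function and the two calls in ~/.bashrc makes the setting persistent.

```shell
# Hypothetical helper: prepend a directory to the colon-separated variable named
# in $1, unless it is already present; never leaves an empty entry behind.
prepend_path_var() {
  local _var="$1" _dir="$2"
  local _cur="${!_var:-}"
  case ":$_cur:" in
    *":$_dir:"*) ;;  # already present, leave unchanged
    *) printf -v "$_var" '%s' "$_dir${_cur:+:$_cur}" ;;
  esac
  export "$_var"
}

prepend_path_var PATH /usr/local/cuda-12.0/bin
prepend_path_var LD_LIBRARY_PATH /usr/local/cuda-12.0/lib64
```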

Reference:

  • https://blog.csdn.net/CC977/article/details/122789394

Installing cuDNN

From the cuDNN Archive page, download "Local Installer for Linux x86_64 (Tar)".

curl -L <URL> -o cudnn-linux-x86_64-8.8.0.121_cuda12-archive.tar.xz

# Extract the downloaded cuDNN Library for Linux (x86_64) archive and copy its files into the installed CUDA toolkit
tar xvJf cudnn-linux-x86_64-8.8.0.121_cuda12-archive.tar.xz
# Copy the headers from include/ into /usr/local/cuda-12.0/include
cp -r cudnn-linux-x86_64-8.8.0.121_cuda12-archive/include/. /usr/local/cuda-12.0/include
# Copy everything under lib/ into /usr/local/cuda-12.0/lib64
cp -r cudnn-linux-x86_64-8.8.0.121_cuda12-archive/lib/. /usr/local/cuda-12.0/lib64
# Make the headers and libraries readable by all users
chmod a+r /usr/local/cuda-12.0/include/cudnn*.h /usr/local/cuda-12.0/lib64/libcudnn*

# Check the cuDNN version
# cuDNN 8 and later (version macros live in cudnn_version.h)
grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn_version.h
# cuDNN 7 and earlier
grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h

# Also add the toolkit's target library directory (lib64 is normally a symlink to it)
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
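The grep output above still has to be read by eye. A bash sketch that turns cudnn_version.h into a single MAJOR.MINOR.PATCH string, assuming the #define CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL lines used by cuDNN 8; cudnn_version is a hypothetical name.

```shell
# Hypothetical helper: print the cuDNN version as MAJOR.MINOR.PATCH from cudnn_version.h.
# Fails if the header is missing or has no CUDNN_MAJOR define.
cudnn_version() {
  local header="${1:-/usr/local/cuda/include/cudnn_version.h}"
  [ -r "$header" ] || return 1
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { if (major == "") exit 1; print major "." minor "." patch }
  ' "$header"
}
```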

Problem

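Ubuntu's automatic kernel updates make the NVIDIA driver unusable (the driver module was built for the previous kernel).

Reference:

  • https://blog.csdn.net/nizhenshishuai/article/details/123873453

  • https://blog.csdn.net/weixin_48319333/article/details/127904278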

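# List installed kernel image packages
dpkg --get-selections | grep linux-image
# Hold the kernel packages to stop automatic kernel updates
sudo apt-mark hold linux-image-generic linux-headers-generic
# Release the hold to allow kernel updates again
sudo apt-mark unhold linux-image-generic linux-headers-generic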
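Whether the hold is in place can be checked from apt-mark showhold, which prints one held package per line. A bash sketch; packages_held is a hypothetical name.

```shell
# Hypothetical helper: succeed if every package named as an argument appears,
# as a whole line, in "apt-mark showhold" style output read from stdin.
packages_held() {
  local held pkg
  held="$(cat)"
  for pkg in "$@"; do
    grep -qxF -- "$pkg" <<<"$held" || return 1
  done
}

# Real use:
#   apt-mark showhold | packages_held linux-image-generic linux-headers-generic && echo "kernel updates are on hold"
```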