NVIDIA CUDA
Installing the NVIDIA driver, the CUDA Toolkit, and cuDNN
NVIDIA setup
Installing the driver
Download the driver for your GPU from https://www.nvidia.cn/Download/index.aspx?lang=cn, for example:
wget https://cn.download.nvidia.cn/XFree86/Linux-x86_64/530.41.03/NVIDIA-Linux-x86_64-530.41.03.run
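The driver version appears both in the download URL and in the file name used in steps 7 and 8, so it is easy to download one version and then run another. A minimal sketch that derives both from a single version string, assuming the `cn.download.nvidia.cn/XFree86/Linux-x86_64/<version>/` layout of the URL above also holds for other versions; `driver_runfile` and `driver_url` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the driver .run file name and download URL
# from one version string.
# Assumption: the URL layout of the example above applies to other versions.

driver_runfile() {
  # e.g. 530.41.03 -> NVIDIA-Linux-x86_64-530.41.03.run
  printf 'NVIDIA-Linux-x86_64-%s.run\n' "$1"
}

driver_url() {
  # e.g. 530.41.03 -> https://cn.download.nvidia.cn/XFree86/Linux-x86_64/530.41.03/NVIDIA-Linux-x86_64-530.41.03.run
  printf 'https://cn.download.nvidia.cn/XFree86/Linux-x86_64/%s/%s\n' "$1" "$(driver_runfile "$1")"
}
```

Usage: `wget "$(driver_url 530.41.03)"`, then pass `"$(driver_runfile 530.41.03)"` to `chmod` and to the install command, so all three steps always refer to the same file.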
# 0. Install build dependencies
sudo apt install -y gcc g++ make cmake
# 1. Create/open the nouveau blacklist file
sudo gedit /etc/modprobe.d/blacklist-nouveau.conf
# 2. Add the following two lines to the file
blacklist nouveau
options nouveau modeset=0
# 3. Regenerate the kernel initramfs so the blacklist takes effect at boot (update the system first if possible)
sudo update-initramfs -u
# 4. Reboot
reboot
# 5. Verify that nouveau is disabled: no output means it is disabled; any output means it is still loaded
lsmod | grep nouveau
# If the output looks like this:
nouveau 1949696 0
mxm_wmi 16384 1 nouveau
i2c_algo_bit 16384 1 nouveau
ttm 106496 1 nouveau
drm_kms_helper 184320 1 nouveau
drm 495616 3 drm_kms_helper,ttm,nouveau
wmi 32768 4 asus_wmi,wmi_bmof,mxm_wmi,nouveau
video 57344 2 asus_wmi,nouveau
# then nouveau is still loaded and the blacklist has not taken effect
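# 6. Remove any existing driver (only needed if another version was installed, or a driver was installed by another method)
sudo apt-get remove --purge 'nvidia-*'
# (a driver installed from a .run file is removed with: sudo ./NVIDIA-Linux-x86_64-<version>.run --uninstall)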
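# 7. Make the driver .run file executable
sudo chmod a+x NVIDIA-Linux-x86_64-530.41.03.run
# 8. Install
sudo ./NVIDIA-Linux-x86_64-530.41.03.run --no-opengl-files --no-x-check --no-nouveau-check
# --no-x-check: do not abort the installation if an X server is running
# --no-nouveau-check: skip the installer's nouveau check (nouveau was disabled above)
# --no-opengl-files: install only the driver files, not the OpenGL files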
Answer the installer prompts as follows:
1. The distribution-provided pre-install script failed! Are you sure you want to continue? → "Yes"
2. Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. → "No"
3. Install NVIDIA's 32-bit compatibility libraries? → "No"
4. Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up. → "Yes"
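Steps 1 to 5 above (blacklisting nouveau and confirming it is no longer loaded) can also be scripted without opening an editor. A minimal sketch, assuming bash and root privileges for the default path; `write_nouveau_blacklist` and `nouveau_loaded` are hypothetical names, and the target file is a parameter so the functions can be tried on a scratch file first:

```shell
#!/usr/bin/env bash
# Write the two nouveau lines from step 2 to the given file
# (default: /etc/modprobe.d/blacklist-nouveau.conf, which requires root).
write_nouveau_blacklist() {
  local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# Read lsmod-style output on stdin; succeed (exit 0) if the nouveau
# module itself is listed in the first column.
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

Usage (as root): `write_nouveau_blacklist && update-initramfs -u`, reboot, then `if lsmod | nouveau_loaded; then echo "nouveau still loaded"; else echo "nouveau disabled"; fi`.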
Reference:
https://blog.csdn.net/way7486chundan/article/details/120711834
https://blog.csdn.net/wjinjie/article/details/108512153
https://blog.51cto.com/u_7072753/3826928
https://blog.csdn.net/xunan003/article/details/81665835
Installing CUDA
nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03 Driver Version: 530.41.03 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off| 00000000:01:00.0 Off | N/A |
| 30% 32C P0 37W / 170W| 0MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
The "CUDA Version" field shows that the highest CUDA version supported by this driver is 12.1.
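This check can be scripted: read the "CUDA Version" value from the `nvidia-smi` header and compare it with the toolkit version you plan to install. A sketch assuming GNU `sort -V` for version ordering; `max_cuda_version` and `version_le` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Extract the value after "CUDA Version:" from nvidia-smi output on stdin.
max_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 <= version $2 (GNU sort -V ordering).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

Usage: `max="$(nvidia-smi | max_cuda_version)"; version_le 12.0 "$max" && echo "CUDA 12.0 is supported"`.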
Download a toolkit no newer than that from the cuda-toolkit-archive page, selecting runfile (local):
wget https://developer.download.nvidia.com/compute/cuda/12.0.1/local_installers/cuda_12.0.1_525.85.12_linux.run
sudo sh cuda_12.0.1_525.85.12_linux.run
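Type accept to accept the license. If the driver is already installed, press Space on the next screen to deselect the Driver item, then select Install.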
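Instead of deselecting the driver in the menu, the runfile can install only the toolkit non-interactively with its `--silent --toolkit` options (`--silent` implies acceptance of the EULA). A sketch; `install_cuda_toolkit` and the `DRY_RUN` switch are hypothetical, and with `DRY_RUN` set the function only prints the command:

```shell
#!/usr/bin/env bash
# Install only the CUDA Toolkit from a runfile, skipping the bundled driver.
# Set DRY_RUN=1 to print the command instead of running it.
install_cuda_toolkit() {
  local runfile="$1"
  local cmd=(sudo sh "$runfile" --silent --toolkit)
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: `install_cuda_toolkit cuda_12.0.1_525.85.12_linux.run`; the summary and log file are the same as for the interactive install.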
root@server:~/nvidia# sh cuda_12.0.1_525.85.12_linux.run
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-12.0/
Please make sure that
- PATH includes /usr/local/cuda-12.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-12.0/lib64, or, add /usr/local/cuda-12.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.0/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 525.00 is required for CUDA 12.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
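The summary warns that CUDA 12.0 needs driver 525.00 or newer. A sketch of that check, reading the installed version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and comparing with GNU `sort -V`; `driver_at_least` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Read a driver version (e.g. "530.41.03") from the first line of stdin;
# succeed if it is >= $1.
driver_at_least() {
  local min="$1" have=""
  read -r have || [ -n "$have" ] || return 1
  [ "$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n 1)" = "$min" ]
}
```

Usage: `nvidia-smi --query-gpu=driver_version --format=csv,noheader | driver_at_least 525.00 && echo "driver is new enough for CUDA 12.0"`. With several GPUs only the first line is checked.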
Configure PATH and LD_LIBRARY_PATH:
export PATH=/usr/local/cuda-12.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64:$LD_LIBRARY_PATH
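`export` only affects the current shell; to keep the settings, these two lines are usually appended to a startup file such as `~/.bashrc`. A sketch that appends them only once, assuming bash and the `/usr/local/cuda-12.0` prefix used above; `add_cuda_env` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Append the CUDA PATH/LD_LIBRARY_PATH exports to a startup file,
# skipping any line that is already present verbatim.
# $1: startup file (e.g. ~/.bashrc); $2: CUDA prefix (default /usr/local/cuda-12.0).
add_cuda_env() {
  local rc="$1" cuda_home="${2:-/usr/local/cuda-12.0}"
  local line
  touch "$rc" || return 1
  for line in "export PATH=${cuda_home}/bin:\$PATH" \
              "export LD_LIBRARY_PATH=${cuda_home}/lib64:\$LD_LIBRARY_PATH"; do
    grep -qxF -- "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
  done
}
```

Usage: `add_cuda_env ~/.bashrc`, then open a new shell (or run `source ~/.bashrc`) and check with `nvcc --version`.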
Reference:
https://blog.csdn.net/CC977/article/details/122789394
Installing cuDNN
Download "Local Installer for Linux x86_64 (Tar)" from the cudnn-archive page:
curl -L <URL> -o cudnn-linux-x86_64-8.8.0.121_cuda12-archive.tar.xz
# Extract the downloaded cuDNN Library for Linux (x86_64) and copy the extracted files into the installed CUDA tree
tar xvJf cudnn-linux-x86_64-8.8.0.121_cuda12-archive.tar.xz
# Copy the headers (include/cudnn*.h) into /usr/local/cuda-12.0/include
cp -r cudnn-linux-x86_64-8.8.0.121_cuda12-archive/include/. /usr/local/cuda-12.0/include
# Copy all files under lib/ into /usr/local/cuda-12.0/lib64
cp -r cudnn-linux-x86_64-8.8.0.121_cuda12-archive/lib/. /usr/local/cuda-12.0/lib64
# Make the files readable by all users
chmod a+r /usr/local/cuda-12.0/include/cudnn*.h /usr/local/cuda-12.0/lib64/libcudnn*
# Check the cuDNN version
# cuDNN 8 and later
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
# cuDNN 7 and earlier
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
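# If the libraries are still not found at runtime, also add the target library directory (lib64 normally points to it)
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH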
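The copy, permission, and version-check steps above can be wrapped into two functions. A sketch, assuming the extracted archive layout used above (`include/` and `lib/`) and version macros written as `#define CUDNN_MAJOR 8`-style lines; `install_cudnn` and `cudnn_version` are hypothetical names, and writing under `/usr/local` needs root:

```shell
#!/usr/bin/env bash
# Copy cuDNN headers and libraries from an extracted archive into a CUDA tree.
# $1: extracted archive directory; $2: CUDA prefix (default /usr/local/cuda-12.0).
# cp -P keeps the libcudnn*.so symlinks as symlinks.
install_cudnn() {
  local src="$1" cuda_home="${2:-/usr/local/cuda-12.0}"
  cp -P "$src"/include/cudnn*.h "$cuda_home/include/" &&
  cp -P "$src"/lib/libcudnn* "$cuda_home/lib64/" &&
  chmod a+r "$cuda_home"/include/cudnn*.h "$cuda_home"/lib64/libcudnn*
}

# Print MAJOR.MINOR.PATCHLEVEL from cudnn_version.h (cuDNN 8+)
# or cudnn.h (cuDNN 7 and earlier); fail if CUDNN_MAJOR is not defined.
cudnn_version() {
  local header="${1:-/usr/local/cuda/include/cudnn_version.h}"
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
       END { if (major == "") exit 1; printf "%s.%s.%s\n", major, minor, patch }' "$header"
}
```

Usage: `install_cudnn cudnn-linux-x86_64-8.8.0.121_cuda12-archive && cudnn_version`.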
Problem
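Ubuntu kernel auto-update makes the NVIDIA driver unusable
Because the driver was installed without DKMS registration, its kernel module is built only for the kernel that was running at install time; after an automatic kernel upgrade, the new kernel has no nvidia module. Holding the kernel packages prevents the upgrade.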
Reference:
https://blog.csdn.net/nizhenshishuai/article/details/123873453
https://blog.csdn.net/weixin_48319333/article/details/127904278
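# List installed kernel image packages
dpkg --get-selections | grep linux-image
# Hold the kernel meta-packages to stop automatic kernel upgrades
sudo apt-mark hold linux-image-generic linux-headers-generic
# Release the hold later to allow kernel upgrades again
sudo apt-mark unhold linux-image-generic linux-headers-generic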
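To confirm the hold took effect, check `apt-mark showhold`, which lists held packages one per line. A sketch; `kernel_held` is a hypothetical name and it checks only the two meta-packages held above:

```shell
#!/usr/bin/env bash
# Read `apt-mark showhold` output on stdin; succeed if both kernel
# meta-packages held above are listed.
kernel_held() {
  local held pkg
  held="$(cat)"
  for pkg in linux-image-generic linux-headers-generic; do
    grep -qxF "$pkg" <<<"$held" || return 1
  done
}
```

Usage: `apt-mark showhold | kernel_held && echo "kernel upgrades are on hold"`.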