Installing Ubuntu 16.04 with CUDA 9.0 and cuDNN 7.3 for deep learning
These are reference instructions (no warranties) for setting up a machine learning server. They were collected from many sources, plus the additional debugging required when updating the software on one of the deep learning machines at the lab.
The setup uses Ubuntu 16.04, since there are still some incompatibilities with 18.04, together with CUDA 9.0 and cuDNN 7.3.
Instructions are also given for several deep learning frameworks (TensorFlow, Theano, Chainer) and for OpenCV 3.4.
Installation
Install 16.04
Create bootable disk
https://tutorials.ubuntu.com/tutorial/tutorial-create-a-usb-stick-on-ubuntu#0
Change boot order
Restart the PC and press F2 to enter the BIOS
Boot / Fast Boot -> Disabled
Save Changes and reset / OK
When the PC restarts, press F2 again
Change the USB priority in secure boot
Save Changes and reset / OK
Install Ubuntu
Choose Install Ubuntu and follow the instructions
When restarting the PC, change the BIOS setting Fast Boot -> Enabled
Basics (https://towardsdatascience.com/ubuntu-deep-learning-software-installation-guide-fdae09f79903)
sudo apt-add-repository multiverse
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential cmake g++ gfortran git vim pkg-config python-dev software-properties-common wget
sudo apt-get autoremove
sudo rm -rf /var/lib/apt/lists/*
Install Nvidia drivers
Add nvidia drivers ppa and update repos
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
If issues arise, re-run apt-get update as root (sudo -i)
Install from apt-get. Minimum driver version for CUDA 9.0 is 384
sudo apt-get install nvidia-3## nvidia-modprobe
In the current install the latest driver was 396, which is the one that was used
Restart, then verify everything is working by running
nvidia-smi
All graphics cards should be detected and should be using the installed Nvidia driver
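The minimum driver version mentioned above can also be checked from a script. The following is a minimal sketch: driver_ok is a hypothetical helper name (not part of any Nvidia tool), and it only compares the major version number against 384.

```shell
# Hypothetical helper: succeeds when the major part of a driver version
# string (for example 396.54) is at least 384, the minimum for CUDA 9.0.
driver_ok() {
    major="${1%%.*}"
    [ "$major" -ge 384 ] 2>/dev/null
}

# Query the installed driver only if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1 || true)"
    if driver_ok "$version"; then
        echo "Driver $version is new enough for CUDA 9.0"
    else
        echo "Driver '$version' is missing or older than 384"
    fi
fi
```

On a machine without the driver the script prints nothing, so it is safe to run before and after the install.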
Install CUDA 9.0 (run file)
Download the latest stable 9.0 release (9.0.176, bundled with driver 384.81) and extract it
cd
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
chmod +x cuda_9.0.176_384.81_linux-run
./cuda_9.0.176_384.81_linux-run --extract=$HOME
There should be 3 extracted files
- NVIDIA-Linux-x86_64-384.81.run -> the NVIDIA driver, which we do NOT need because the latest one was already installed in the previous step. Remove it to be safe
rm NVIDIA-Linux-x86_64-384.81.run
- cuda-linux.9.0.176-########.run -> the CUDA 9.0 toolkit, which we will install
- cuda-samples.9.0.176-########-linux.run -> the CUDA 9.0 samples, which we also want
Install CUDA 9.0
sudo ./cuda-linux.9.0.176-22781540.run
Accept the license by scrolling down (press d) and entering 'accept'
Accept all the defaults (press enter)
Install CUDA samples to verify the install
sudo ./cuda-samples.9.0.176-22781540-linux.run
As with the CUDA install, accept the license by scrolling down (press d) and entering 'accept', then accept all the defaults (press enter)
Configure the runtime library
sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig
Add cuda path to /etc/environment
sudo nano /etc/environment
Append ‘:/usr/local/cuda/bin’ at the end of the PATH
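The manual edit can also be scripted. This is a minimal sketch that assumes /etc/environment contains a single quoted line of the form PATH="..."; add_cuda_to_path is a hypothetical helper name, and the example runs on a copy so the result can be reviewed first.

```shell
# Hypothetical helper: append /usr/local/cuda/bin to the quoted PATH line
# of an environment file, unless the entry is already present.
add_cuda_to_path() {
    env_file="$1"
    if grep -q '/usr/local/cuda/bin' "$env_file"; then
        return 0
    fi
    sed 's|^PATH="\(.*\)"$|PATH="\1:/usr/local/cuda/bin"|' "$env_file" > "$env_file.tmp" &&
        mv "$env_file.tmp" "$env_file"
}

# Dry run on a copy, then inspect the result:
#   cp /etc/environment /tmp/environment.new
#   add_cuda_to_path /tmp/environment.new && cat /tmp/environment.new
```

If the copy looks right, it can be moved into place with sudo cp /tmp/environment.new /etc/environment.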
Reboot and test CUDA by building the CUDA samples. The build takes a long time and prints many warnings, which can be ignored
cd /usr/local/cuda-9.0/samples
sudo make
cd /usr/local/cuda-9.0/samples/bin/x86_64/linux/release
./deviceQuery
It should output something like this
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.0, NumDevs = 3 Result = PASS
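For unattended setups, the deviceQuery output can be checked by a script instead of by eye. A minimal sketch, assuming the output contains the strings shown above; device_query_passed and runtime_version are hypothetical helper names, and both read the output on stdin.

```shell
# Hypothetical helper: succeeds when the output reports "Result = PASS".
device_query_passed() {
    grep -q 'Result = PASS'
}

# Hypothetical helper: prints the CUDA runtime version found in the output.
runtime_version() {
    sed -n 's/.*CUDA Runtime Version = \([0-9.]*\).*/\1/p'
}

# Example usage, from the samples release directory:
#   ./deviceQuery > /tmp/deviceQuery.log
#   device_query_passed < /tmp/deviceQuery.log && echo "CUDA OK"
#   runtime_version < /tmp/deviceQuery.log
```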
Install cuDNN 7.3 for CUDA 9.0
Go to the cuDNN download page (registration required) and select the latest cuDNN 7.3.* version made for CUDA 9.0. The current install uses cuDNN 7.3.0.29
Download all 3 .deb files: the runtime library, the developer library, and the code samples library for Ubuntu 16.04. Install them in the following order: runtime, developer, code samples
sudo dpkg -i libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.3.0.29-1+cuda9.0_amd64.deb
Verify the install by copying the samples to your home directory and compiling MNIST
cp -r /usr/src/cudnn_samples_v7/ ~
cd ~/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN
If installed correctly you should see Test passed! at the end of the output
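As an additional check, the installed cuDNN version can be read from the header's version macros. This is a sketch under assumptions: cudnn_version is a hypothetical helper name, and the header path /usr/include/cudnn.h is assumed for the .deb install and may differ on your system.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cuDNN header
# by reading its CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros.
cudnn_version() {
    header="$1"
    get_macro() {
        awk -v name="$1" '$1 == "#define" && $2 == name { print $3; exit }' "$header"
    }
    printf '%s.%s.%s\n' "$(get_macro CUDNN_MAJOR)" "$(get_macro CUDNN_MINOR)" "$(get_macro CUDNN_PATCHLEVEL)"
}

# Assumed header location; adjust if the header lives elsewhere.
if [ -f /usr/include/cudnn.h ]; then
    cudnn_version /usr/include/cudnn.h
fi
```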
Configure CUDA and cuDNN Paths
Put the following line at the end of your .bashrc file
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64"
Install pip and pip3
sudo apt install python-pip
sudo -H pip install --upgrade pip
sudo apt install python3-pip
sudo -H pip3 install --upgrade pip
Install python libs
sudo -H pip2 install -U numpy scipy matplotlib ipython jupyter pandas sympy nose scikit-learn
sudo -H pip3 install -U numpy scipy matplotlib ipython jupyter pandas sympy nose scikit-learn
Install NCCL 2.2 for TensorFlow multi-GPU
Download NCCL 2.2 (legacy) from https://developer.nvidia.com/nccl (NVIDIA developer account needed)
Download both the nccl-repo package and the network installer
Check glibc >= 2.19
ldd --version
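The glibc requirement can be checked automatically. A minimal sketch, assuming GNU sort (for the -V version ordering) and assuming that the first line of ldd --version ends with the glibc version number; version_ge is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when version $1 >= version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Take the last field of the first line of ldd --version as the glibc version.
glibc_version="$(ldd --version 2>/dev/null | head -n 1 | awk '{ print $NF }' || true)"
if [ -n "$glibc_version" ] && version_ge "$glibc_version" 2.19; then
    echo "glibc $glibc_version is new enough for NCCL 2.2"
else
    echo "glibc version '$glibc_version' is missing or older than 2.19"
fi
```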
Install local nccl repo
sudo dpkg -i nccl-repo-#version#.deb
It may ask you to install the CUDA GPG key. Do NOT install it
Install network repo
sudo dpkg -i nvidia-machine-learning-repo-#version#.deb
Finish installation
sudo apt update
sudo apt install libnccl2=2.2.13-1+cuda9.0 libnccl-dev=2.2.13-1+cuda9.0
Test the install by cloning nvidia/nccl-tests
git clone https://github.com/NVIDIA/nccl-tests.git
Make and run
cd nccl-tests
make
Example: run on 3 GPUs (-g 3)
./build/all_reduce_perf -b 8 -e 128M -f 2 -g 3
If installed correctly, no errors or warnings should be output
Go to Software Sources and check that no Nvidia source is left. Otherwise apt update may try to update libnccl2, libnccl-dev, libcudnn7 and libcudnn7-dev
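The same check can be done from a terminal. A minimal sketch, assuming GNU grep and the standard apt layout under /etc/apt; list_nvidia_sources is a hypothetical helper name.

```shell
# Hypothetical helper: list the apt source files (*.list) under a directory
# that mention "nvidia", case-insensitively. Defaults to /etc/apt.
list_nvidia_sources() {
    dir="${1:-/etc/apt}"
    grep -ril --include='*.list' 'nvidia' "$dir" 2>/dev/null || true
}

# Example usage:
#   list_nvidia_sources
```

An empty result means no leftover Nvidia source was found. As an alternative to removing the sources, the packages can be pinned with sudo apt-mark hold libnccl2 libnccl-dev libcudnn7 libcudnn7-dev, which keeps apt from upgrading them.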
Install OpenMPI
sudo apt install openmpi-bin libopenmpi-dev
Install TensorFlow
sudo -H pip2 install tensorflow-gpu
sudo -H pip3 install tensorflow-gpu
Install Jupyter Hub
Install Jupyter notebook (compatibility issues between Jupyter notebook and ipython with prompt-toolkit require the downgraded versions 6.5.0 and 1.0.15 for the latter two)
sudo -H pip3 install jupyter
sudo -H pip3 install ipython==6.5.0 prompt-toolkit==1.0.15
sudo -H pip2 install ipykernel
sudo -H python2 -m ipykernel install
For Jupyter Hub https://github.com/jupyterhub/jupyterhub
sudo -H apt install npm nodejs-legacy
sudo -H npm install -g configurable-http-proxy
sudo -H python3 -m pip install jupyterhub
Install Chainer
Install cupy (https://docs-cupy.chainer.org/en/latest/install.html) *for cuda 9.0
sudo -H pip2 install cupy-cuda90
sudo -H pip3 install cupy-cuda90
sudo -H pip2 install chainer
sudo -H pip3 install chainer
Verify install with cuda and cudnn
python2 -c 'import chainer; print(chainer.backends.cuda.available, chainer.backends.cuda.cudnn_enabled)'
python3 -c 'import chainer; print(chainer.backends.cuda.available, chainer.backends.cuda.cudnn_enabled)'
If successful, you should get (True, True) under python2 and True True under python3
Error messages mean Chainer was not installed correctly
If the first value is not True, Chainer is not able to use CUDA
If the second value is not True, Chainer cannot use cuDNN
Install OpenCV 3.4
Install opengl
sudo apt install mesa-utils
From source (similar to https://github.com/BVLC/caffe/wiki/OpenCV-3.3-Installation-Guide-on-Ubuntu-16.04)
Dependencies
sudo apt-get install --assume-yes build-essential cmake git
sudo apt-get install --assume-yes pkg-config unzip ffmpeg qtbase5-dev python-dev python3-dev python-numpy python3-numpy
sudo apt-get install --assume-yes libopencv-dev libgtk-3-dev libdc1394-22 libdc1394-22-dev libjpeg-dev libpng12-dev libtiff5-dev libjasper-dev
sudo apt-get install --assume-yes libavcodec-dev libavformat-dev libswscale-dev libxine2-dev libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev
sudo apt-get install --assume-yes libv4l-dev libtbb-dev libfaac-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev
sudo apt-get install --assume-yes libvorbis-dev libxvidcore-dev v4l-utils vtk6
sudo apt-get install --assume-yes liblapacke-dev libopenblas-dev libgdal-dev checkinstall
Download the OpenCV 3.4 source and go to the source folder
mkdir build
cd build
cmake -D ENABLE_CXX11=ON -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D FORCE_VTK=ON -D WITH_TBB=ON -D WITH_V4L=ON -D WITH_QT=ON -D WITH_OPENGL=ON -D WITH_CUBLAS=ON -D CUDA_NVCC_FLAGS="-D_FORCE_INLINES" -D WITH_GDAL=ON -D WITH_XINE=ON -D BUILD_EXAMPLES=ON ..
Find out the number of CPU cores in your machine
nproc
Substitute 12 with the output of nproc, then build and install
make -j12
sudo make install
sudo sh -c 'echo "/usr/local/lib" >> /etc/ld.so.conf.d/opencv.conf'
sudo ldconfig
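The manual substitution of the core count in the build step can be avoided by letting the shell fill it in. A minimal sketch; make_jobs_flag is a hypothetical helper name.

```shell
# Hypothetical helper: print a -jN flag for make, where N is the number
# of processing units reported by nproc.
make_jobs_flag() {
    printf '%s\n' "-j$(nproc)"
}

# Show the command that would be run from the build directory.
echo "make $(make_jobs_flag)"
```

From the build directory, make "$(make_jobs_flag)" then runs the build with one job per core.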
Install PyTorch
Follow the directions from pytorch.org for python2.7 and 3.5 with CUDA 9.0
sudo -H pip install torch torchvision
sudo -H pip3 install torch torchvision
Verify install with cuda and cudnn
python2 -c 'import torch; print(torch.cuda.is_available())'
python3 -c 'import torch; print(torch.cuda.is_available())'
If successful, you should get True
Error messages mean PyTorch was not installed correctly
If the output is not True, PyTorch is not able to use CUDA
Install Caffe2
For 16.04 (http://caffe.berkeleyvision.org/install_apt.html)
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo -H pip2 install future
git clone https://github.com/pytorch/pytorch.git && cd pytorch
git submodule update --init --recursive
sudo -H python setup.py install
Install Docker
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
Install keras
Install optional requirements
sudo apt-get install graphviz
sudo -H pip2 install pydot
sudo -H pip3 install pydot
Install keras
sudo -H pip2 install keras
sudo -H pip3 install keras
Install blender
Add ppa and apt-get install
sudo add-apt-repository ppa:thomas-schiex/blender
sudo apt-get install blender
To uninstall
sudo apt-get remove --autoremove blender
Remote access
Install mate for better compatibility and lower latency (from http://c-nergy.be/blog/?p=8952)
sudo apt-get update
sudo apt-get install mate-core mate-desktop-environment mate-notification-daemon
Install xrdp and allow RDP connections
sudo apt install xrdp
sudo systemctl enable xrdp
Set MATE as the default session for xrdp (xrdp must already be installed so that /etc/xrdp/startwm.sh exists)
sudo sed -i.bak '/fi/a #xrdp multiple users configuration \n mate-session \n' /etc/xrdp/startwm.sh
Connect from Ubuntu
To connect, use Remmina with Protocol: RDP-Remote Desktop Protocol
Connect from Windows
Use Remote Desktop Connection (from the command line, run mstsc)