The following is a set of reference instructions (no warranties) for installing a machine learning server. The instructions were collected from many sources, plus the additional debugging required when updating the software of one of the machines used for deep learning at the lab.

It uses Ubuntu 16.04, as there are still some incompatibilities with 18.04, together with CUDA 9.0 and cuDNN 7.3.

Instructions are also given for several deep learning frameworks (TensorFlow, Chainer, PyTorch, Caffe2, Keras) as well as OpenCV 3.4.

Installation

Install 16.04

Create bootable disk

https://tutorials.ubuntu.com/tutorial/tutorial-create-a-usb-stick-on-ubuntu#0

Change boot order

  1. Restart the PC and press F2 to enter the BIOS
  2. Boot / Fast Boot -> Disabled
  3. Save Changes and Reset / OK
  4. The PC restarts; press F2 again
  5. Change USB priority in secure boot
  6. Save Changes and Reset / OK

Install Ubuntu

Choose Install Ubuntu and follow the instructions. When the PC restarts, change the BIOS setting Fast Boot -> Enabled.

Basics (https://towardsdatascience.com/ubuntu-deep-learning-software-installation-guide-fdae09f79903)

sudo apt-add-repository multiverse
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential cmake g++ gfortran git vim pkg-config python-dev software-properties-common wget
sudo apt-get autoremove
sudo rm -rf /var/lib/apt/lists/*

Install Nvidia drivers

Add nvidia drivers ppa and update repos

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update

If issues arise, re-run apt-get update as root (sudo -i)

Install from apt-get. Minimum driver version for CUDA 9.0 is 384

sudo apt-get install nvidia-3## nvidia-modprobe

For the current install, the latest driver was 396, which was used

Restart, then verify everything is working by running

nvidia-smi

All graphics cards should be detected and should be using the installed NVIDIA driver
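The minimum-version check can also be scripted. The following is a minimal sketch, assuming nvidia-smi is on the PATH once the driver is installed; the helper name driver_at_least is hypothetical and 384 is the CUDA 9.0 minimum quoted above.

```shell
# Succeeds if a driver version string such as "396.54" meets the minimum
# major version (default 384, the CUDA 9.0 minimum quoted in this guide).
# The function name is hypothetical.
driver_at_least() {
    major="${1%%.*}"                 # keep only the part before the first dot
    [ "$major" -ge "${2:-384}" ]
}

# Query the driver version of every GPU when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=driver_version --format=csv,noheader |
    while read -r version; do
        if driver_at_least "$version"; then
            echo "driver $version: OK for CUDA 9.0"
        else
            echo "driver $version: too old for CUDA 9.0"
        fi
    done
fi
```

One line is printed per GPU, so a missing card shows up as a missing line.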

Install CUDA 9.0 (run file)

Download the latest stable CUDA 9.0 release (9.0.176, bundled with driver 384.81) and extract it

cd
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
chmod +x cuda_9.0.176_384.81_linux-run
./cuda_9.0.176_384.81_linux-run --extract=$HOME

There should be 3 extracted files

  1. NVIDIA-Linux-x86_64-384.81.run -> NVIDIA driver, which we do NOT need because the latest one was already installed in the previous step. Remove it to be safe
    rm NVIDIA-Linux-x86_64-384.81.run

  2. cuda-linux.9.0.176-########.run -> CUDA 9.0 toolkit, which we will install
  3. cuda-samples.9.0.176-########-linux.run -> CUDA 9.0 samples, which we also want

Install CUDA 9.0

sudo ./cuda-linux.9.0.176-22781540.run

Accept the license by scrolling down (press d) and entering ‘accept’. Accept all the defaults (press Enter).

Install CUDA samples to verify the install

sudo ./cuda-samples.9.0.176-22781540-linux.run

As with CUDA, accept the license by scrolling down (press d) and entering ‘accept’, then accept all the defaults (press Enter).

Configure the runtime library

sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig

Add cuda path to /etc/environment

sudo nano /etc/environment

Append ‘:/usr/local/cuda/bin’ to the end of the PATH value, inside the closing quote

Reboot and test CUDA by building the CUDA samples. The build takes a long time and prints many warnings; do not worry about them.
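The manual edit can also be done non-interactively. The following is a minimal sketch, assuming the file contains a single line of the form PATH="..." as on a stock Ubuntu install; the function name append_env_path is hypothetical.

```shell
# Append DIR to the PATH="..." line of FILE unless it is already listed.
# FILE defaults to /etc/environment. Assumes the value is double-quoted.
# The function name is hypothetical.
append_env_path() {
    dir="$1"
    file="${2:-/etc/environment}"
    # Already present as a whole PATH entry: nothing to do.
    if grep -Eq "^PATH=\"?([^\"]*:)?${dir}(:[^\"]*)?\"?\$" "$file"; then
        return 0
    fi
    # Keep a .bak copy, then insert ":DIR" before the closing quote.
    sed -i.bak "s|^PATH=\"\(.*\)\"\$|PATH=\"\1:${dir}\"|" "$file"
}
```

To apply it to the real file, define the function in a root shell (sudo -i) and call append_env_path /usr/local/cuda/bin. The previous contents are kept in /etc/environment.bak.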

cd /usr/local/cuda-9.0/samples
sudo make
cd /usr/local/cuda-9.0/samples/bin/x86_64/linux/release
./deviceQuery

It should output something like this

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.0, NumDevs = 3 Result = PASS
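The two things to look for in that output are Result = PASS and a NumDevs value that matches the number of installed cards. The following is a minimal sketch of that check, assuming the summary format shown above; the function name device_query_ok is hypothetical.

```shell
# Reads deviceQuery output on stdin. Succeeds if it reports "Result = PASS"
# and at least EXPECTED devices (default 1). The function name is hypothetical.
device_query_ok() {
    expected="${1:-1}"
    output="$(cat)"
    printf '%s\n' "$output" | grep -q 'Result = PASS' || return 1
    # Extract the number that follows "NumDevs = ".
    ndev="$(printf '%s\n' "$output" |
            sed -n 's/.*NumDevs = \([0-9][0-9]*\).*/\1/p' | head -n 1)"
    [ -n "$ndev" ] && [ "$ndev" -ge "$expected" ]
}
```

For the three-GPU machine described here it could be used as ./deviceQuery | device_query_ok 3 && echo "CUDA OK".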

Install cuDNN 7.3 for CUDA 9.0

Go to the cuDNN download page (registration needed) and select the latest cuDNN 7.3.* version made for CUDA 9.0. The current install uses cuDNN 7.3.0.29.

Download all 3 .deb files for Ubuntu 16.04: the runtime library, the developer library, and the code samples library. Install them in the following order: runtime, developer, code samples

sudo dpkg -i libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.3.0.29-1+cuda9.0_amd64.deb

Verify the install by copying the samples to home and compiling MNIST

cp -r /usr/src/cudnn_samples_v7/ ~
cd ~/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN

If installed correctly you should see Test passed! at the end of the output
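As an additional check, the installed cuDNN version can be read from the header file. The following is a minimal sketch, assuming the developer package placed the header at /usr/include/cudnn.h (the path may differ on other setups); the function name cudnn_version is hypothetical.

```shell
# Print the cuDNN version declared in a cudnn.h header as MAJOR.MINOR.PATCH.
# The default header path is an assumption about where the .deb packages
# put the file. The function name is hypothetical.
cudnn_version() {
    header="${1:-/usr/include/cudnn.h}"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}

# Report the version only if the header is present on this machine.
if [ -r /usr/include/cudnn.h ]; then
    cudnn_version
fi
```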

Configure CUDA and cuDNN Paths

Put the following line at the end of your .bashrc file

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64"
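One caveat with the line above: if LD_LIBRARY_PATH is empty, the result starts with a colon, and the dynamic loader treats an empty entry as the current directory. The following is a minimal sketch of a more defensive variant that also avoids duplicate entries when .bashrc is sourced more than once; the function name append_ld_library_path is hypothetical.

```shell
# Append DIR to LD_LIBRARY_PATH without creating an empty entry and
# without adding the same directory twice. The function name is hypothetical.
append_ld_library_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already listed, leave the variable unchanged
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
    esac
    export LD_LIBRARY_PATH
}

append_ld_library_path /usr/local/cuda/extras/CUPTI/lib64
```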

Install pip and pip3

sudo apt install python-pip
sudo -H pip install --upgrade pip
sudo apt install python3-pip
sudo -H pip3 install --upgrade pip

Install python libs

sudo -H pip2 install -U numpy scipy matplotlib ipython jupyter pandas sympy nose scikit-learn
sudo -H pip3 install -U numpy scipy matplotlib ipython jupyter pandas sympy nose scikit-learn

Install NCCL 2.2 for TensorFlow multi-GPU

Download NCCL 2.2 (legacy) from https://developer.nvidia.com/nccl (NVIDIA developer account needed). Download both the nccl-repo package and the network installer.

Check glibc >= 2.19

ldd --version
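The comparison against 2.19 can be automated. The following is a minimal sketch, assuming GNU sort (for the -V version ordering) and the Ubuntu output format of ldd --version, where the version is the last field of the first line; the function name version_ge is hypothetical.

```shell
# Succeeds if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V. The function name is hypothetical.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# On Ubuntu the first line looks like "ldd (Ubuntu GLIBC 2.23-...) 2.23",
# so the last field is the glibc version.
if command -v ldd >/dev/null 2>&1; then
    glibc_version="$(ldd --version 2>/dev/null | head -n 1 | awk '{print $NF}')"
    if version_ge "$glibc_version" 2.19; then
        echo "glibc $glibc_version is new enough for NCCL"
    else
        echo "glibc $glibc_version is older than 2.19"
    fi
fi
```

A plain string or numeric comparison would rank 2.5 above 2.19, which is why the sketch uses version ordering.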

Install local nccl repo

sudo dpkg -i nccl-repo-#version#.deb

It may ask you to install the CUDA GPG key; do NOT install it

Install network repo

sudo dpkg -i nvidia-machine-learning-repo-#version#.deb

Finish installation

sudo apt update
sudo apt install libnccl2=2.2.13-1+cuda9.0 libnccl-dev=2.2.13-1+cuda9.0

Test: clone nvidia/nccl-tests

git clone https://github.com/NVIDIA/nccl-tests.git

Make and run

cd nccl-tests
make

Example: run on 3 GPUs (-g 3)

./build/all_reduce_perf -b 8 -e 128M -f 2 -g 3

If installed correctly, no errors or warnings should be output

Open Software Sources and check that no NVIDIA source is left; otherwise apt may try to update libnccl2, libnccl-dev, libcudnn7 and libcudnn7-dev
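The same check can be made from the command line. The following is a minimal sketch that lists apt source files mentioning NVIDIA repositories; the search pattern (nvidia, nccl, cudnn) is an assumption about how those repositories name themselves, and the function name list_nvidia_sources is hypothetical.

```shell
# Print every *.list file under DIR (default /etc/apt) whose contents
# mention an NVIDIA, NCCL or cuDNN repository. The pattern is an
# assumption, and the function name is hypothetical.
list_nvidia_sources() {
    grep -rlEi --include='*.list' 'nvidia|nccl|cudnn' "${1:-/etc/apt}" 2>/dev/null || true
}

list_nvidia_sources
```

An empty result means no matching source was found. Any file that is listed can then be disabled from Software Sources.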

Install Open MPI

sudo apt install openmpi-bin libopenmpi-dev

Install TensorFlow

sudo -H pip2 install tensorflow-gpu
sudo -H pip3 install tensorflow-gpu

Install Jupyter Hub

Install Jupyter notebook (compatibility issues between Jupyter notebook and ipython with prompt-toolkit require the downgraded versions 6.5.0 and 1.0.15 for the latter two)

sudo -H pip3 install jupyter
sudo -H pip3 install ipython==6.5.0 prompt-toolkit==1.0.15
sudo -H pip2 install ipykernel
sudo -H python2 -m ipykernel install

For JupyterHub (https://github.com/jupyterhub/jupyterhub)

sudo -H apt install npm nodejs-legacy
sudo -H npm install -g configurable-http-proxy
sudo -H python3 -m pip install jupyterhub

Install Chainer

Install cupy for CUDA 9.0 (https://docs-cupy.chainer.org/en/latest/install.html)

sudo -H pip2 install cupy-cuda90
sudo -H pip3 install cupy-cuda90

sudo -H pip2 install chainer
sudo -H pip3 install chainer

Verify install with cuda and cudnn

python2 -c 'import chainer; print(chainer.backends.cuda.available, chainer.backends.cuda.cudnn_enabled)'
python3 -c 'import chainer; print(chainer.backends.cuda.available, chainer.backends.cuda.cudnn_enabled)'

If successful, Python 3 prints True True and Python 2 prints (True, True). Error messages mean chainer was not installed correctly. If the first value is not True, chainer is not able to use CUDA. If the second value is not True, chainer cannot use cuDNN.

Install OpenCV 3.4

Install opengl

sudo apt install mesa-utils

From source (similar to https://github.com/BVLC/caffe/wiki/OpenCV-3.3-Installation-Guide-on-Ubuntu-16.04)

Dependencies

sudo apt-get install --assume-yes build-essential cmake git
sudo apt-get install --assume-yes pkg-config unzip ffmpeg qtbase5-dev python-dev python3-dev python-numpy python3-numpy
sudo apt-get install --assume-yes libopencv-dev libgtk-3-dev libdc1394-22 libdc1394-22-dev libjpeg-dev libpng12-dev libtiff5-dev libjasper-dev
sudo apt-get install --assume-yes libavcodec-dev libavformat-dev libswscale-dev libxine2-dev libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev
sudo apt-get install --assume-yes libv4l-dev libtbb-dev libfaac-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev
sudo apt-get install --assume-yes libvorbis-dev libxvidcore-dev v4l-utils vtk6
sudo apt-get install --assume-yes liblapacke-dev libopenblas-dev libgdal-dev checkinstall

Download the OpenCV 3.4 source and go to the source folder

mkdir build
cd build
cmake -D ENABLE_CXX11=ON -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D FORCE_VTK=ON -D WITH_TBB=ON -D WITH_V4L=ON -D WITH_QT=ON -D WITH_OPENGL=ON -D WITH_CUBLAS=ON -D CUDA_NVCC_FLAGS="-D_FORCE_INLINES" -D WITH_GDAL=ON -D WITH_XINE=ON -D BUILD_EXAMPLES=ON ..

Find out the number of CPU cores in your machine

nproc

Substitute 12 with the output of nproc, then build and install

make -j12
sudo make install
sudo sh -c 'echo "/usr/local/lib" >> /etc/ld.so.conf.d/opencv.conf'
sudo ldconfig
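The manual substitution of the core count can be avoided by passing the output of nproc straight to make. The following is a minimal sketch, to be run from the build directory; the function name build_and_install_opencv is hypothetical.

```shell
# Build with one job per CPU core reported by nproc, then install.
# Run from the OpenCV build directory. The function name is hypothetical.
build_and_install_opencv() {
    make -j"$(nproc)" &&
    sudo make install
}

echo "make would run with $(nproc) parallel jobs"
```

On machines with little RAM per core a lower job count may be needed, since each compiler process uses memory of its own.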

Install PyTorch

Follow the directions from pytorch.org for Python 2.7 and 3.5 with CUDA 9.0

sudo -H pip2 install torch torchvision
sudo -H pip3 install torch torchvision

Verify the install with CUDA

python2 -c 'import torch; print(torch.cuda.is_available())'
python3 -c 'import torch; print(torch.cuda.is_available())'

If successful, you should get True. Error messages mean PyTorch was not installed correctly. If the output is not True, PyTorch is not able to use CUDA.

Install Caffe2

For 16.04 (http://caffe.berkeleyvision.org/install_apt.html)

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo -H pip2 install future
git clone https://github.com/pytorch/pytorch.git && cd pytorch
git submodule update --init --recursive
sudo -H python setup.py install

Install Docker

sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

Install keras

Install optional requirements

sudo apt-get install graphviz
sudo -H pip2 install pydot
sudo -H pip3 install pydot

Install keras

sudo -H pip2 install keras
sudo -H pip3 install keras

Install blender

Add ppa and apt-get install

sudo add-apt-repository ppa:thomas-schiex/blender
sudo apt-get install blender

To uninstall

sudo apt-get remove --autoremove blender

Remote access

Install mate for better compatibility and lower latency (from http://c-nergy.be/blog/?p=8952)

sudo apt-get update
sudo apt-get install mate-core mate-desktop-environment mate-notification-daemon

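Set mate as the default for xrdp. Run this after xrdp has been installed (next step), because /etc/xrdp/startwm.sh is created by the xrdp package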

sudo sed -i.bak '/fi/a #xrdp multiple users configuration \n mate-session \n' /etc/xrdp/startwm.sh

Allow rdp connections

sudo apt install xrdp
sudo systemctl enable xrdp

Connect from Ubuntu

To connect, use Remmina with Protocol: RDP - Remote Desktop Protocol

Connect from Windows

Use Remote Desktop Connection (from the command line, run mstsc)