The most important reason was saving time while prototyping models — if they trained faster, the feedback time would be shorter. Thus it would be easier on my brain to connect the dots between the assumptions I had for the model and its results.
Then I wanted to save money — I was using Amazon Web Services (AWS), which offered P2 instances with Nvidia K80 GPUs. Lately the AWS bills were around $60–70/month with a tendency to get larger. Also it is expensive to store large datasets, like ImageNet.
And lastly, I hadn’t had a desktop for over 10 years and wanted to see what had changed in the meantime (spoiler alert: mostly nothing).
What follows are my choices, inner monologue and gotchas: from choosing the components to benchmarking.
1. Choosing components
2. Putting it together
3. Software Setup
A sensible budget for me would be about 2 years’ worth of my current compute spending. At $70/month for AWS, that comes to around $1700 for the whole thing.
You can check out all the components used. The PC Part Picker site is also really helpful for detecting whether some of the components don’t play well together.
The GPU is the most crucial component in the box. It will train these deep networks fast, shortening the feedback cycle.
The GPU is important because: a) most calculations in DL are matrix operations, like matrix multiplication, which can be slow when done on the CPU; b) a typical neural network performs thousands of these operations, so the slowness really adds up (as we will see in the benchmarks later). GPUs, on the other hand, can rather conveniently run these operations in parallel. They have a large number of cores, which can run an even larger number of threads. GPUs also have much higher memory bandwidth, which lets them apply these parallel operations to a lot of data at once.
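To make the parallelism argument concrete, here is a minimal pure-Python sketch (not how any DL library actually implements it). It shows that every cell of a matrix product can be computed independently, and counts how many multiply-adds a stack of fully connected layers needs. The layer sizes 784-512-512-10 are an assumption, borrowed from the shape of the Keras MNIST MLP example used in the benchmarks.

```python
def matmul(a, b):
    """Naive product of an n x k matrix `a` and a k x m matrix `b` (lists of lists)."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    # Every output cell c[i][j] depends only on row i of `a` and column j of `b`,
    # so all n * m cells are independent of each other. That independence is
    # what lets a GPU spread the work across thousands of threads at once.
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


def dense_macs(batch_size, layer_sizes):
    """Multiply-accumulate operations for one forward pass through dense layers."""
    # A layer mapping n_in -> n_out features is a (batch x n_in) @ (n_in x n_out)
    # matrix product, i.e. batch * n_in * n_out multiply-adds.
    return batch_size * sum(n_in * n_out
                            for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
    # Assumed MLP shape: 784 inputs, two hidden layers of 512 units, 10 outputs.
    print(dense_macs(128, [784, 512, 512, 10]))        # 85590016
```

Even this small model needs roughly 86 million multiply-adds for the forward pass of a single batch of 128 images, which is why running them one after another on a CPU adds up quickly.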
My choice was between a few of Nvidia’s cards: GTX 1070 ($360), GTX 1080 ($500), GTX 1080 Ti ($700) and finally the Titan X ($1320).
On the performance side, the GTX 1080 Ti and Titan X are similar. Roughly speaking, the GTX 1080 is about 25% faster than the GTX 1070, and the GTX 1080 Ti is about 30% faster than the GTX 1080.
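Combining those rough percentages with the prices gives a quick speed-per-dollar comparison. This is only a sketch built on the estimates above (GTX 1070 = 1.0), not on measured benchmarks.

```python
# Rough relative speeds from the estimates above, with the GTX 1070 as the baseline.
relative_speed = {"GTX 1070": 1.0}
relative_speed["GTX 1080"] = relative_speed["GTX 1070"] * 1.25     # ~25% faster
relative_speed["GTX 1080 Ti"] = relative_speed["GTX 1080"] * 1.30  # ~30% faster
relative_speed["Titan X"] = relative_speed["GTX 1080 Ti"]          # roughly the same

prices_usd = {"GTX 1070": 360, "GTX 1080": 500, "GTX 1080 Ti": 700, "Titan X": 1320}


def speed_per_1000_usd(card):
    """Relative training speed bought per $1000 spent on this card."""
    return 1000 * relative_speed[card] / prices_usd[card]


if __name__ == "__main__":
    for card in prices_usd:
        print(f"{card:12s} speed {relative_speed[card]:.2f}x, "
              f"per $1000: {speed_per_1000_usd(card):.2f}")
```

Under these numbers the GTX 1070 buys the most speed per dollar, while the 1080 Ti is the fastest option and still far better value than the Titan X.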
Tim Dettmers has a great article on picking a GPU for Deep Learning, which he regularly updates as new cards come on the market.
Here are the things to consider when picking a GPU:
Maker: No contest on this one; get Nvidia. They have been focusing on Machine Learning for a number of years now, and it’s paying off. Their CUDA toolkit is so deeply entrenched that it is literally the only choice for the DL practitioner.
Considering all of this, I picked the GTX 1080 Ti, mainly for the training speed boost. I plan to add a second 1080 Ti soonish.
Even though the GPU is the MVP in deep learning, the CPU still matters. For example, data preparation is usually done on the CPU. The number of cores and threads per core is important if we want to parallelize all that data prep.
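As a rough illustration of why core count matters, here is a minimal sketch that spreads a per-image preprocessing step across CPU cores with Python's standard `concurrent.futures`. The `preprocess` function is a hypothetical stand-in for real data prep (decoding, resizing, augmentation); only the pattern of one worker process per core is the point.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def preprocess(pixels):
    """Hypothetical per-image step: scale 0-255 pixel values into [0, 1]."""
    return [p / 255.0 for p in pixels]


def preprocess_all(images, workers=None):
    """Run `preprocess` over all images, one worker process per CPU core by default."""
    workers = workers or os.cpu_count() or 1
    if workers == 1:
        # No point paying process start-up costs for a single worker.
        return [preprocess(img) for img in images]
    # Separate processes (not threads) so CPU-bound Python code runs in parallel.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess, images, chunksize=64))
```

On a 4-core, 4-thread chip like the i5 7500 this would start 4 worker processes. Call `preprocess_all` from under an `if __name__ == "__main__":` guard so the worker processes can import the module safely.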
To stay on budget, I picked a mid-range CPU, the Intel i5 7500 for about $190. It’s relatively cheap but good enough to not slow things down.
It’s nice to have a lot of memory if we are going to work with rather big datasets. I got 2 sticks of 16 GB, for a total of 32 GB of RAM, for $230, and plan to buy another 32 GB later.
Following Jeremy Howard’s advice, I got a fast SSD to keep my OS and current data on, and a slow spinning HDD for those huge datasets (like ImageNet).
SSD: I remember how blown away I was by the SSD speed when I got my first Macbook Air years ago. To my delight, a new generation of SSD called NVMe has made its way to market in the meantime. A 480 GB MyDigitalSSD NVMe drive for $230 was a great deal. This baby copies files at gigabytes per second.
HDD: 2 TB for $66. While SSDs have been getting fast, HDDs have been getting cheap. To somebody who has used Macbooks with 128 GB disks for the last 7 years, having this much space feels almost obscene.
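To see the SSD/HDD gap on your own box, a crude sequential-write test is enough. This is a minimal sketch, not a proper disk benchmark: it writes random (incompressible) data to a temporary file, forces it to disk with `fsync` so the OS page cache doesn't flatter the result, and reports MiB per second.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=256, block_kb=1024):
    """Rough sequential write speed (MiB/s) of the disk that holds `directory`."""
    block = os.urandom(block_kb * 1024)  # random bytes, so the drive can't compress them
    n_blocks = size_mb * 1024 // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # wait until the data is actually on the disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return size_mb / elapsed
```

For example, compare `write_throughput_mb_s(os.path.expanduser("~"))` on the SSD with `write_throughput_mb_s("/mnt/hdd")`, where `/mnt/hdd` is a hypothetical mount point for the HDD.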
The one thing I kept in mind when picking a motherboard was the ability to support two GTX 1080 Ti cards, both in the number of PCI Express lanes (the minimum is 2x8, i.e. two slots running at x8 each) and in the physical room for two cards. Also make sure it’s compatible with the chosen CPU. An Asus TUF Z270 for $130 did it for me.
Rule of thumb: the PSU should provide enough power for the CPU and the GPUs, plus 100 watts extra. The Intel i5 7500 processor uses 65W and the GPUs (1080 Ti) need 250W each, which adds up to 65 + 2 × 250 + 100 = 665W, so I got a Deepcool 750W Gold PSU for $75. The “Gold” here refers to the power efficiency rating, i.e. how much of the power drawn from the wall actually reaches the components rather than being wasted as heat.
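The sizing rule and the efficiency idea fit in a few lines. This sketch uses the wattages from the text; the 90% efficiency figure is an assumption (roughly what the 80 Plus Gold tier requires around half load), not a measured value for this particular PSU.

```python
def psu_rating_watts(cpu_w, gpu_w, n_gpus, headroom_w=100):
    """Rule of thumb from the text: CPU + all GPUs + 100 W of headroom."""
    return cpu_w + n_gpus * gpu_w + headroom_w


def wall_draw_watts(dc_load_w, efficiency):
    """Power pulled from the socket to deliver `dc_load_w` to the components."""
    return dc_load_w / efficiency


if __name__ == "__main__":
    print(psu_rating_watts(cpu_w=65, gpu_w=250, n_gpus=2))  # 665, so 750 W fits
    load = 65 + 2 * 250                   # CPU and both GPUs running flat out
    wall = wall_draw_watts(load, 0.90)    # assumed ~90% efficiency
    print(round(wall), round(wall - load))  # 628 63: about 63 W lost as heat
```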
The case should be the same form factor as the motherboard. Also having enough LEDs to embarrass a Burner is a bonus.
A friend recommended the Thermaltake N23 case for $50, which I promptly got. No LEDs sadly.
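Adding up the prices quoted above is a quick sanity check against the budget. This sketch leaves out anything the text doesn’t price (electricity, peripherals, the planned second GPU and RAM upgrade), so treat the break-even figure as a rough lower bound.

```python
# Prices (USD) as quoted in the component notes above.
parts = {
    "GPU: GTX 1080 Ti": 700,
    "CPU: Intel i5 7500": 190,
    "RAM: 2 x 16 GB": 230,
    "SSD: 480 GB MyDigitalSSD NVMe": 230,
    "HDD: 2 TB": 66,
    "Motherboard: Asus TUF Z270": 130,
    "PSU: Deepcool 750W Gold": 75,
    "Case: Thermaltake N23": 50,
}

AWS_MONTHLY_USD = 70  # upper end of the AWS bills mentioned earlier

total = sum(parts.values())
budget = 24 * AWS_MONTHLY_USD                 # "about 2 years" of compute spending
months_to_break_even = total / AWS_MONTHLY_USD

if __name__ == "__main__":
    print(f"total ${total}, budget ${budget}")                   # total $1671, budget $1680
    print(f"break-even after {months_to_break_even:.1f} months")  # 23.9 months
```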
If you don’t have much experience with hardware and fear you might break something, a professional assembly might be the best option. However, this was a great learning opportunity that I couldn’t pass up (even though I’ve had my share of hardware-related horror stories).
The first and most important step is to read the installation manuals that came with each component. This was especially important for me, as I’ve only done this once or twice before and have just the right amount of inexperience to mess things up.
The CPU in its slot, the lever refusing to go down.
This is done before installing the motherboard in the case. Next to the processor there is a lever that needs to be pulled up. The processor is then placed on the base (double-check the orientation). Finally the lever comes down to fix the CPU in place.
Me being assisted in installing the CPU
But I had quite a bit of difficulty with this: once the CPU was in position, the lever wouldn’t go down. I actually had a more hardware-capable friend walk me through the process over video. It turns out the amount of force required to lock the lever down was more than I was comfortable with.
The installed fan
Next is fixing the fan on top of the CPU: the fan legs must be fully secured to the motherboard. Also consider where the fan cable will go before installing. The processor I had came with thermal paste. If yours doesn’t, make sure to put some paste between the CPU and the cooling unit. Also replace the paste if you take off the fan.
Fitting the power cables through the backside.
I put the Power Supply Unit (PSU) in before the motherboard to get the power cables snugly placed along the back side of the case.
Having fun with magnets
Installing the motherboard is pretty straightforward: carefully place it and screw it in. A magnetic screwdriver was really helpful.
Then connect the power cables and the case buttons and LEDs.
The NVMe SSD just slides into the M.2 slot and screws in. Piece of cake.
The GTX 1080 Ti calmly waiting its turn as I struggle with the RAM in the background.
The memory proved quite hard to install, requiring too much effort to properly lock in. A few times I almost gave up, thinking I must be doing it wrong. Eventually one of the sticks clicked in and the other one promptly followed.
At this point I turned the computer on to make sure it works. To my relief, it started right away!
The GTX 1080 Ti settling into its new home
Finally, the GPU slid in effortlessly. 14 pins of power later and it was running.
NB: Do not plug your monitor into the new graphics card right away. It most likely needs drivers to function (see below).
Finally, it’s complete!
Software Setup
Now that we have the hardware in place, only the soft part remains. Out with the screwdriver, in with the keyboard.
Note on dual booting: If you plan to install Windows (because, you know, for benchmarks, totally not for gaming), it would be wise to do Windows first and Linux second. I didn’t and had to reinstall Ubuntu because Windows messed up the boot partition. Livewire has a detailed article on dual boot.
Most DL frameworks are designed to work on Linux first and eventually support other operating systems, so I went for Ubuntu, my default Linux distribution. An old 2GB USB drive was lying around and worked great for the installation. UNetbootin (OSX) or Rufus (Windows) can prepare the Linux thumb drive. The default options worked fine during the Ubuntu install.
At the time of writing, Ubuntu 17.04 had just been released, so I opted for the previous version (16.04), whose quirks are much better documented online.
Ubuntu Server or Desktop: The Server and Desktop editions of Ubuntu are almost identical, with the notable exception that the visual interface (called X) is not installed with Server. I installed the Desktop and disabled autostarting X, so that the computer boots into terminal mode. If needed, one can launch the visual desktop later by typing startx.
Let’s get our install up to date. From Jeremy Howard’s excellent install-gpu script:
To deep learn on our machine, we need a stack of technologies to use our GPU:
GPU driver: a way for the operating system to talk to the graphics card.
Download CUDA from Nvidia, or just run the code below:
After CUDA has been installed the following code will add the CUDA installation to the PATH variable:
Now we can verify that CUDA has been installed successfully by running
This should have installed the display driver as well. For me, nvidia-smi showed ERR as the device name, so I installed the latest Nvidia drivers (at the time of writing) to fix it:
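To catch that kind of driver problem from a script, nvidia-smi can print its device list as CSV via `--query-gpu` and `--format=csv,noheader`. The sketch below parses that output. The assumption that a broken device name contains “ERR” comes from what the table view showed here; other failure modes may look different.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader"]


def parse_gpus(csv_text):
    """Turn nvidia-smi CSV lines ('name, driver, memory') into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit so a comma inside the device name can't shift the columns
        name, driver, memory = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "driver": driver, "memory": memory})
    return gpus


def broken_devices(gpus):
    """Names that look like the 'ERR' device name seen with a bad driver (assumption)."""
    return [gpu["name"] for gpu in gpus if "ERR" in gpu["name"]]


def check_gpus():
    """Run nvidia-smi on this machine and return the parsed device list."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found: is the Nvidia driver installed?")
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpus(out)
```

On the DL box, `broken_devices(check_gpus())` returning a non-empty list is a hint to reinstall the driver.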
Removing CUDA/Nvidia drivers
If at any point the drivers or CUDA seem broken (as they did for me — multiple times), it might be better to start over by running:
We install CuDNN 5.1, as TensorFlow currently doesn’t support CuDNN 6. To download CuDNN, one needs to register for a (free) developer account. After downloading, install it with the following:
Anaconda is a great package manager for Python. I’ve moved to Python 3.6, so I will be using the Anaconda 3 version:
TensorFlow, the popular DL framework by Google. Installation:
Validate TensorFlow install: To make sure our stack is running smoothly, I like to run the TensorFlow MNIST example:
We should see the loss decreasing during training:
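A quicker complementary check is to ask TensorFlow which GPUs it can see: if the list is empty, training silently falls back to the CPU. This sketch uses `tf.config.list_physical_devices` on TensorFlow 2.1 and later, and falls back to `device_lib.list_local_devices` on older 1.x releases like the ones current when this was written.

```python
def tensorflow_gpus():
    """Names of GPUs TensorFlow can see, or None if `import tensorflow` fails."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    config = getattr(tf, "config", None)
    list_devices = getattr(config, "list_physical_devices", None)
    if list_devices is not None:  # TensorFlow 2.1 and later
        return [device.name for device in list_devices("GPU")]
    # Older TensorFlow 1.x releases: fall back to the device_lib helper.
    from tensorflow.python.client import device_lib
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


if __name__ == "__main__":
    gpus = tensorflow_gpus()
    if gpus is None:
        print("TensorFlow could not be imported")
    elif not gpus:
        print("TensorFlow sees no GPU: check the driver, CUDA and CuDNN")
    else:
        print("TensorFlow sees:", gpus)
```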
Keras is a great high-level neural networks framework and an absolute pleasure to work with. Installation couldn’t be easier:
PyTorch is a newcomer in the world of DL frameworks, but its API is modeled on the successful Torch, which was written in Lua. PyTorch feels new and exciting, mostly great, although some things are still to be implemented. We install it by running:
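Once PyTorch is installed, a short sketch like this confirms it was built with CUDA support and can see the card. It only uses `torch.cuda.is_available`, `torch.cuda.device_count`, `torch.cuda.get_device_name` and `torch.backends.cudnn.version`.

```python
def pytorch_gpu_info():
    """Summary of PyTorch's CUDA setup, or None if `import torch` fails."""
    try:
        import torch
    except ImportError:
        return None
    info = {
        "cuda_available": torch.cuda.is_available(),
        # None when PyTorch was built without CuDNN support
        "cudnn_version": torch.backends.cudnn.version(),
    }
    if info["cuda_available"]:
        info["devices"] = [torch.cuda.get_device_name(i)
                           for i in range(torch.cuda.device_count())]
    return info


if __name__ == "__main__":
    print(pytorch_gpu_info())
```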
Jupyter is a web-based IDE for Python, which is ideal for data sciency tasks. It’s installed with Anaconda, so we just configure and test it:
Now if we open http://localhost:8888 we should see a Jupyter screen.
Run Jupyter on boot
Rather than running the notebook every time the computer is restarted, we can set it to autostart on boot. We will use crontab to do this, which we can edit by running crontab -e. Then add the following after the last line in the crontab file:
I use my old trusty Macbook Air for development, so I’d like to be able to log into the DL box both from my home network and when I’m on the go.
SSH Key: It’s way more secure to log in with an SSH key instead of a password. Digital Ocean has a great guide on how to set this up.
SSH tunnel: If you want to access your Jupyter notebook from another computer, the recommended way is to use SSH tunneling (instead of opening the notebook to the world and protecting it with a password). Let’s see how we can do this:
1. First we need an SSH server. We install it by running the following on the DL box (server): sudo apt-get install openssh-server
2. Then to connect over SSH tunnel, run the following script on the client:
To test this, open a browser and try http://localhost:8888 from the remote machine. Your Jupyter notebook should appear.
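The client side of the tunnel can also be scripted. This is a minimal sketch, not the original client script: `tunnel_command` builds a standard OpenSSH port-forwarding command (`-N` to forward ports only, `-L local:localhost:remote`), and `port_is_open` checks whether anything is answering on the forwarded port before you open the browser.

```python
import socket


def tunnel_command(user, host, local_port=8888, remote_port=8888):
    """ssh arguments that forward local_port here to Jupyter's port on the DL box."""
    # -N: don't run a remote command, only forward ports.
    # -L: listen on local_port on this machine and connect to
    #     localhost:remote_port as seen from the DL box.
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", f"{user}@{host}"]


def port_is_open(host, port, timeout=1.0):
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `subprocess.Popen(tunnel_command("me", "dlbox.local"))` starts the tunnel (both names are placeholders for your user and the DL box’s address), after which `port_is_open("localhost", 8888)` should return True.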
Setup out-of-network access: Finally, to access the DL box from the outside world, we need 3 things:
Static IP for your home network (or a service to emulate that), so that we know which address to connect to.
Setting up out-of-network access depends on the router/network setup, so I’m not going into details.
Now that we have everything running smoothly, let’s put it to the test. We’ll be comparing the newly built box to an AWS P2.xlarge instance, which is what I’ve used so far for DL. The tests are computer vision related, meaning convolutional networks with a fully connected model thrown in. We time training models on: the AWS P2 instance GPU (K80), the AWS P2 virtual CPU, the GTX 1080 Ti and the Intel i5 7500 CPU.
The “Hello World” of computer vision: the MNIST database consists of 70,000 handwritten digits. We run the Keras example on MNIST, which uses a Multilayer Perceptron (MLP). Using an MLP means we are using only fully connected layers, not convolutions. The model is trained for 20 epochs on this dataset and achieves over 98% accuracy out of the box.
We see that the GTX 1080 Ti is 2.4 times faster than the K80 on AWS P2 in training the model. This is rather surprising as these 2 cards should have about the same performance. I believe this is because of the virtualization or underclocking of the K80 on AWS.
The CPUs are 9 times slower than the GPUs. As we will see later, this is a really good result for the processors. It is due to the small model, which fails to fully utilize the parallel processing power of the GPUs.
Interestingly, the desktop Intel i5 7500 achieves a 2.3x speedup over the virtual CPU on Amazon.
A VGG net will be finetuned for the Kaggle Dogs vs Cats competition. In this competition we need to tell apart pictures of dogs and cats. Running the model on CPUs for the same number of batches wasn’t feasible. Therefore we finetune for 390 batches (1 epoch) on the GPUs and 10 batches on the CPUs. The code used is on github.
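Because the GPUs ran 390 batches and the CPUs only 10, the runs are only comparable per batch. Here is a minimal timing sketch under that assumption; `train_step` is a hypothetical callable that trains on one batch (for instance, a wrapper around a Keras `train_on_batch` call). The untimed warm-up call keeps one-off start-up costs out of the average.

```python
import time


def seconds_per_batch(train_step, n_batches, warmup=1):
    """Mean wall-clock seconds per call of `train_step`, after `warmup` untimed calls."""
    for _ in range(warmup):
        train_step()  # first calls often include graph building and memory allocation
    start = time.perf_counter()
    for _ in range(n_batches):
        train_step()
    return (time.perf_counter() - start) / n_batches


def speedup(slow_s_per_batch, fast_s_per_batch):
    """How many times faster the second device is, comparing per-batch times."""
    return slow_s_per_batch / fast_s_per_batch
```

Timing 10 batches on a CPU and 390 on a GPU with `seconds_per_batch`, then passing both results to `speedup`, gives a fair ratio even though the batch counts differ.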
The 1080 Ti is 5.5 times faster than the AWS GPU (K80). The difference in