Deep Learning on the Supercomputer

Setting Up the Supercomputer

To get started on the supercomputer, you need to follow the instructions and request an account. Once you have this set up, you can SSH in with

 ssh <username>@<hostname> 

Welcome to your new home directory. Using the supercomputer means we have to remember elementary school and be nice and share. This means we can only use software that is approved and stored in modules. For this class we want to use Python, TensorFlow, and CUDA for the GPU. We set up our environment by creating a file called .modules and telling it what we want to use.


module load defaultenv
module add cuda
module add cudnn/4.0_gcc-4.4.7
module add tensorflow/0.9.0_python-2.7.11+cuda

UPDATE: apparently, the following module file works better:


module load defaultenv
module load cuda/8.0
module load cudnn/5.1_cuda-8.0
module load python/2.7

The computing center grants most of your disk quota to the compute directory, so from now on we will make sure to put all data and code in there.
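For example, you might set up a project folder under the compute directory and work from there. The mount point ~/compute and the folder name deep_learning are assumptions; check your site's documentation for the actual path.

```shell
# Create a project folder under the compute directory and work from there.
# The path ~/compute and the name deep_learning are assumptions.
mkdir -p ~/compute/deep_learning
cd ~/compute/deep_learning
pwd   # confirm we are working under compute, not the home directory
```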

Running Programs

Now that our environment is set up, we can run a Python script. This is done by submitting a job with whatever deep learning magic you want to run. The method for submitting this is a SLURM script: just a bash script that tells the resource manager how much memory you need, how many CPUs, how much time, and so on.

So it's time for a baby example using TensorFlow:

import tensorflow as tf

# Build a trivial graph and run it to verify TensorFlow works
hello = tf.constant('Hello')

sess = tf.Session()
print(sess.run(hello))

Now to create our SLURM script we can use the script-generator GUI. This will give us a file that looks like this:


#!/bin/bash

#SBATCH --time=01:00:00   # walltime - this is one hour
#SBATCH --ntasks=1   # number of processor cores (i.e. tasks)
#SBATCH --nodes=1   # number of nodes
#SBATCH --gres=gpu:1   # request one GPU
#SBATCH --mem-per-cpu=4096M   # memory per CPU core


Simple enough, right? It is also important to request only as much memory and time as we actually expect to need. If we ask for a lot, our job gets lower priority and we have to wait longer for it to start.
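Putting the pieces together, a complete job script might look like the sketch below. The project path and the script name net.py are assumptions; the #SBATCH headers are the ones generated above.

```shell
#!/bin/bash

#SBATCH --time=01:00:00   # walltime - this is one hour
#SBATCH --ntasks=1   # number of processor cores (i.e. tasks)
#SBATCH --nodes=1   # number of nodes
#SBATCH --gres=gpu:1   # request one GPU
#SBATCH --mem-per-cpu=4096M   # memory per CPU core

# The commands to run go after the #SBATCH headers.
cd ~/compute/deep_learning   # assumed project folder
python net.py                # assumed script name
```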

To submit your job, use the sbatch command, as in sbatch ./<jobscript>

Pro Tips

  • Make sure your tf code uses the GPU
  • To see the status of all your jobs, it is helpful to make an alias for the command `watch squeue -u <username> --Format=jobid,numcpus,state,timeused,timeleft`
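As a concrete sketch of that last tip, you could put the alias in your ~/.bashrc. The alias name myq is an assumption; watch simply re-runs the squeue command every two seconds.

```shell
# Alias for watching your own jobs in the queue; $USER expands to your username.
alias myq='watch "squeue -u $USER --Format=jobid,numcpus,state,timeused,timeleft"'
alias myq   # print the definition to confirm it is set
```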
supercomputer.txt · Last modified: 2017/10/10 10:09 by wingated