To get started on the supercomputer, follow the instructions at https://marylou.byu.edu/ to request an account. Once that is set up, you can SSH in with
ssh <username>@ssh.fsl.byu.edu
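If you get tired of typing the full hostname, an entry in ~/.ssh/config on your local machine is a handy shortcut (the alias name below is my own choice, not anything official):

# ~/.ssh/config on your local machine
Host marylou
    HostName ssh.fsl.byu.edu
    User <username>

After that, ssh marylou does the same thing.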
Welcome to our new home directory. Using the supercomputer means we have to remember elementary school and be nice and share. This means we can only use software that is approved and installed as modules. For this class we want Python, TensorFlow, and CUDA for the GPU. We set up our environment by creating a file called .modules and telling it what we want to use.
#%Module
module load defaultenv
module add cuda
module add cudnn/4.0_gcc-4.4.7
module add tensorflow/0.9.0_python-2.7.11+cuda
UPDATE: apparently, the following module file works better:
#%Module
module load defaultenv
module load cuda/8.0
module load cudnn/5.1_cuda-8.0
module load python/2/7
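After editing .modules, log out and back in, then check that everything actually loaded; module list and module avail are standard Environment Modules commands:

module list           # show currently loaded modules
module avail cuda     # list available versions matching "cuda"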
The lab allots most of our disk quota to the compute directory, so from now on we will keep all data and code there.
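For example, setting up a project directory for this tutorial might look like the following (the exact path of the compute directory is an assumption here; check your own home directory listing):

mkdir -p ~/compute/hello_tf   # path is an assumption; use your cluster's compute dir
cd ~/compute/hello_tf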
Now that our environment is set up, we can run a Python script. This is done by submitting a job with whatever deep learning magic you want to run. The submission mechanism is a SLURM script: just a bash script that tells the resource manager how much memory you need, how many CPUs, how much time, and so on.
So, time for a baby example using the TensorFlow program hello.py:
import tensorflow as tf

hello = tf.constant('Hello')
sess = tf.Session()
print sess.run(hello)
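Before queueing a batch job, you can sanity-check a script like this in an interactive session, assuming the cluster allows them; salloc and srun are standard SLURM commands, and the limits here are just examples:

salloc --time=00:10:00 --ntasks=1 --gres=gpu:1 --mem-per-cpu=4096M
srun python hello.py   # runs on the allocated node once the allocation is granted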
Now, to create our slurm script, we can use the script generator GUI at https://marylou.byu.edu/documentation/slurm/script-generator. This gives us a file, which I will name slurm.sh, that looks like this:
#!/bin/bash

#SBATCH --time=01:00:00        # walltime - this is one hour
#SBATCH --ntasks=1             # number of processor cores (i.e. tasks)
#SBATCH --nodes=1              # number of nodes
#SBATCH --gres=gpu:1
#SBATCH --mem-per-cpu=4096M    # memory per CPU core

# LOAD MODULES, INSERT CODE, AND RUN YOUR PROGRAMS HERE
python hello.py
Simple enough, right? It is also important to be realistic about how much memory and time we request: the more we ask for, the lower our priority and the longer we wait for the job to start.
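One way to tune those requests is to check what a finished job actually used. sacct is the standard SLURM accounting tool (this assumes your cluster has accounting enabled); replace <jobid> with the ID that sbatch prints:

sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,State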
To submit your job, use the sbatch command, as in sbatch ./slurm.sh.
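A few standard SLURM commands come in handy after submitting; again, <jobid> is the ID that sbatch prints back:

squeue -u $USER          # see where your jobs sit in the queue
cat slurm-<jobid>.out    # by default, stdout and stderr land here
scancel <jobid>          # cancel the job if something went wrong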