====== Deep Learning on the Supercomputer ======
  
==== Setting Up the Supercomputer ====
  
To get started on the supercomputer you need to follow the instructions and get an account from [[https://marylou.byu.edu/]]. Once you have this set up you can SSH in to one of the login nodes.
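
For example, assuming your username is <username> and the login host is ssh.fsl.byu.edu (use the address given in your account instructions if it differs):

<code>
# log in to one of the supercomputer's login nodes
ssh <username>@ssh.fsl.byu.edu
</code>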
The computer lab allocates most of your storage quota to the **compute** directory, so from now on we will make sure to put all data and code in there.
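
For example, assuming the compute directory is available at ~/compute (the exact path may differ on your account):

<code>
# work out of the compute directory so data and code count against the larger quota
cd ~/compute
mkdir deep_learning_demo
cd deep_learning_demo
</code>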
  
==== Running Programs ====

Now that our environment is set up we can run a Python script. This is done by submitting a job that runs whatever deep learning magic you want. Jobs are submitted with a SLURM script, which is just a bash script that tells the resource manager how much memory, how many CPUs, how much time, and so on your job needs.

Let's start with a tiny example using the TensorFlow program hello.py:

<code>
import tensorflow as tf

# define a constant op in the default graph
hello = tf.constant('Hello')

# launch a session and run the op
sess = tf.Session()
print(sess.run(hello))
</code>

Now to create our SLURM script we can use the script generator GUI at [[https://marylou.byu.edu/documentation/slurm/script-generator]]. This will give us a file, which I will name slurm.sh, that looks like this:

<code>
#!/bin/bash

#SBATCH --time=01:00:00   # walltime
#SBATCH --ntasks=1   # number of processor cores (i.e. tasks)
#SBATCH --nodes=1   # number of nodes
#SBATCH --gres=gpu:1   # request one GPU
#SBATCH --mem-per-cpu=4096M   # memory per CPU core

# LOAD MODULES, INSERT CODE, AND RUN YOUR PROGRAMS HERE
python hello.py
</code>

Simple enough, right? It is also important to request only as much memory and time as we actually expect to need: asking for a lot lowers our priority, so we will wait longer for the job to start.
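
If a job is stuck waiting, one way to see SLURM's estimated start time for your pending jobs is ''squeue --start'' (here <username> is a placeholder):

<code>
# list your pending jobs along with their expected start times
squeue -u <username> --start
</code>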

Now we submit the job to the scheduler with ''sbatch'' (the script is not executed directly).
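
For example, from the directory that contains hello.py and slurm.sh:

<code>
# submit the job; SLURM prints the assigned job ID
sbatch slurm.sh

# by default the job's output goes to slurm-<jobid>.out in the submission directory
cat slurm-<jobid>.out
</code>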

==== Pro Tips ====
  * Make sure your TensorFlow code actually uses the GPU (a quick check is sketched after this list)
  * To see the status of all your jobs, it's helpful to make an alias for the command ''watch squeue -u <username> --Format=jobid,numcpus,state,timeused,timeleft''
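
A quick way to check the first tip, assuming the TensorFlow 1.x API used above: turn on device placement logging and look for operations assigned to a GPU device in the job's output file.

<code>
import tensorflow as tf

# log which device (CPU or GPU) each operation is placed on
config = tf.ConfigProto(log_device_placement=True)

a = tf.constant([1.0, 2.0, 3.0])
b = tf.constant([4.0, 5.0, 6.0])
c = a * b

# if a GPU was allocated, the placement log should show these ops on /device:GPU:0
sess = tf.Session(config=config)
print(sess.run(c))
</code>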