====== Deep Learning on the Supercomputer ======

==== Setting Up the Supercomputer ====
To get started on the supercomputer you need to follow the instructions and get an account from [[https://marylou.byu.edu/]]. Once you have this set up you can SSH in with
<code>
module add tensorflow/0.9.0_python-2.7.11+cuda
</code>
| + | |||
| + | **UPDATE: apparently, the following module file works better:** | ||
| + | |||
<code>
#%Module

module load defaultenv
module load cuda/8.0
module load cudnn/5.1_cuda-8.0
module load python/2.7
</code>
| + | |||
| The computer lab grants most memory to the **compute** directory, so from now on we will make sure to put all data and code in there. | The computer lab grants most memory to the **compute** directory, so from now on we will make sure to put all data and code in there. | ||
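
For example, a first session might look like this (this assumes, as on marylou, that your compute space is reachable at ''~/compute''; adjust the path if your account is laid out differently):

<code>
cd ~/compute            # large-quota storage; home has a much smaller quota
mkdir -p deep_learning
cd deep_learning        # keep data, code, and SLURM scripts here
</code>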

==== Running Programs ====

Now that our environment is set up we can run a Python script. This is done by submitting a job with whatever deep learning magic you want to run. The method for submitting it is a SLURM script: a bash script that tells the resource manager how much memory your job needs, how many CPUs, how much time, and so on.

So, time for a small example using the TensorFlow program hello.py:
| + | |||
| + | <code> | ||
| + | import tensorflow as tf | ||
| + | |||
| + | hello = tf.constant('Hello') | ||
| + | |||
| + | sess = tf.Session() | ||
| + | print sess.run(hello) | ||
| + | </code> | ||
| + | |||
| + | Now to create our slurm script we can use the GUI at [[https://marylou.byu.edu/documentation/slurm/script-generator]]. This will give us a file that I will name slurm.sh that looks like this | ||
| + | |||
<code>
#!/bin/bash

#SBATCH --time=01:00:00       # walltime - this is one hour
#SBATCH --ntasks=1            # number of processor cores (i.e. tasks)
#SBATCH --nodes=1             # number of nodes
#SBATCH --gres=gpu:1          # number of GPUs
#SBATCH --mem-per-cpu=4096M   # memory per CPU core

# LOAD MODULES, INSERT CODE, AND RUN YOUR PROGRAMS HERE
python hello.py
</code>
| + | |||
| + | Simple enough, right? Also it is important to make sure we tell it how much memory and time we expect. If we give it a lot we will have less priority and have to weight longer for the job to start. | ||
| + | |||
To submit your job, use the ''sbatch'' command, as in ''sbatch ./slurm.sh''.

==== Pro Tips ====
  * Make sure your TensorFlow code actually uses the GPU
  * To see the status of all your jobs, it's helpful to make an alias for ''watch squeue -u <username> --Format=jobid,numcpus,state,timeused,timeleft''
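
One way to check the first tip (a sketch against the TensorFlow 0.x API used above; ''log_device_placement'' makes the session report which device each op is placed on):

<code>
import tensorflow as tf

# Pin the ops to the GPU explicitly; this fails loudly if no GPU is visible
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0])
    b = tf.constant([3.0, 4.0])
    c = a + b

# log_device_placement=True logs the device assigned to each op
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
</code>

If the placement log shows ops on ''/cpu:0'' instead of ''/gpu:0'', check that your SLURM script requested a GPU (''--gres=gpu:1'') and that the CUDA/cuDNN modules were loaded.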