====== Deep Learning on the Supercomputer ======

==== Setting Up on the Supercomputer ====

To get started on the supercomputer, follow the instructions at [[https://marylou.byu.edu/]] to request an account. Once your account is set up, you can SSH in with

<code>
ssh <username>@ssh.fsl.byu.edu
</code>
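You will also need to get your code and data onto the cluster. One way (a minimal sketch; the file name is just a placeholder) is to ''scp'' from your own machine:

<code>
# copy a file from your laptop into your home directory on the supercomputer
scp my_script.py <username>@ssh.fsl.byu.edu:~/
</code>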

Welcome to our new home directory. Using the supercomputer means we have to remember elementary school and be nice and share: we can only use software that is approved and provided as **modules**. For this class we want Python, TensorFlow, and CUDA for the GPU. We set up our environment by creating a file called ''.modules'' in our home directory and listing the modules we want loaded.

<code>
#%Module

module load defaultenv
module add cuda
module add cudnn/4.0_gcc-4.4.7
module add tensorflow/0.9.0_python-2.7.11+cuda
</code>

**UPDATE: apparently, the following module file works better:**

<code>
#%Module

module load defaultenv
module load cuda/8.0
module load cudnn/5.1_cuda-8.0
module load python/2.7
</code>
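After editing ''.modules'', log out and back in and check that the environment looks right. A quick sanity check (exact module names and versions will differ as the cluster gets updated):

<code>
module list          # should show the modules listed in your .modules file
module avail cuda    # list the CUDA versions currently available on the cluster
</code>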

The computer lab grants us most of our storage quota in the **compute** directory, so from now on we will make sure to put all data and code in there.
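For example, assuming the larger quota is mounted at ''~/compute'' (check the exact path on your own account), a minimal sketch for keeping a project there:

<code>
mkdir -p ~/compute/my_project    # directory name is arbitrary
cd ~/compute/my_project          # keep code, data, and SLURM scripts here
</code>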

==== Running Programs ====

Now that our environment is set up we can run a Python script. This is done by submitting a job with whatever deep learning magic you want to run. The job is described by a SLURM script: just a bash script that tells the resource manager how much memory you need, how many CPUs, how much time, and so on.

So, time for a baby example using the TensorFlow program ''hello.py'':

<code>
import tensorflow as tf

hello = tf.constant('Hello')

sess = tf.Session()
print(sess.run(hello))
</code>

To create our SLURM script we can use the script-generator GUI at [[https://marylou.byu.edu/documentation/slurm/script-generator]]. This will give us a file, which I will name ''slurm.sh'', that looks like this:

<code>
#!/bin/bash

#SBATCH --time=01:00:00       # walltime - this is one hour
#SBATCH --ntasks=1            # number of processor cores (i.e. tasks)
#SBATCH --nodes=1             # number of nodes
#SBATCH --gres=gpu:1          # request one GPU
#SBATCH --mem-per-cpu=4096M   # memory per CPU core

# LOAD MODULES, INSERT CODE, AND RUN YOUR PROGRAMS HERE
python hello.py
</code>

Simple enough, right? It is also important to request only as much memory and time as we actually expect to use. If we ask for a lot, we will have lower priority and have to wait longer for the job to start.

To submit your job, use the ''sbatch'' command, as in ''sbatch ./slurm.sh''.
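Once the job is submitted, SLURM prints a job ID and, when the job runs, writes its output to a file in the directory you submitted from (by default named ''slurm-<jobid>.out''). A minimal sketch (the job ID here is made up):

<code>
sbatch ./slurm.sh       # prints something like: Submitted batch job 1234567
squeue -u <username>    # check whether the job is pending or running
cat slurm-1234567.out   # stdout from the job, including our 'Hello'
</code>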

==== Pro Tips ====
  * Make sure your TensorFlow code actually uses the GPU; see the sketch below for a quick check
  * To see the status of all your jobs, it is helpful to make an alias for ''watch squeue -u <username> --Format=jobid,numcpus,state,timeused,timeleft'' (also sketched below)
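Two small sketches for the tips above (the alias name ''myjobs'' and the ''nvidia-smi'' check are my own illustrative choices):

<code>
# Add to ~/.bashrc so the alias is available in every session
alias myjobs='watch squeue -u <username> --Format=jobid,numcpus,state,timeused,timeleft'

# Inside slurm.sh, before "python hello.py": print the GPU the job can see.
# If no GPU shows up in the job's output file, your code is probably running on the CPU.
nvidia-smi
</code>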