**supercomputer** page, revised 2017/08/23 20:22 by sean (Deep Learning on the Supercomputer section) and 2017/10/10 17:09 by wingated
module add tensorflow/0.9.0_python-2.7.11+cuda
</code>

**UPDATE: apparently, the following module file works better:**

<code>
#%Module

module load defaultenv
module load cuda/8.0
module load cudnn/5.1_cuda-8.0
module load python/2/7
</code>

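As a sanity check (a sketch; this assumes the standard Environment Modules commands are available on the cluster), you can confirm what actually ended up in your environment:

<code>
# After loading the module file above:
module list     # should show cuda/8.0, cudnn/5.1_cuda-8.0, etc.
which nvcc      # the CUDA compiler should now be on your PATH
</code>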
The computer lab grants the most storage quota to the **compute** directory, so from now on we will make sure to put all data and code in there.
#!/bin/bash
#SBATCH --time=01:00:00   # walltime - this is one hour
#SBATCH --ntasks=1   # number of processor cores (i.e. tasks)
#SBATCH --nodes=1   # number of nodes
Simple enough, right? It is also important to tell the scheduler how much memory and time we expect: the more we request, the lower our priority, and the longer we will wait for the job to start.
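If your script does not already request memory, it is just another ''#SBATCH'' line in the same file (a sketch; the exact option and a sensible value depend on the cluster's configuration):

<code>
#SBATCH --mem-per-cpu=4G   # memory per CPU core; request what you need, not more
</code>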
To submit your job, use the ''sbatch'' command, as in ''sbatch ./slurm.sh''.
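Putting it together (the ''Submitted batch job'' line is sbatch's normal output; by default the job's output lands in ''slurm-<jobid>.out'' in the directory you submitted from):

<code>
sbatch ./slurm.sh
# prints: Submitted batch job <jobid>

squeue -u <username>   # check status: PD = pending, R = running
</code>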
==== Pro Tips ====
  * Make sure your tf code uses the GPU
  * To see the status of all your jobs, it's helpful to make an alias for the command ''watch squeue -u <username> --Format=jobid,numcpus,state,timeused,timeleft''
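Sketches of both tips (the alias name ''watchjobs'' is just an example; ''nvidia-smi'' is NVIDIA's standard GPU monitoring tool and should be available on the GPU nodes):

<code>
# Tip 1: while the job is running, your python process should appear in
# nvidia-smi's process table on the compute node, with GPU memory in use
nvidia-smi

# Tip 2: put the watch command behind an alias, e.g. in ~/.bashrc
alias watchjobs='watch "squeue -u $USER --Format=jobid,numcpus,state,timeused,timeleft"'
</code>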