This shows you the differences between two versions of the page.

googlecloud [2017/09/07 20:57]
humphrey [Submit a learning job to google cloud]
googlecloud [2021/06/30 23:42]
Line 1: Line 1:
-====== Deep Learning on the Supercomputer ====== 
-==== Installing gcloud on a local machine ==== 
-1. Install the Google Cloud SDK (gcloud) on your local machine. I used the Windows Subsystem for Linux, so I chose the apt-get option.
-reference: https://sdk/downloads
-2. Run the following command to set your user account and the default compute region.
-<code> gcloud init </code>
-==== Set up a Google Cloud Storage bucket ====
-3. On the web console (link: https://), click the drop-down menu in the top left-hand corner -> click Storage -> click Browse, and create a new storage bucket if there isn't one yet. Let's call it byu_tf_ml in this example.
-==== Submit a learning job to google cloud ==== 
-4. On the local machine console, call the following command: 
-<code> gcloud ml-engine jobs submit training my_job --package-path trainer --module-name trainer.tf_task --staging-bucket gs://byu_tf_ml --scale-tier BASIC </code>
-reference: https://sdk/gcloud/reference/ml-engine/jobs/submit/training
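If you would rather launch the job from a script than type the command by hand, the same argument list can be put together in Python. This is only a sketch: the helper name build_submit_command is made up for this example, and the values are the ones used on this page. It prints the command without running it.

```python
import shlex


def build_submit_command(job_id, package_path, module_name,
                         staging_bucket, scale_tier="BASIC"):
    """Return the argv list for `gcloud ml-engine jobs submit training`.

    Hypothetical helper for illustration; it only assembles the flags
    shown in step 4 and does not call gcloud itself.
    """
    return [
        "gcloud", "ml-engine", "jobs", "submit", "training", job_id,
        "--package-path", package_path,
        "--module-name", module_name,
        "--staging-bucket", staging_bucket,
        "--scale-tier", scale_tier,
    ]


cmd = build_submit_command(
    job_id="my_job",
    package_path="trainer",
    module_name="trainer.tf_task",
    staging_bucket="gs://byu_tf_ml",
)

# Quote each part so the printed line is safe to paste into a shell.
command_line = " ".join(shlex.quote(part) for part in cmd)
print(command_line)

# To actually submit (requires gcloud to be installed and initialised):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list to subprocess.run avoids shell quoting problems if a job id or path ever contains spaces.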
-This step is a bit tricky. The command "gcloud ml-engine jobs submit training" is Google Cloud's way to 1/ package up our Python machine learning job, 2/ upload it to the cloud platform and 3/ run it on some cloud machines. The first four fields below are required in this example; the rest are optional or are handled inside the training code:
-  a. job: in our example the value is my_job; it is the job id that shows up in the web console after the job is submitted.
-  b. package-path: the local directory that contains the Python source code.
-  c. module-name: the main Python module to run (trainer.tf_task in our example).
-  d. staging-bucket: the Google Cloud Storage bucket where the packaged training code is uploaded (staged) before the job runs.
-  e. scale-tier: optional, but gives control over how much computing power the job uses.
-  f. packages: optional; paths to any packages that your project imports but that are not part of the runtime versions listed here: https://ml-engine/docs/concepts/runtime-version-list
-  g. job-dir: an argument passed into your program to tell it which Google Cloud Storage directory to use. It has to be in the form gs://[bucket_name]/[job_dir].
-  h. To save and retrieve data on the Google Cloud machines, specify the input and output paths in your code as gs://[bucket_name]/[input files/output directory].
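Items g and h are handled on the Python side, inside the training module. Below is a minimal sketch of how the start of trainer/tf_task.py might read --job-dir and build output paths from it. The function names parse_args and gcs_path and the output/model sub-directory are assumptions made up for this example, not part of any Google library. The actual reading and writing of gs:// paths is left to whatever library your training code uses.

```python
import argparse
import posixpath


def parse_args(argv=None):
    """Parse the arguments forwarded to the training module.

    Only --job-dir is handled here; unknown arguments are ignored so
    the sketch does not break if extra flags are passed along.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--job-dir",
        required=True,
        help="Storage directory in the form gs://[bucket_name]/[job_dir]",
    )
    args, _unknown = parser.parse_known_args(argv)
    if not args.job_dir.startswith("gs://"):
        parser.error("--job-dir must look like gs://[bucket_name]/[job_dir]")
    return args


def gcs_path(job_dir, *parts):
    """Join path components under job_dir using forward slashes."""
    return posixpath.join(job_dir, *parts)


# Example with the bucket name used on this page.
args = parse_args(["--job-dir", "gs://byu_tf_ml/my_job"])
model_dir = gcs_path(args.job_dir, "output", "model")
print(model_dir)
```

posixpath is used instead of os.path so the separators stay as forward slashes even when the script is tested on Windows.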
-==== Check the results ==== 
-5. On the web console, click the drop-down menu in the top left-hand corner -> click ML Engine -> click Jobs. You should be able to see the job you submitted.
-6. After training finishes, you will be able to see the results and logs under the corresponding job.
-==== Further steps ====
-7. If you want to reuse the trained weights of the model, export it with SavedModel in your application.
-Reference: https://ml-engine/docs/concepts/prediction-overview
-8. I haven't tried TensorBoard yet, but it does not look too hard to set up.
-Reference: https://ml-engine/docs/how-tos/monitor-training#monitoring_with_tensorboard
-9. You may want to check out more examples online. Reference: https://ml-engine/docs/tutorials/
googlecloud.txt · Last modified: 2021/06/30 23:42 (external edit)