====== Deep Learning on the Google Cloud Platform ======
  
==== Installing gcloud on a local machine ====
1. Download and install the gcloud SDK on your local machine.
reference: https://cloud.google.com/sdk/downloads
  
2. Use the following command to set up your user account, including setting the region of the computation unit.
  
<code> gcloud init </code>
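gcloud init walks through these settings interactively. If you already know the values, the same setup can be sketched with individual commands (the project ID and region below are placeholders, not values from this guide):

<code>
gcloud auth login
gcloud config set project [project_id]
gcloud config set compute/region us-central1
</code>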
  
----
==== Set up a google cloud storage bucket ====
  
3. On the web console (link: https://console.cloud.google.com), click on the drop-down menu in the top left-hand corner -> click on Storage -> click on Browser, and create a new storage bucket if one does not exist yet. Let's call it byu_tf_ml in this example.
  
----
==== Submit a learning job to google cloud ====
  
4. On the local machine console, call the following command:
<code>
gcloud ml-engine jobs submit training [job_id] \
    --package-path [source_dir] \
    --module-name [source_dir.main_script] \
    --staging-bucket gs://[bucket_name] \
    --scale-tier BASIC_GPU
</code>
  
reference: https://cloud.google.com/sdk/gcloud/reference/ml-engine/jobs/submit/training
  
The command "gcloud ml-engine jobs submit training" is the Google Cloud way to (1) package up our Python machine learning job, (2) upload it to the cloud platform, and (3) run it on some cloud machines. Four fields are required:
  
 a. job: The job ID, [job_id] in the example; it's the name that shows up in the web console after the job is submitted.

 b. package-path: The local machine directory which contains the Python source code.

 c. module-name: The main Python script.

 d. staging-bucket: The place on Google Cloud where the machine learning source code is stored.
  
and these are optional:
  
 e. scale-tier: There are five tiers, BASIC, BASIC_GPU, STANDARD_1, PREMIUM_1, and CUSTOM, standing for different levels of resources to be used.

 f. packages: The paths of extra packages you import into the project that are not already listed here: https://cloud.google.com/ml-engine/docs/concepts/runtime-version-list
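Putting the fields together, a filled-in version of the command might look like the following (my_job, trainer, and tf_task are example names; byu_tf_ml is the bucket created earlier):

<code>
gcloud ml-engine jobs submit training my_job \
    --package-path trainer \
    --module-name trainer.tf_task \
    --staging-bucket gs://byu_tf_ml \
    --scale-tier BASIC_GPU
</code>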
  
----
=== Hints ===

1. In order to save and retrieve data on google cloud machines, specify the path of input and output as <code>gs://[bucket_name]/[io_directory]</code>
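One way to follow this hint without hard-coding paths (a sketch, not part of ml-engine; the flag names here are hypothetical) is to pass the gs:// locations into the trainer as command-line arguments:

<code python>
import argparse

def parse_args(argv=None):
    # --input-path / --output-path are hypothetical flag names;
    # point them at gs://[bucket_name]/... locations when submitting the job.
    parser = argparse.ArgumentParser()
    parser.add_argument('--input-path', default='gs://byu_tf_ml/input')
    parser.add_argument('--output-path', default='gs://byu_tf_ml/output')
    return parser.parse_args(argv)

if __name__ == '__main__':
    args = parse_args()
    print(args.input_path, args.output_path)
</code>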
 +
2. Python cannot resolve the "gs://" prefix on its own; use "tf.gfile.Open" to read such files, for example:
<code python>
import tensorflow as tf

def unpickle(file):
    import cPickle
    with tf.gfile.Open(file, 'rb') as fo:
        dic = cPickle.load(fo)
    return dic
</code>
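For quick local testing without the gs:// prefix (and on Python 3, where cPickle became the built-in pickle module), an equivalent sketch using the standard open is:

<code python>
import pickle

def unpickle_local(path):
    # Same pattern as above, but with the built-in open for local files.
    with open(path, 'rb') as fo:
        return pickle.load(fo)
</code>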

3. You need an <code>__init__.py</code> file in your source code directory in order to make it work. You can keep it as an empty file.
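With that file in place, the package directory passed to --package-path (using the example names from above; yours may differ) looks like:

<code>
trainer/
    __init__.py        (can be an empty file)
    tf_task.py         (the main script, referenced as trainer.tf_task)
</code>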
 +
----
==== Check the results ====
  
1. On the web console, click on the drop-down menu in the top left-hand corner -> click on ML Engine -> click on Jobs. You should be able to see the submitted job.
  
2. After the training finishes, you will be able to see the results and logs under the corresponding job, and any outputs from your job are stored in the corresponding folder under the gs:// path.
  
3. Run the following command in Cloud Shell to start TensorBoard:
<code> tensorboard --port 8080 --logdir gs://[bucket_name]/[output_directory] </code>
  
4. To open a new browser window, select Preview on port 8080 from the Web preview menu in the top-right corner of the Cloud Shell toolbar.
----
==== Related resources ====
1. If you want to reuse the trained weights of the model, include the SavedModel export in the application.
Reference: https://cloud.google.com/ml-engine/docs/concepts/prediction-overview
  
2. You may want to check out more examples online. Reference: https://cloud.google.com/sdk/gcloud/reference/ml-engine/jobs/submit/training
googlecloud.1504818229.txt.gz · Last modified: 2021/06/30 23:40 (external edit)