googlecloud: revision 2017/09/07 20:58 by humphrey (edit summary: "Check the results"), compared with revision 2021/06/30 23:42

====== Deep Learning on the Supercomputer ======
==== Installing gcloud on a local machine ====
. Install the Google Cloud SDK (gcloud) on your local machine. I used the Windows Subsystem for Linux, so I chose the apt-get installation option.
Reference: https://cloud.google.com/sdk/downloads
. Run the following command to set your user account and the default region for computation:
<code>gcloud init</code>
==== Set up a Google Cloud Storage bucket ====
. In the web console (https://console.cloud.google.com), open the drop-down menu in the top left-hand corner, click Storage -> Browser, and create a new storage bucket if you don't have one yet. In this example the bucket is called byu_tf_ml.
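Bucket names have to follow Cloud Storage's naming rules, and a bad name is a common reason bucket creation fails. The sketch below checks only the common case of a name without dots: 3 to 63 characters, lowercase letters, digits, dashes and underscores, starting and ending with a letter or digit, not starting with "goog" and not containing "google". The helper name is hypothetical, and names containing dots have extra rules that it does not handle, so check the official naming guidelines before relying on it.

<code python>
import re

# Hypothetical helper: covers only bucket names WITHOUT dots.
# Assumed rules: 3-63 characters, lowercase letters, digits, '-' and '_',
# with a letter or digit as the first and last character.
_SIMPLE_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def is_simple_bucket_name(name: str) -> bool:
    """Return True if `name` looks like a valid dot-free bucket name."""
    if not _SIMPLE_BUCKET_NAME.fullmatch(name):
        return False
    # Reserved words: names may not start with "goog" or contain "google".
    return not name.startswith("goog") and "google" not in name


print(is_simple_bucket_name("byu_tf_ml"))  # True
print(is_simple_bucket_name("BYU_TF_ML"))  # False: uppercase is not allowed
</code>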
==== Submit a training job to Google Cloud ====
. In a terminal on the local machine, run the following command:
<code>gcloud ml-engine jobs submit training my_job --package-path trainer --module-name trainer.tf_task --staging-bucket gs://byu_tf_ml --scale-tier BASIC</code>
Reference: https://cloud.google.com/sdk/gcloud/reference/ml-engine/jobs/submit/training
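If you submit jobs repeatedly, it can help to build the submit command from Python instead of retyping it. The sketch below assembles the same argument list as the command above and can hand it to subprocess. The function names are hypothetical, and it assumes gcloud is installed and on the PATH with the ml-engine command group used on this page.

<code python>
import shlex
import subprocess
from typing import List, Optional


def build_submit_command(job_name: str, package_path: str, module_name: str,
                         staging_bucket: str, scale_tier: Optional[str] = None,
                         job_dir: Optional[str] = None) -> List[str]:
    """Assemble the argv for `gcloud ml-engine jobs submit training`."""
    cmd = [
        "gcloud", "ml-engine", "jobs", "submit", "training", job_name,
        "--package-path", package_path,      # local directory with the Python code
        "--module-name", module_name,        # dotted name of the main module
        "--staging-bucket", staging_bucket,  # where the packaged code is uploaded
    ]
    if scale_tier is not None:
        cmd += ["--scale-tier", scale_tier]
    if job_dir is not None:
        cmd += ["--job-dir", job_dir]
    return cmd


def submit(cmd: List[str]) -> None:
    """Print the command in shell form, then run it (raises on a non-zero exit)."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)


cmd = build_submit_command("my_job", "trainer", "trainer.tf_task",
                           "gs://byu_tf_ml", scale_tier="BASIC")
print(shlex.join(cmd))
# submit(cmd)  # uncomment to actually submit the job
</code>

Keeping the arguments as a list (rather than one shell string) avoids quoting problems; `shlex.join` (Python 3.8+) is only used to print a copy-pasteable version.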
This step is a bit tricky. The command "gcloud ml-engine jobs submit training" is Google Cloud's way to (1) package up our Python machine learning job, (2) upload it to the cloud platform, and (3) run it on cloud machines. Four arguments are needed here:
  a. job: the job ID (my_job in this example); it shows up in the web console after the job is submitted.
  b. package-path: the local directory that contains the Python source code (trainer in this example). It must be a Python package, i.e. contain an __init__.py file.
  c. module-name: the main Python module to run, in dotted form (trainer.tf_task, i.e. trainer/tf_task.py).
  d. staging-bucket: the Cloud Storage bucket where the packaged training code is uploaded before the job runs.
Optional:
  e. scale-tier: controls how much computing power the job uses (BASIC runs on a single machine).
  f. packages: paths to archives of any additional Python packages your project imports that are not already included in the runtime version listed here: https://cloud.google.com/ml-engine/docs/concepts/runtime-version-list
  g. job-dir: a Cloud Storage directory that is passed into your program as the --job-dir argument, telling it where to write its output. It must have the form gs://[bucket_name]/[job_dir].
Hint:
  h. To save and retrieve data on the Google Cloud machines, give the input and output paths in your code in the form gs://[bucket_name]/[input files or output directory].
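Items g and h fit together: the --job-dir value reaches the training program as a command-line argument, and the program can build its gs:// output paths from it. Below is a minimal sketch of what the entry point trainer/tf_task.py might do with that argument. The function names and the model/logs subdirectories are hypothetical, and the actual reading and writing of gs:// paths would go through a file API that understands Cloud Storage (TensorFlow's tf.io.gfile, for example), which is left out here.

<code python>
import argparse
import posixpath
from typing import List, Optional, Tuple


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    # The value given to `gcloud ... --job-dir` arrives here as --job-dir.
    parser.add_argument("--job-dir", required=True,
                        help="Cloud Storage directory, gs://[bucket_name]/[job_dir]")
    return parser.parse_args(argv)


def gcs_path(job_dir: str, *parts: str) -> str:
    """Join path components under a gs:// directory using forward slashes."""
    if not job_dir.startswith("gs://"):
        raise ValueError("expected gs://[bucket_name]/[job_dir], got " + repr(job_dir))
    return posixpath.join(job_dir, *parts)


def main(argv: Optional[List[str]] = None) -> Tuple[str, str]:
    args = parse_args(argv)
    model_dir = gcs_path(args.job_dir, "model")  # hypothetical output layout
    log_dir = gcs_path(args.job_dir, "logs")
    # A real job would hand these paths to a Cloud Storage-aware API
    # (for example tf.io.gfile) instead of only printing them.
    print("writing model to", model_dir)
    print("writing logs to", log_dir)
    return model_dir, log_dir


main(["--job-dir", "gs://byu_tf_ml/my_job"])
</code>

In the real module, main() would be called with no arguments (so argparse reads sys.argv) from an `if __name__ == "__main__":` guard; the explicit argument list above is only for trying it locally. posixpath is used instead of os.path so the separator stays "/" even on Windows.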
==== Check the results ====
. In the web console, open the drop-down menu in the top left-hand corner and click ML Engine -> Jobs. The submitted job should appear there.
. After training finishes, the results and logs are available under the corresponding job.
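The console is not the only way to follow a job. Here is a rough sketch that polls the job's state from the command line until it stops. It assumes that `gcloud ml-engine jobs describe` prints the state when given `--format=value(state)` and that SUCCEEDED, FAILED and CANCELLED are the final state names; check both against the gcloud reference. The function names are hypothetical, and the state lookup is passed in as a parameter so the loop can be tried without gcloud.

<code python>
import subprocess
import time
from typing import Callable

# Assumed names of the states in which a job has stopped for good.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def describe_state(job_name: str) -> str:
    """Ask gcloud for the job's current state (assumes gcloud is on the PATH)."""
    result = subprocess.run(
        ["gcloud", "ml-engine", "jobs", "describe", job_name,
         "--format=value(state)"],
        check=True, capture_output=True, text=True)
    return result.stdout.strip()


def wait_for_job(job_name: str,
                 get_state: Callable[[str], str] = describe_state,
                 poll_seconds: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job reaches a terminal state, then return that state."""
    while True:
        state = get_state(job_name)
        print(job_name, state)
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)


# Dry run with a made-up state sequence instead of calling gcloud:
fake_states = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
final = wait_for_job("my_job", get_state=lambda _: next(fake_states),
                     sleep=lambda _: None)
print(final)
</code>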
==== More ====
. To reuse the trained weights of the model, export the model in TensorFlow's SavedModel format from your application.
Reference: https://cloud.google.com/ml-engine/docs/concepts/prediction-overview
. I haven't tried TensorBoard yet, but it does not look hard to set up.
Reference: https://cloud.google.com/ml-engine/docs/how-tos/monitor-training#monitoring_with_tensorboard
. More examples are available online. Reference: https://cloud.google.com/ml-engine/docs/tutorials/

BYU CS classes
