This shows you the differences between two versions of the page.
googlecloud [2017/09/07 20:58] humphrey [Check the results] |
googlecloud [2021/06/30 23:42] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Deep Learning on the Supercomputer ====== | ||
- | ==== Installing gcloud on a local machine ==== | ||
- | |||
- | 1. Install the Google Cloud SDK (gcloud) on your local machine. (I used the Windows Subsystem for Linux, so I chose the apt-get option.) | ||
- | reference: https://cloud.google.com/sdk/downloads | ||
- | |||
- | 2. Run the following command to set the user account and the default compute region. | ||
- | |||
- | <code> gcloud init </code> | ||
- | |||
- | ==== Set up a Google Cloud Storage bucket ==== | ||
- | |||
- | 3. In the web console (link: https://console.cloud.google.com), open the drop-down menu in the top left-hand corner -> click on Storage -> click on Browse, and create a new storage bucket if you don't already have one. Let's call it byu_tf_ml in this example. | ||
- | |||
- | ==== Submit a training job to Google Cloud ==== | ||
- | |||
- | 4. In a terminal on the local machine, run the following command: | ||
- | |||
- | <code> gcloud ml-engine jobs submit training my_job --package-path trainer --module-name trainer.tf_task --staging-bucket gs://byu_tf_ml --scale-tier BASIC </code> | ||
- | |||
- | reference: https://cloud.google.com/sdk/gcloud/reference/ml-engine/jobs/submit/training | ||
- | |||
- | This step is a bit tricky. The command "gcloud ml-engine jobs submit training" does three things: 1/ it packages up our Python machine learning job, 2/ it uploads the package to the cloud platform, and 3/ it runs the job on cloud machines. Four fields are required: | ||
- | |||
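- | a. job: the job ID (my_job in our example). It is the name shown in the web console after the job is submitted. | ||
- | b. package-path: the directory on the local machine that contains the Python source code (trainer in our example). | ||
- | c. module-name: the main Python module to run, in dotted form without the .py extension (trainer.tf_task in our example). | ||
- | d. staging-bucket: the Google Cloud Storage bucket where the packaged training code is staged before the job runs (gs://byu_tf_ml in our example). | ||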
- | |||
- | optional: | ||
- | |||
- | e. scale-tier: controls how much compute power the job uses (BASIC in our example). | ||
- | f. packages: paths to any extra Python packages that your project imports but that are not included in the runtime versions listed here: https://cloud.google.com/ml-engine/docs/concepts/runtime-version-list | ||
- | g. job-dir: an argument passed to your program to tell it which Google Cloud Storage directory to use. It has to be in the form gs://[bucket_name]/[job_dir]. | ||
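The fields above can be assembled programmatically. The following is a minimal sketch, not part of the original notes: `build_submit_command` is a hypothetical helper name, and it covers only the flags discussed here (`--package-path`, `--module-name`, `--staging-bucket`, `--scale-tier`, `--job-dir`). It only builds the argument list; it does not call gcloud.

```python
import shlex


def build_submit_command(job, package_path, module_name, staging_bucket,
                         scale_tier=None, job_dir=None):
    """Return the gcloud argument list for submitting a training job.

    Hypothetical helper for illustration; it only assembles arguments.
    """
    cmd = [
        "gcloud", "ml-engine", "jobs", "submit", "training", job,
        "--package-path", package_path,
        "--module-name", module_name,
        "--staging-bucket", staging_bucket,
    ]
    # Optional flags are appended only when a value is supplied.
    if scale_tier is not None:
        cmd += ["--scale-tier", scale_tier]
    if job_dir is not None:
        # job-dir has to be in the form gs://[bucket_name]/[job_dir].
        if not job_dir.startswith("gs://"):
            raise ValueError("job_dir must look like gs://[bucket_name]/[job_dir]")
        cmd += ["--job-dir", job_dir]
    return cmd


cmd = build_submit_command(
    job="my_job",
    package_path="trainer",
    module_name="trainer.tf_task",
    staging_bucket="gs://byu_tf_ml",
    scale_tier="BASIC",
)
print(shlex.join(cmd))
```

With the example values, the printed line matches the command shown in step 4. To actually submit the job, the list could be handed to `subprocess.run(cmd, check=True)`, assuming gcloud is installed and initialized on the machine.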
- | | ||
- | hint: | ||
- | h. To read and write data from the Google Cloud machines, specify the input and output paths in your code as gs://[bucket_name]/[input files/output directory]. | ||
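On the program side, the job-dir value arrives as a `--job-dir` command-line argument. The sketch below shows one way the training module might read it and derive gs:// paths from it. The names `parse_args` and `gcs_path`, and the example directory gs://byu_tf_ml/my_job, are assumptions made for illustration; they do not come from the original trainer code.

```python
import argparse


def parse_args(argv=None):
    """Read --job-dir from the command line, ignoring flags we don't know."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--job-dir",
        required=True,
        help="Cloud Storage directory, e.g. gs://[bucket_name]/[job_dir]",
    )
    # parse_known_args tolerates extra arguments meant for other code.
    args, _unknown = parser.parse_known_args(argv)
    return args


def gcs_path(job_dir, *parts):
    """Join path components under a gs:// directory (hypothetical helper)."""
    if not job_dir.startswith("gs://"):
        raise ValueError("expected a path of the form gs://[bucket_name]/[job_dir]")
    return "/".join([job_dir.rstrip("/"), *parts])


# Example with an explicit argument list instead of the real command line.
args = parse_args(["--job-dir", "gs://byu_tf_ml/my_job"])
output_dir = gcs_path(args.job_dir, "output")
print(output_dir)  # gs://byu_tf_ml/my_job/output
```

Note that Python's built-in `open()` does not understand gs:// paths, so the resulting strings need to go to a file API that supports Cloud Storage, such as TensorFlow's file I/O utilities.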
- | |||
- | ==== Check the results ==== | ||
- | |||
- | 5. In the web console, open the drop-down menu in the top left-hand corner -> click on ML Engine -> click on Jobs. You should be able to see the submitted job. | ||
- | |||
- | 6. After the training finishes, you will be able to see the results and logs under the corresponding job. | ||
- | |||
- | ==== More ==== | ||
- | 7. If you want to reuse the trained weights of the model, export the model in the SavedModel format from your application. | ||
- | Reference: https://cloud.google.com/ml-engine/docs/concepts/prediction-overview | ||
- | |||
- | 8. I haven't tried TensorBoard yet, but it doesn't look too hard to set up. | ||
- | Reference: https://cloud.google.com/ml-engine/docs/how-tos/monitor-training#monitoring_with_tensorboard | ||
- | |||
- | 9. You may want to check out more examples online. Reference: https://cloud.google.com/ml-engine/docs/tutorials/ |