cs501r_f2018:lab9 [BYU CS classes]

Objective:

To implement the Proximal Policy Optimization (PPO) algorithm.

Deliverable:

For this lab, you will turn in a colab notebook that implements the proximal policy optimization (PPO) algorithm.

Grading standards:

Your notebook will be graded on the following:

45% Proper design, creation, and debugging of the actor and critic networks
45% Proper implementation of the PPO loss function and objective
10% Visualization of policy return as a function of training progress
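
As a rough illustration of the quantity to be visualized, the return of an episode can be computed from its per-step rewards. The sketch below is only one possible way to do it: the function name discounted_returns and the default discount factor gamma=0.99 are assumptions made for this example, not requirements of the lab.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted return G_t for every step t of one episode.

    G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    `gamma` is an assumed default; pick whatever your setup calls for.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Example: three steps with reward 1 each and gamma = 0.5.
episode_rewards = [1.0, 1.0, 1.0]
print(discounted_returns(episode_rewards, gamma=0.5))
```

Recording the first entry of this list (or simply the sum of rewards) for each episode gives a series that can be plotted against the training iteration.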

Description:

For this lab, you will implement the PPO algorithm and train it on a few simple worlds.

Here is a blog post introducing the idea.

Here is the paper with a technical description of the algorithm: Proximal policy optimization.
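
The central idea of the paper is the clipped surrogate objective: the probability ratio between the new and old policy is multiplied by the advantage estimate, and the ratio is clipped to the interval [1 - epsilon, 1 + epsilon] so that a single update cannot move the policy too far. The following is a minimal pure-Python sketch of that loss for a batch of samples. The function name ppo_clip_loss and its arguments are hypothetical, and epsilon=0.2 is the value used in the paper's experiments. A real notebook would compute this with tensors so that gradients flow through the new log-probabilities.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Negative clipped surrogate objective, averaged over the batch.

    All three inputs are equal-length lists of floats (one entry per
    sampled state-action pair). This is an illustrative sketch only.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), from log-probabilities.
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # The objective is maximized, so the loss to minimize is its negative.
    return -total / len(advantages)


# Example: the policy has not changed yet, so every ratio is 1.
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, -1.0]))
```

Note that clipping only limits the gain: with a positive advantage the term is capped at (1 + epsilon) times the advantage, while with a negative advantage and a large ratio the unclipped (more negative) term is kept.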

Here is a video describing it at a high level: PPO video

cs501r_f2018/lab9.1541799668.txt.gz · Last modified: 2021/06/30 23:40 (external edit)