cs501r_f2018:lab9 [BYU CS classes]

Objective:

To implement the Proximal Policy Optimization (PPO) algorithm.

Deliverable:

For this lab, you will turn in a Colab notebook that implements the proximal policy optimization (PPO) algorithm.

Grading standards:

Your notebook will be graded on the following:

45% Proper design, creation and debugging of actor and critic networks
25% Proper implementation of the PPO loss function and objective on cart-pole
20% Implementation and demonstrated learning of PPO on another domain of your choice
10% Visualization of policy return as a function of training
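
As a reference point for the PPO loss item above, here is a minimal sketch of the clipped surrogate objective from the PPO paper, written in plain Python so the arithmetic is easy to follow. The function name `ppo_clip_loss`, its argument names, and the default `clip_eps=0.2` are illustrative assumptions, not part of the assignment; a notebook submission would typically compute the same quantity on tensors in a deep learning framework.

```python
import math

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over samples.

    All names here are hypothetical. Inputs are equal-length lists of floats:
    log-probabilities of the taken actions under the new and old policies,
    and an advantage estimate for each action.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # The objective is maximized, so the loss to minimize is its negation.
    return -total / len(advantages)
```

When the new and old policies agree, every ratio is 1 and the loss reduces to the negative mean advantage. Clipping only removes the incentive to move the ratio further once it is already outside the interval in the direction that improves the objective.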

Description:

For this lab, you will implement the PPO algorithm and train it on a few simple worlds from the OpenAI Gym suite of test problems.
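
Training on episodic worlds usually involves turning the per-step rewards of a rollout into discounted returns, which can then serve as critic targets and as the basis for advantage estimates. The sketch below shows one common way to do this; the helper names `discounted_returns` and `advantages` and the default `gamma=0.99` are assumptions for illustration, and other estimators (for example, generalized advantage estimation) are equally valid choices.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for each step of one episode.

    Hypothetical helper: `rewards` is a list of per-step rewards from a
    single complete episode.
    """
    returns = []
    running = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def advantages(returns, values):
    """Simple advantage estimate: observed return minus the critic's value."""
    return [g - v for g, v in zip(returns, values)]
```

For instance, `discounted_returns([1.0, 1.0, 1.0], gamma=0.5)` gives `[1.75, 1.5, 1.0]`. Plotting the undiscounted sum of rewards per episode against the training iteration is one straightforward way to visualize policy return over training.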

You may use any code you want from the internet to help you understand how to implement this, but all final code must be your own.

Here are the OpenAI Gym worlds.

Here is a blog post introducing the idea.

Here is the paper with a technical description of the algorithm.

Here is a video describing it at a high level.

cs501r_f2018/lab9.1541800304.txt.gz · Last modified: 2021/06/30 23:40 (external edit)