
cs501r_f2018:lab9



Objective:

  • To implement the Proximal Policy Optimization algorithm

Deliverable:

For this lab, you will turn in a Colab notebook that implements the proximal policy optimization (PPO) algorithm.


Grading standards:

Your notebook will be graded on the following:

  • 45% Proper design, creation, and debugging of the actor and critic networks
  • 45% Proper implementation of the PPO loss function and objective
  • 10% Visualization of the policy's return as a function of training progress
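The clipped surrogate loss is the core of the second grading item. Below is a minimal plain-Python sketch of it, assuming the notebook records per-timestep log-probabilities under both the current policy and the policy that collected the data. The function names `ppo_clip_loss` and `value_loss` are hypothetical, and the default ε = 0.2 is the value the PPO paper reports working well, not a requirement of this lab.

```python
import math
from typing import Sequence

# Hypothetical helpers: a sketch of the arithmetic, not a reference implementation.


def ppo_clip_loss(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative clipped surrogate objective, averaged over a batch.

    new_logp:   log pi_theta(a_t | s_t) under the policy being optimized
    old_logp:   log pi_theta_old(a_t | s_t) recorded when the data was collected
    advantages: advantage estimates A_t for the same timesteps
    Returns a value to minimize (the surrogate objective itself is maximized).
    """
    if not advantages or not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # r_t(theta) = pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def value_loss(values: Sequence[float], returns: Sequence[float]) -> float:
    """Mean squared error between critic predictions V(s_t) and return targets."""
    if not values or len(values) != len(returns):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)
```

In the notebook itself, the same arithmetic would be written with tensor operations (for example in PyTorch, an assumption about the framework) so that gradients flow back into the actor and critic. Taking the minimum means the policy gets no extra credit for pushing the probability ratio outside [1 − ε, 1 + ε], which is what keeps each update close to the old policy.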

Description:

For this lab, you will implement the PPO algorithm and train it on a few simple environments.
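Training alternates between collecting rollouts with the current policy and optimizing on them, which requires an advantage estimate for every timestep. The PPO paper uses a truncated form of generalized advantage estimation (GAE). The sketch below assumes rollouts are stored as flat Python lists with a `done` flag marking episode ends; the function name `gae_advantages`, the `done` handling, and the defaults γ = 0.99, λ = 0.95 are assumptions, not values specified by this lab.

```python
from typing import List, Sequence, Tuple


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Generalized advantage estimation over one rollout of length T.

    rewards[t], dones[t]: reward and episode-end flag after the action at step t
    values[t]:            critic estimate V(s_t)
    last_value:           V(s_T), used to bootstrap past the final step
    Returns (advantages, returns); returns[t] = advantages[t] + values[t] is the
    critic's regression target.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        not_done = 0.0 if dones[t] else 1.0
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), cut at episode ends.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Discounted sum of future TD errors, also reset at episode ends.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With γ = λ = 1 and a critic that always predicts zero, the advantages reduce to plain reward-to-go, which is a handy sanity check while debugging. For the return plot, summing the raw rewards of each completed episode is usually simpler than reusing these targets.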

Here is the paper with a technical description of the algorithm: Proximal Policy Optimization Algorithms (Schulman et al., 2017).
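For quick reference, the main objective from the paper is summarized below. Here \hat{A}_t is an advantage estimate, ε is the clipping parameter, c_1 and c_2 are weighting coefficients, and S is an entropy bonus. The combined objective is maximized; in practice one minimizes its negative.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}

L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta), 1-\epsilon, 1+\epsilon\right)\hat{A}_t\right)\right]

L^{CLIP+VF+S}_t(\theta) = \hat{\mathbb{E}}_t\left[L^{CLIP}_t(\theta) - c_1 L^{VF}_t(\theta) + c_2\, S[\pi_\theta](s_t)\right],
\qquad L^{VF}_t(\theta) = \left(V_\theta(s_t) - V_t^{\text{targ}}\right)^2
```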

Here is a video describing it at a high level: PPO video

cs501r_f2018/lab9.1541799574.txt.gz · Last modified: 2021/06/30 23:40 (external edit)