For this lab, you will turn in a Colab notebook that implements the Proximal Policy Optimization (PPO) algorithm.
Your notebook will be graded on the following:
In the notebook, you will implement the PPO algorithm and train it on a few simple worlds.
Here is a blog post introducing the idea.
Here is the paper with a technical description of the algorithm: Proximal policy optimization.
Here is a video describing it at a high level: PPO video
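As a starting point for reading the paper, here is a minimal sketch of the clipped surrogate objective at the core of PPO. It is written in plain Python for readability and is not the required lab implementation. The function name `ppo_clip_loss`, its argument names, and the default `clip_eps=0.2` are assumptions made for illustration; a real notebook would most likely compute this on tensors in a deep learning framework and add value-function and entropy terms.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the negative clipped surrogate objective, averaged over samples.

    new_log_probs: log-probabilities of the taken actions under the current policy
    old_log_probs: log-probabilities of the same actions under the policy that
                   collected the data
    advantages:    advantage estimates for those actions
    clip_eps:      clipping range (hypothetical default; tune for your task)
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the objective.
    return -total / len(advantages)


if __name__ == "__main__":
    # One sample whose action became twice as likely, with positive advantage:
    # the clipped term limits how much credit the update can take.
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

Taking the minimum of the clipped and unclipped terms means the policy gains nothing from moving the ratio far outside the clipping range in the direction the advantage favors, which is what keeps each update "proximal".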