BYU CS classes


Objective:

Explore careful hyperparameter tuning in PyTorch. Gain experience and confidence in carefully comparing multiple options.

Deliverable:

For this lab, you will submit an IPython notebook via learningsuite. Your notebook will contain two parts, as described below.

Grading standards:

Your notebook will be graded on the following:

35% Part 1: Clearly displayed 10 bars (one for baseline, one for each tweak independently)
5% Part 1: Small writeup of conclusions from independent tweaks
25% Part 2: Clear explanation of your tweaking strategy
25% Part 2: Actually run your tweaking strategy and show the results
10% Tidy and legible figures, including labeled axes where appropriate
10% Extra credit - Error bars on your figure in Part 1.

Description:

The goal of this lab is to learn how to explore the combinatorial space of possible hyperparameter settings.

Many deep learning papers present some sort of tweak on standard deep learning and empirically illustrate that it improves performance (ideally across a wide variety of architectures and datasets). It quickly becomes hard to know which, if any, of these tweaks are truly important, and how they interact when combined.

For this lab, you will explore various tweaks to the basic classifier you coded in lab 1. There are two parts to the lab.

Part 1

You must clearly show the individual effect of each tweak compared to the baseline. For this part, you should present a simple bar chart (or possibly two or more, depending on your layout) that clearly labels the baseline performance and shows the performance of each tweak alongside it. You may plot absolute or relative performance, whichever is clearer.
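A minimal matplotlib sketch of such a chart, assuming your results are stored in a dict mapping each run name to a single score (validation accuracy is assumed here). The function name `plot_tweaks` and its arguments are hypothetical; the `relative` flag switches between absolute and relative performance, and the optional `errors` dict is only needed for the extra-credit error bars.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line inside a Jupyter notebook
import matplotlib.pyplot as plt


def plot_tweaks(results, baseline="baseline", errors=None, relative=False):
    """Draw one bar per run, with the baseline bar first and shaded gray.

    results:  dict mapping run name -> score (e.g. validation accuracy).
    errors:   optional dict mapping run name -> error-bar half-width.
    relative: if True, plot each score minus the baseline score.
    """
    names = [baseline] + [n for n in results if n != baseline]
    offset = results[baseline] if relative else 0.0
    heights = [results[n] - offset for n in names]
    yerr = [errors[n] for n in names] if errors is not None else None

    fig, ax = plt.subplots(figsize=(10, 4))
    colors = ["gray"] + ["tab:blue"] * (len(names) - 1)
    ax.bar(names, heights, yerr=yerr, capsize=4, color=colors)
    ax.set_xlabel("Configuration (baseline, then one tweak at a time)")
    ax.set_ylabel("Accuracy relative to baseline" if relative else "Validation accuracy")
    ax.tick_params(axis="x", labelrotation=45)
    if relative:
        ax.axhline(0.0, color="black", linewidth=0.8)  # zero line = baseline
    fig.tight_layout()
    return fig, ax
```

Keeping the baseline first and in a different color makes it easy to read every other bar against it.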

You must include a few sentences describing what you can conclude from evaluating all of these tweaks.

Note: I am not requiring error bars for this lab, because they are computationally intensive (each bar needs several training runs, for example with different random seeds). I have made them extra credit, although if we were doing this for real, they would be absolutely required!
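For the extra credit, each configuration must be trained several times so there is spread to measure. Below is a minimal standard-library sketch, assuming the scores are collected into a dict of lists (one score per seed); `summarize_runs` is a hypothetical helper, and standard error is only one reasonable choice (a plain standard deviation or a confidence interval would also be defensible, as long as you say which you used).

```python
import math
import statistics


def summarize_runs(scores_by_run):
    """Collapse repeated runs into a mean and a standard error per configuration.

    scores_by_run: dict mapping run name -> list of scores, one per seed.
    Returns (means, sems), two dicts keyed by run name, where
    sem = sample standard deviation / sqrt(number of runs).
    """
    means, sems = {}, {}
    for name, scores in scores_by_run.items():
        if len(scores) < 2:
            raise ValueError(f"{name}: need at least two runs for an error bar")
        means[name] = statistics.mean(scores)
        sems[name] = statistics.stdev(scores) / math.sqrt(len(scores))
    return means, sems
```

The two returned dicts line up with the heights and the `yerr` argument of matplotlib's `Axes.bar`.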

Part 2

You must think about how to find the best combination of tweaks. There is no right answer to this part; I want you to think carefully about how to search the space of possible combinations, and come up with a reasonable method for settling on a final combination of tweaks. I have tried to provide enough tweaks that trying every combination by brute force should be impractical (although, if you have the compute, that is certainly a valid strategy!).

For this part, you must include in your notebook a simple writeup describing your strategy (just a paragraph or two), and then show the final performance of whatever combination you hit upon.

Note that you will not be graded on absolute performance of any run; what is important is thinking clearly through which tweaks make a difference.
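As one illustration only (the lab deliberately leaves the strategy up to you), here is a sketch of greedy coordinate-wise search: start from the baseline (or from the best single tweaks found in Part 1), change one tweak at a time, keep a change only if it improves the validation score, and repeat until no single change helps. `greedy_search` and the `evaluate` callable are hypothetical; in the lab, `evaluate` would train a model with the given configuration and return its validation score.

```python
def greedy_search(space, start, evaluate, max_rounds=3):
    """Coordinate-wise greedy search over a discrete hyperparameter space.

    space:    dict mapping tweak name -> list of allowed values.
    start:    dict giving the starting value of every tweak (e.g. the baseline).
    evaluate: callable(config dict) -> score, where higher is better.
    Returns (best_config, best_score, number_of_distinct_configs_evaluated).
    """
    cache = {}

    def score(config):
        key = tuple(sorted(config.items()))
        if key not in cache:  # never train the same configuration twice
            cache[key] = evaluate(dict(config))
        return cache[key]

    best = dict(start)
    best_score = score(best)
    for _ in range(max_rounds):
        improved = False
        for name, values in space.items():
            for value in values:
                if value == best[name]:
                    continue
                candidate = {**best, name: value}
                s = score(candidate)
                if s > best_score:  # keep the change only if it helps
                    best, best_score, improved = candidate, s, True
        if not improved:
            break  # no single change helps: a local optimum
    return best, best_score, len(cache)
```

This needs far fewer training runs than the full grid, but it can miss tweaks that only help in combination with others; that trade-off is worth discussing in your writeup.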

The Tweaks

Your baseline classifier must be a “vanilla” classifier, with none of the features listed below. We will systematically add them in.

You must test the following:

Activation functions: relu (baseline), leakyrelu, selu, elu, hardshrink
Batchnorm: off (baseline), on (use one batchnorm per residual block)
Label smoothing: off (baseline), on
Learning rate: constant (baseline), cyclical learning rate (CLR)
Regularization: off (baseline), dropout
Initialization: xavier/he (baseline), orthogonal

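So, for Part 1, your bar chart should have 10 bars: one for the baseline and one for each of the nine individual tweaks (the four non-baseline activation functions plus one tweak from each of the other five categories).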
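The count of 10 follows mechanically from the list above, and the same data structure describes the full search space for Part 2. A minimal sketch, assuming each configuration is a plain dict; the key names are illustrative, and the first option in each list is taken to be the baseline.

```python
import itertools

# Option lists follow the tweak list above; the key names are illustrative.
# The first option in each list is the baseline setting.
SPACE = {
    "activation": ["relu", "leakyrelu", "selu", "elu", "hardshrink"],
    "batchnorm": [False, True],
    "label_smoothing": [False, True],
    "lr_schedule": ["constant", "clr"],
    "dropout": [False, True],
    "init": ["xavier/he", "orthogonal"],
}

BASELINE = {name: options[0] for name, options in SPACE.items()}


def one_at_a_time(space, baseline):
    """Part 1: the baseline plus every single-tweak variant of it."""
    runs = [("baseline", dict(baseline))]
    for name, options in space.items():
        for value in options:
            if value != baseline[name]:
                runs.append((f"{name}={value}", {**baseline, name: value}))
    return runs


def all_combinations(space):
    """Brute force for Part 2: every combination of options (the full grid)."""
    names = list(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))
```

With these lists, `one_at_a_time(SPACE, BASELINE)` yields 10 runs, while the full grid has 5 × 2^5 = 160 combinations.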

Some of these tweaks require additional parameters. You should either leave them at their default values, or think of some reasonable way to set them.
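A minimal sketch of one way to set the extra parameters, keeping PyTorch defaults where they exist. The tiny `nn.Linear` model and random batch are placeholders for your Lab 1 classifier and data, and the label-smoothing value and CLR bounds are illustrative assumptions rather than values this lab prescribes.

```python
import torch
from torch import nn

# Placeholder model so the snippet runs on its own; use your Lab 1 classifier.
model = nn.Linear(10, 3)

# Label smoothing is built into CrossEntropyLoss (PyTorch >= 1.10).
# 0.1 is a commonly used value, not one prescribed by the lab.
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)

# Dropout probability; p=0.5 is PyTorch's default.
dropout = nn.Dropout(p=0.5)

# CLR: CyclicLR oscillates the learning rate between base_lr and max_lr.
# With the default cycle_momentum=True the optimizer needs a momentum term,
# hence SGD with momentum. The bounds and step size are illustrative.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2, step_size_up=500, mode="triangular"
)

# CyclicLR is stepped once per batch, not once per epoch.
x, y = torch.randn(8, 10), torch.randint(0, 3, (8,))
for _ in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()
```

If you change any of these from the defaults, say so in your notebook, since a poorly chosen parameter can make a useful tweak look harmful.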

Hints

Activation functions and dropout can all be found in torch.nn
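A minimal sketch of making these choices configurable, assuming your Lab 1 classifier is built from residual blocks of 3×3 convolutions. The lab's "one batchnorm per residual block" wording suggests residual blocks, but the `ResidualBlock` layout and the placement of dropout here are hypothetical; adapt them to your own architecture.

```python
import torch
from torch import nn

# Map the lab's activation names onto torch.nn modules. Constructor arguments
# stay at PyTorch's defaults (e.g. LeakyReLU slope 0.01, Hardshrink lambda 0.5).
ACTIVATIONS = {
    "relu": nn.ReLU,
    "leakyrelu": nn.LeakyReLU,
    "selu": nn.SELU,
    "elu": nn.ELU,
    "hardshrink": nn.Hardshrink,
}


class ResidualBlock(nn.Module):
    """Hypothetical residual block with a configurable activation,
    at most one BatchNorm2d, and optional Dropout2d."""

    def __init__(self, channels, activation="relu", batchnorm=False, dropout=0.0):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = ACTIVATIONS[activation]()
        # nn.Identity keeps forward() identical whether a tweak is on or off.
        self.bn = nn.BatchNorm2d(channels) if batchnorm else nn.Identity()
        self.drop = nn.Dropout2d(dropout) if dropout > 0 else nn.Identity()

    def forward(self, x):
        out = self.act(self.bn(self.conv1(x)))
        out = self.drop(out)
        out = self.conv2(out)
        return self.act(out + x)  # skip connection, then activation
```

Driving every tweak from constructor arguments is what lets a single loop build all the Part 1 and Part 2 models without copying code.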

Initialization functions can be found in torch.nn.init
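A minimal sketch of switching initializations with `Module.apply`, assuming only `nn.Conv2d` and `nn.Linear` layers need re-initializing and that biases should start at zero; `make_init_fn` and the scheme names are hypothetical. Since the baseline is listed as "xavier/he", both are offered: He (Kaiming) init is usually paired with ReLU-family activations and Xavier with tanh/sigmoid-style ones.

```python
import torch
from torch import nn


def make_init_fn(scheme):
    """Return a function for model.apply() that re-initializes conv/linear weights.

    scheme: "xavier" -> nn.init.xavier_uniform_
            "he"     -> nn.init.kaiming_normal_ (assuming ReLU nonlinearity)
            "orthogonal" -> nn.init.orthogonal_ (conv kernels are flattened
                            to a 2-D matrix internally)
    """

    def init_fn(module):
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            if scheme == "xavier":
                nn.init.xavier_uniform_(module.weight)
            elif scheme == "he":
                nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
            elif scheme == "orthogonal":
                nn.init.orthogonal_(module.weight)
            else:
                raise ValueError(f"unknown init scheme: {scheme}")
            if module.bias is not None:
                nn.init.zeros_(module.bias)

    return init_fn
```

Usage: `model.apply(make_init_fn("orthogonal"))` right after constructing the model and before creating the optimizer.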

This lab should be pretty straightforward, with the right script – you should be able to iterate over tweaks and run your classifier in a tidy loop. Ideally, you'll code it up, let it run, and come back in a few hours to find the results!

If you find yourself cutting-and-pasting, you might want to rethink your strategy.
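A minimal sketch of such a loop, assuming the runs are a list of (name, config) pairs and that you supply your own `train_and_evaluate(config, seed)` function (hypothetical here) that builds, seeds, trains, and scores a model. Writing results to disk after every configuration means a crash partway through a multi-hour sweep loses little work.

```python
import json


def run_sweep(runs, train_and_evaluate, seeds=(0,), save_path=None):
    """Train every configuration under every seed and collect the scores.

    runs: list of (name, config) pairs, e.g. the baseline plus one-tweak variants.
    train_and_evaluate: your own callable(config, seed) -> float score.
        Seeding (e.g. torch.manual_seed(seed)) should happen inside it.
    save_path: optional JSON file rewritten after each configuration.
    Returns a dict mapping run name -> list of scores, one per seed.
    """
    results = {}
    for name, config in runs:
        results[name] = []
        for seed in seeds:
            score = train_and_evaluate(config, seed)
            results[name].append(score)
            print(f"{name:<28} seed={seed}  score={score:.4f}")
        if save_path is not None:
            # Save progress so a crash mid-sweep loses at most one configuration.
            with open(save_path, "w") as f:
                json.dump(results, f, indent=2)
    return results
```

Running with a single seed covers Part 1; passing several seeds gives the repeated runs needed for error bars.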
