cs501r_f2016:lab3 · revised 2016/09/09 17:10 by wingated; current revision 2021/06/30 23:42
  * 10% Tidy and legible visualization of cost function
  * 10% Tidy and legible plot of classification accuracy over time
  * +5% Complete the auto gradient descent lab
  
----
</code>
  
Note a couple of things about this code: first, it is fully vectorized. Second, the ''numerical_gradient'' function accepts a parameter called ''loss_function'' -- ''numerical_gradient'' is a higher-order function that accepts another function as an input. This numerical gradient calculator could be used to calculate gradients for any function. Third, you may wonder why my ''loss_function'' doesn't need the data! Since the data never changes, I curried it into my loss function, resulting in a function that only takes one parameter -- the matrix ''W''.
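To make the higher-order-function and currying ideas concrete, here is a minimal sketch. This is my own simplified, non-vectorized version: the forward-difference scheme, the toy loss, and the stand-in data are assumptions for illustration, not the lab's actual code.

```python
import numpy as np

def numerical_gradient(loss_function, W, delta=0.000001):
    # Higher-order function: estimates dL/dW for ANY scalar loss_function of W
    # by nudging one entry of W at a time (forward differences).
    grad = np.zeros_like(W)
    base = loss_function(W)
    for idx in np.ndindex(W.shape):
        W_step = W.copy()
        W_step[idx] += delta
        grad[idx] = (loss_function(W_step) - base) / delta
    return grad

# Currying: bake the (fixed) data into the loss so it takes only W.
data = np.array([[1.0, 2.0], [3.0, 4.0]])   # stand-in for the real dataset

def make_loss(data):
    def loss_function(W):
        return float(np.sum((data @ W) ** 2))  # toy quadratic loss, not the lab's
    return loss_function

loss = make_loss(data)       # loss now takes only W
W = np.eye(2)
g = numerical_gradient(loss, W)
```

For this toy loss the analytic gradient is ''2 * data.T @ data @ W'', so you can sanity-check the numerical estimate against it.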
  
You should run your code for 1000 epochs. (Here, by epoch, I mean "step in the gradient descent algorithm.") Note, however, that for each step, you have to calculate the gradient, and in order to calculate the gradient, you will need to evaluate the loss function many times.
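To see why "many times" matters, here is a rough cost accounting (my own sketch; the weight-matrix shape below is purely illustrative, not the lab's specification). A forward-difference gradient needs one loss evaluation per entry of ''W'', plus one baseline evaluation, on every step:

```python
# Illustrative shapes only: e.g. 784 input features + 1 bias row, 10 classes.
W_shape = (785, 10)
evals_per_step = W_shape[0] * W_shape[1] + 1   # one nudge per entry + baseline
total_evals = 1000 * evals_per_step            # 1000 gradient-descent steps
print(total_evals)                             # prints 7851000
```

Millions of loss evaluations is exactly why a vectorized loss function pays off here.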
  
You should plot both the loss function and the classification accuracy at each step.
  
**Preparing the data:**
You should use a linear score function, as discussed in class. This should only be one line of code!
  
You should use the log softmax loss function, as discussed in class. For each training instance, you should compute the probability that the instance ''i'' is classified as class ''k'', using ''p(instance i = class k) = exp( s_ik ) / sum_j exp( s_ij )'' (where ''s_ij'' is the score of the i'th instance on the j'th class), and then calculate ''L_i'' as the log of the probability of the correct class. Your overall loss is then the mean of the individual ''L_i'' terms.
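A small sketch of that computation follows. The shapes are my assumption (''scores'' is N x C, ''labels'' holds the correct class index per instance), and I take ''L_i'' to be the //negative// log probability so that lower loss is better -- check that sign convention against the class notes.

```python
import numpy as np

def softmax_loss(scores, labels):
    # Subtract each row's max before exponentiating, for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    # log p(instance i = class k) = s_ik - log( sum_j exp( s_ij ) )
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    L = -log_probs[np.arange(len(labels)), labels]  # L_i for each instance i
    return L.mean()                                 # overall loss: mean of L_i
```

As a sanity check, with all-zero scores every class is equally likely, so the loss should be ''log(C)'' for ''C'' classes.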
  
**Note: you should be careful about numerical underflow!** To help combat that, you should use the **log-sum-exp** trick (or the **exp-normalize** trick):
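As a small illustration (the function name and this scalar-output version are my own simplification):

```python
import numpy as np

# log-sum-exp trick: log( sum_j exp(s_j) ) = m + log( sum_j exp(s_j - m) )
# with m = max_j s_j, so the largest argument ever passed to exp() is 0.
def logsumexp(s):
    m = s.max()
    return m + np.log(np.exp(s - m).sum())

scores = np.array([1000.0, 1001.0])   # naive np.exp(1000.0) would overflow to inf
stable = logsumexp(scores)            # finite, correct answer
```

The naive ''np.log(np.exp(s).sum())'' overflows for large scores; the shifted version gives the same mathematical value without ever exponentiating a large number.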
I used a delta of 0.000001.
  
Please feel free to search around online for resources to understand this better. For example:
  
[[http://www2.math.umd.edu/~dlevy/classes/amsc466/lecture-notes/differentiation-chap.pdf|These lecture notes]] (see eq. 5.1)
  
You may find [[http://matplotlib.org/users/pyplot_tutorial.html|this tutorial on pyplot]] helpful.
----

====Extra credit:====

You may complete the old lab 04 for 5% extra credit. [[http://liftothers.org/dokuwiki/doku.php?id=cs501r_f2016:lab4]]
  
cs501r_f2016/lab3.1473441018.txt.gz · Last modified: 2021/06/30 23:40 (external edit)