cs501r_f2016:lab2

Differences

This shows you the differences between two versions of the page.

cs501r_f2016:lab2 [2016/09/02 04:28]
wingated
cs501r_f2016:lab2 [2016/09/02 04:45]
wingated
Line 8: Line 8:
 ====Deliverable:====
  
-You should turn in an iPython notebook that implements the perceptron algorithm on two different datasets: the Iris dataset, and the CIFAR-10 dataset.  Because the perceptron is a binary classifier, we will preprocess the data to create two classes.
 +You should turn in an iPython notebook that implements the perceptron algorithm on two different datasets: the Iris dataset, and the CIFAR-10 dataset.  Because the perceptron is a binary classifier, we will preprocess the data and "squash" it to create two classes.
 + 
 +Your notebook should also generate a visualization that shows classification accuracy at each iteration, along with the log of the l2 norm of the weight vector, for two different values of the perceptron's step size.  Examples of both are shown at the right.  Since there are two datasets, and there are two visualizations per dataset, your notebook should produce a total of 4 plots.
 + 
 +**Please cleanly label your axes!**
  
-Your notebook should also generate a visualization that shows classification accuracy at each iteration, along with the log of the l2 norm of the weight vector. ​ Examples of both are shown at the right. ​ **Please note that you should cleanly label your axes!** 
 {{ :cs501r_f2016:lab2_cacc.png?direct&200|}}
 +
 +{{ :cs501r_f2016:lab2_l2norm.png?direct&200|}}
  
 The Iris dataset can be downloaded at the UCI ML repository, or you can download a slightly simpler version here:
Line 33: Line 38:
 ====Description:====
  
-The purpose of this lab is to help you become familiar with ''numpy'', to remember the basics of classification, and to implement the perceptron algorithm.  The perceptron algorithm is a simple method of learning a separating hyperplane.  It is guaranteed to converge iff the dataset is linearly separable - otherwise, there is no guarantee!
 +The purpose of this lab is to help you become familiar with ''numpy'', to remember the basics of classification, and to implement the perceptron algorithm.  The perceptron algorithm is a simple method of learning a separating hyperplane.  It is guaranteed to converge iff the dataset is linearly separable - otherwise, you have to cross your fingers!
  
 You should implement the perceptron algorithm according to the description in Wikipedia:
Line 44: Line 49:
  
 Also, you may find the function ''np.dot'' helpful to compute matrix-vector products, or vector-vector products. You can transpose a matrix or a vector by using the ''.T'' attribute.
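As a small illustration of these two operations, here is a minimal sketch. The array names and values are made up for this example and are not part of the lab data.

```python
import numpy as np

# Hypothetical toy data: 3 examples with 2 features each.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])

scores = np.dot(X, w)   # matrix-vector product, shape (3,)
sq_len = np.dot(w, w)   # vector-vector (inner) product, a scalar
X_t = X.T               # transpose, shape (2, 3); .T is used without parentheses
```

Checking ''.shape'' on each result is a quick way to confirm that the product you computed is the one you intended.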
 +
 +Hint: you should start with the Iris dataset, then once you have your perceptron working, you should move to the CIFAR-10 dataset.
  
 **Preparing the data:**
 
-We need to convert both datasets to binary classification problems.  To show you how we're going to do this, and to give you a bit of code to get started, here is how I loaded and converted the Iris dataset:
 +Both datasets are natively multiclass, but we need to convert them to binary classification problems.  To show you how we're going to do this, and to give you a bit of code to get started, here is how I loaded and converted the Iris dataset:
 
 <code python>
Line 53: Line 60:
 m = data.as_matrix()
 labels = m[:,0]
-labels[ labels==2 ] = 1
 +labels[ labels==2 ] = 1  # squash class 2 into class 1
 labels = np.atleast_2d( labels ).T
 features = m[:,1:5]
Line 85: Line 92:
  
 In the Wikipedia description of the perceptron algorithm, notice the function ''f''.  That's the Heaviside step function.  What does it do?
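One way to explore that question is to evaluate a step function on a few values. The sketch below assumes the convention used in the Wikipedia perceptron article (output 1 when the input is strictly positive, 0 otherwise); the function name is hypothetical.

```python
import numpy as np

def heaviside(z):
    """Return 1.0 where z > 0 and 0.0 elsewhere (assumed convention at z == 0)."""
    return (np.asarray(z) > 0).astype(float)

print(heaviside([-2.0, 0.0, 0.5, 3.0]))  # prints [0. 0. 1. 1.]
```

Other references define the value at exactly zero differently, so it is worth checking which convention your labels assume.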
 +
 +You should run the perceptron for at least 100 steps.
 +
 +You should also test different step sizes. Wikipedia doesn't discuss how to do this, but it should be straightforward for you to figure out; the algorithm description in the lecture notes includes the step size.  (But try to figure it out: consider the update equation for a weight, and ask yourself: where should I put a step size parameter, to be able to adjust the magnitude of the weight update?)
 +
 +For the Iris dataset, you should test at least ''c=1'', ''c=0.1'', ''c=0.01''.
 +
 +For the CIFAR-10 dataset, you should test at least ''c=0.001'', ''c=0.00001''.
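If you get stuck, the following is one possible sketch of a perceptron training loop with a step size ''c''. It is not necessarily the formulation in the lecture notes: it assumes 0/1 labels, folds the bias into the weight vector by appending a constant feature, and all function and variable names are hypothetical.

```python
import numpy as np

def train_perceptron(features, labels, c=0.1, n_steps=100):
    """Sketch of perceptron training with step size c.

    features: (N, D) array; labels: N targets in {0, 1}.
    Returns the weights (bias last) and the accuracy after each step.
    """
    n = features.shape[0]
    X = np.hstack([features, np.ones((n, 1))])  # constant 1 acts as the bias input
    y = np.asarray(labels, dtype=float).ravel()
    w = np.zeros(X.shape[1])
    accuracy = []
    for _ in range(n_steps):
        for x_i, y_i in zip(X, y):
            pred = 1.0 if np.dot(w, x_i) > 0 else 0.0
            w += c * (y_i - pred) * x_i         # c scales the size of each update
        preds = (np.dot(X, w) > 0).astype(float)
        accuracy.append(np.mean(preds == y))
    return w, np.array(accuracy)

# Made-up, linearly separable toy data (not the Iris or CIFAR-10 data).
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])
w, acc = train_perceptron(X_toy, y_toy, c=0.1, n_steps=100)
```

Here one "step" is taken to mean one full pass over the data, which is an assumption; adapt it to whatever definition of a step you use.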
  
  
 ** Computing the l2 norm of the weight vector **
 +
 +It is interesting to watch the weight vector as the algorithm progresses.  ​
  
 This should only take a single line of code.  Hint: can you rewrite the l2 norm in terms of dot products?
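A minimal sketch of that hint, using a made-up weight vector: the l2 norm is the square root of the vector's dot product with itself.

```python
import numpy as np

w = np.array([3.0, 4.0])         # hypothetical weight vector
l2_norm = np.sqrt(np.dot(w, w))  # ||w||_2 = sqrt(w . w)
log_l2_norm = np.log(l2_norm)    # the quantity to plot at each iteration
```

Note that if the weights are still all zero, the norm is 0 and its log is negative infinity, so you may want to start recording after the first update.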
 +
 +** Plotting results **
 +
 +You may use any notebook-compatible plotting function you like, but I recommend ''matplotlib''.  This is commonly imported as
 +
 +<code python>
 +import matplotlib.pyplot as plt
 +</code>
 +
 +To create a new figure, call ''plt.figure''.  To plot a line, call ''plt.plot''.  Note that if you pass a matrix into ''plt.plot'', it will plot multiple lines at once, each with a different color; each column will generate a new line.
 +
 +Note that if you use matplotlib, you may have to call ''plt.show'' to actually construct and display the plot.
 +
 +Don't forget to label your axes!
 +
 +You may find [[http://matplotlib.org/users/pyplot_tutorial.html|this tutorial on pyplot]] helpful.
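The plotting calls described above can be sketched as follows. The curves are invented placeholder data standing in for your own per-iteration results, and the labels and step-size values in the legend are only examples.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder results: one row per iteration, one column per step size.
iterations = np.arange(100)
accuracy = np.column_stack([
    1.0 - 0.5 * np.exp(-iterations / 10.0),  # made-up curve for c=0.1
    1.0 - 0.5 * np.exp(-iterations / 30.0),  # made-up curve for c=0.01
])

plt.figure()
lines = plt.plot(iterations, accuracy)       # each column becomes its own line
plt.xlabel("Iteration")
plt.ylabel("Classification accuracy")
plt.legend(lines, ["c=0.1", "c=0.01"])
plt.show()
```

Storing each run as a column keeps the two step sizes on one set of axes, which makes them easy to compare.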
  
 ----
Line 105: Line 138:
  
 plt.figure
 +plt.plot
 plt.xlabel
 plt.ylabel
cs501r_f2016/lab2.txt · Last modified: 2021/06/30 23:42 (external edit)