Objective:

To gain experience with python, numpy, and linear classification.

Oh, and to remember all of that linear algebra stuff. ;)


Deliverable:

You should turn in an IPython notebook that implements the perceptron algorithm on two different datasets: the Iris dataset, and the CIFAR-10 dataset. Because the perceptron is a binary classifier, we will preprocess the data and “squash” it to create two classes.

Your notebook should also generate a visualization that shows classification accuracy at each iteration, along with the log of the l2 norm of the weight vector. Examples of both are shown at the right. Please note that you should cleanly label your axes!

The Iris dataset can be downloaded at the UCI ML repository, or you can download a slightly simpler version here: http://liftothers.org/Fisher.csv

The CIFAR-10 dataset can be downloaded at https://www.cs.toronto.edu/~kriz/cifar.html

Note: make sure to download the python version of the data - it will simplify your life!


Grading standards:

Your notebook will be graded on the following:

  • 70% Correct implementation of perceptron algorithm
  • 20% Tidy and legible visualization of loss function
  • 10% Tidy and legible plot of classification accuracy over time

Description:

The purpose of this lab is to help you become familiar with numpy, to remember the basics of classification, and to implement the perceptron algorithm. The perceptron algorithm is a simple method of learning a separating hyperplane. It is guaranteed to converge only if the dataset is linearly separable; otherwise, there is no guarantee of convergence!

You should implement the perceptron algorithm according to the description in Wikipedia:

Perceptron
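
To make the update rule concrete, here is a minimal sketch of one training epoch, assuming features is an n x d numpy array, labels is an n x 1 array of 0/1 targets, and w is a weight vector of length d (names are illustrative, and the bias term is omitted - you can fold it in by appending a constant 1 to each input):

import numpy as np

def perceptron_epoch( w, features, labels, lr=0.1 ):
    # one pass over the data; w changes only on misclassified examples
    for x, d in zip( features, np.ravel( labels ) ):
        y = 1.0 if np.dot( w, x ) > 0 else 0.0   # Heaviside step activation
        w = w + lr * (d - y) * x                 # zero update when y == d
    return w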

As you implement this lab, you will (hopefully!) learn the difference between numpy's matrices, numpy's vectors, and lists. In particular, note that a list is not the same as a vector, and an n x 1 matrix is not the same as a vector of length n.

You may find the functions np.asmatrix, np.atleast_2d, and np.reshape helpful to convert between them.

Also, you may find the function np.dot helpful to compute matrix-vector products, or vector-vector products. You can transpose a matrix by using its .T attribute (note that .T has no effect on a 1-d vector).
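
For example, the following illustrates the shape differences (values are illustrative):

import numpy as np

v = np.array( [1.0, 2.0, 3.0] )   # a vector: shape (3,)
m = np.atleast_2d( v )            # a 1 x 3 matrix: shape (1, 3)
c = m.T                           # a 3 x 1 matrix: shape (3, 1)
print( np.dot( v, v ) )           # vector-vector product: a scalar, 14.0
print( np.dot( m, c ) )           # matrix-matrix product: a 1 x 1 matrix, [[14.0]]
print( np.reshape( c, (3,) ) )    # flattens back to a vector of length 3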

Preparing the data:

We need to convert both datasets to binary classification problems. To show you how we're going to do this, and to give you a bit of code to get started, here is how I loaded and converted the Iris dataset:

import pandas
import numpy as np

data = pandas.read_csv( 'Fisher.csv' )
m = data.as_matrix()                  # DataFrame -> numpy array (use .values in newer pandas)
labels = m[:,0]                       # first column is the class label (0, 1, or 2)
labels[ labels==2 ] = 1               # squash class 2 into class 1, leaving two classes
labels = np.atleast_2d( labels ).T    # reshape into an n x 1 matrix
features = m[:,1:5]                   # the four measurement columns

and the CIFAR-10 dataset:

import numpy as np

def unpickle( file ):
    # the CIFAR-10 batches were pickled with Python 2's cPickle
    import cPickle
    fo = open(file, 'rb')
    d = cPickle.load(fo)
    fo.close()
    return d

data = unpickle( 'cifar-10-batches-py/data_batch_1' )

features = data['data']               # 10000 x 3072 array of raw pixel values
labels = data['labels']               # list of 10000 integer labels in 0-9
labels = np.atleast_2d( labels ).T    # convert to a 10000 x 1 matrix

# squash classes 0-4 into class 0, and squash classes 5-9 into class 1
labels[ labels < 5 ] = 0
labels[ labels >= 5 ] = 1
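
If you are running Python 3, note that cPickle no longer exists; here is a sketch of an equivalent loader using the standard pickle module (the dictionary keys then come back as bytes, e.g. data[b'data']):

import pickle

def unpickle( file ):
    # encoding='bytes' is required because the batches were pickled under Python 2
    with open( file, 'rb' ) as fo:
        return pickle.load( fo, encoding='bytes' )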

Running the perceptron algorithm:

Remember that if a data instance is classified correctly, there is no change in the weight vector.

In the wikipedia description of the perceptron algorithm, notice the function f. That's the Heaviside step function. What does it do?
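
If you want to compute f without an explicit branch, one common convention (with f(0) = 0) is sketched below:

import numpy as np

def heaviside( x ):
    # 1.0 where x > 0, and 0.0 elsewhere
    return ( np.asarray( x ) > 0 ).astype( float )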

Computing the l2 norm of the weight vector:

This should only take a single line of code. Hint: can you rewrite the l2 norm in terms of dot products?
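
For example, with a stand-in weight vector:

import numpy as np

w = np.random.randn( 5 )                       # stand-in weight vector
l2 = np.sqrt( np.dot( w, w ) )                 # since ||w||^2 = w . w
assert np.isclose( l2, np.linalg.norm( w ) )   # agrees with numpy's built-in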


Hints:

An easy way to load a CSV datafile is with the pandas package.

Here are some functions that may be helpful to you:

np.random.randn
 
import matplotlib.pyplot as plt
 
plt.figure
plt.xlabel
plt.ylabel
plt.legend
plt.show
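
Putting those together, here is a minimal plotting sketch; accuracy_history and log_norm_history are hypothetical lists you would record once per iteration during training:

import numpy as np
import matplotlib.pyplot as plt

# stand-in data - replace with the values recorded during training
accuracy_history = np.random.rand( 100 )
log_norm_history = np.log( np.arange( 1, 101, dtype=float ) )

plt.figure()
plt.plot( accuracy_history, label='classification accuracy' )
plt.xlabel( 'iteration' )
plt.ylabel( 'accuracy' )
plt.legend()
plt.show()

plt.figure()
plt.plot( log_norm_history, label='log of l2 norm of w' )
plt.xlabel( 'iteration' )
plt.ylabel( 'log ||w||' )
plt.legend()
plt.show()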