cs501r_f2016:lab2

This is an old revision of the document!


Objective:

To gain experience with Python, NumPy, and linear classification. Oh, and to remember all of that linear algebra stuff. ;)


Deliverable:

You should turn in an IPython notebook that implements the perceptron algorithm on the Iris dataset.

Your notebook should also generate a visualization that shows the loss function at each iteration. This can be generated as a single plot, and shown in the notebook.
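As a rough illustration of tracking a loss value per iteration, here is a minimal sketch of a binary perceptron. Several things here are assumptions and not part of the assignment text: labels are coded as -1/+1, a constant bias feature is appended, the loss is the perceptron criterion (the sum of -y·score over misclassified points), and a synthetic two-cluster dataset stands in for Iris. The function name train_perceptron and its parameters are hypothetical.

```python
import numpy as np

def train_perceptron(X, y, n_epochs=20, lr=1.0):
    """Sketch of a binary perceptron.

    X: array of shape (n_samples, n_features)
    y: array of shape (n_samples,) with labels in {-1, +1} (assumed coding)
    Returns the weight vector (bias last) and one loss value per epoch.
    """
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append a constant bias feature
    w = np.zeros(d + 1)
    losses = []
    for _ in range(n_epochs):
        for i in range(n):
            # update only when the current point is misclassified (or on the boundary)
            if y[i] * (Xb[i] @ w) <= 0:
                w += lr * y[i] * Xb[i]
        # perceptron criterion: sum of -y * score over misclassified points
        margins = y * (Xb @ w)
        losses.append(float(np.sum(np.maximum(0.0, -margins))))
    return w, losses

# Synthetic, well-separated stand-in data (NOT the Iris dataset)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])

w, losses = train_perceptron(X, y)
scores = np.hstack([X, np.ones((100, 1))]) @ w
accuracy = float(np.mean(np.sign(scores) == y))
```

The losses list has one entry per pass over the data, so it can be handed directly to a plotting call to produce the single plot described above; accuracy is the final classification rate on the same data.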

The dataset can be downloaded from the UCI ML repository.


Grading standards:

Your notebook will be graded on the following:

  • 70% Correct implementation of perceptron algorithm
  • 20% Tidy and legible visualization of loss function
  • 10% Tidy and legible final classification rate

Description:

For this lab, you will be experimenting with Kernel Density Estimators (see MLAPP 14.7.2). They are a simple, nonparametric alternative to Gaussian mixture models, and they form an important part of the machine learning toolkit.

At several points during this lab, you will need to construct density estimates that are “class-conditional”. For example, in order to classify a test point $x_j$, you need to compute

$$p( \mathrm{class}=k | x_j, \mathrm{data} ) \propto p( x_j | \mathrm{class}=k, \mathrm{data} ) p(\mathrm{class}=k | \mathrm{data} ) $$

where

$$p( x_j | \mathrm{class}=k, \mathrm{data} )$$

is given by a kernel density estimator derived from all data of class $k$.
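The classification rule above can be sketched as follows. This is only one possible reading, with several assumptions that the lab text does not fix: an isotropic Gaussian kernel, a single bandwidth sigma shared by all classes, class priors estimated as label frequencies, and data stored one point per column. The names log_kde and classify are hypothetical. The computation is done in log space so that the tiny kernel values that arise in high dimensions do not underflow.

```python
import numpy as np

def log_kde(x, samples, sigma):
    """Log density of a Gaussian KDE at point x.

    x: array of shape (d,); samples: array of shape (d, n), one point per column.
    sigma: kernel bandwidth (an assumed, user-chosen value).
    """
    d, n = samples.shape
    sq_dist = np.sum((samples - x[:, None]) ** 2, axis=0)  # shape (n,)
    log_k = -sq_dist / (2.0 * sigma ** 2) - 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    m = np.max(log_k)
    # log of the mean kernel value, shifted by the max for numerical stability
    return m + np.log(np.mean(np.exp(log_k - m)))

def classify(x, data, labels, n_classes, sigma=1.0):
    """Return argmax_k of log p(x | class=k, data) + log p(class=k | data)."""
    labels = labels.ravel()
    log_post = np.full(n_classes, -np.inf)
    for k in range(n_classes):
        mask = labels == k
        if not np.any(mask):
            continue  # a class with no training data keeps log-posterior -inf
        log_prior = np.log(np.mean(mask))
        log_post[k] = log_kde(x, data[:, mask], sigma) + log_prior
    return int(np.argmax(log_post))

# Tiny synthetic example: two 2-D clusters, labels stored as an n-by-1 column
data = np.array([[0.0, 0.1, -0.1, 5.0, 5.1, 4.9],
                 [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]])
labels = np.array([[0], [0], [0], [1], [1], [1]])

pred_a = classify(np.array([0.2, 0.0]), data, labels, n_classes=2)
pred_b = classify(np.array([4.8, 5.2]), data, labels, n_classes=2)
```

Because the posterior is only needed up to proportionality, the normalizing constant is never computed; comparing unnormalized log-posteriors across classes is enough to pick the most probable class.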

The data that you will be analyzing is the famous MNIST handwritten digits dataset. You can download some pre-processed MATLAB data files below:

MNIST training data vectors and labels

MNIST test data vectors and labels

These can be loaded using the scipy.io.loadmat function, as follows:

import scipy.io
 
train_mat = scipy.io.loadmat('mnist_train.mat')
train_data = train_mat['images']
train_labels = train_mat['labels']
 
test_mat = scipy.io.loadmat('mnist_test.mat')
test_data = test_mat['t10k_images']
test_labels = test_mat['t10k_labels']

The training data vectors are now in train_data, a numpy array of size 784×60000, with corresponding labels in train_labels, a numpy array of size 60000×1.
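Given that layout (one data vector per column, labels as an n×1 column), here is a small sketch of two operations that are likely to come up: estimating the class priors from label counts and pulling out all vectors of one class. A small random array stands in for the real files so that the snippet runs on its own; the label values are invented for illustration, and the assumption that labels are the integers 0 through 9 is not stated in the lab text.

```python
import numpy as np

# Synthetic stand-in with the same layout as the lab data:
# 784 rows (pixels), one column per example, labels as an n-by-1 column.
rng = np.random.default_rng(0)
train_data = rng.random((784, 12))
train_labels = np.array(
    [[0], [1], [1], [2], [0], [1], [2], [2], [1], [0], [1], [2]], dtype=np.uint8
)

# Flatten the n-by-1 label column to a 1-D integer vector
labels = train_labels.ravel().astype(np.int64)

# Empirical class priors: count of each label divided by the total count
# (minlength=10 assumes ten digit classes, 0 through 9)
counts = np.bincount(labels, minlength=10)
priors = counts / counts.sum()

# All training vectors of class k, still one vector per column
k = 1
class_k = train_data[:, labels == k]
```

Flattening the labels first matters: comparing the original n×1 column against k gives an n×1 boolean array, which cannot be used directly to index the columns of train_data.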


Hints:

Here is a simple way to visualize a digit. Suppose our digit is in variable X, which has dimensions 784×1:

import matplotlib.pyplot as plt
# reshape the 784-element vector into a 28×28 image before displaying it
plt.imshow(X.reshape(28, 28).T, interpolation='nearest', cmap='gray')

Here are some functions that may be helpful to you:

import matplotlib.pyplot as plt
plt.subplot
 
numpy.argmax
 
numpy.exp
 
numpy.mean
 
numpy.bincount
cs501r_f2016/lab2.1472667824.txt.gz · Last modified: 2021/06/30 23:40 (external edit)