cs401r_w2016:lab5

Differences

This shows you the differences between two versions of the page.

cs401r_w2016:lab5 [2015/12/29 00:07]
admin [Deliverable:]
cs401r_w2016:lab5 [2021/06/30 23:42]
Line 1: Line 1:
-====Objective:====
-
-To understand how to use kernel density estimation to generate both a simple classifier and a class-conditional visualization of handwritten digits.
- 
- 
-====Deliverable:====
-
-You should turn in an iPython notebook that performs three tasks.  All tasks will be done using the MNIST handwritten digit data set (see Description for details):
-
-{{ :cs401r_w2016:lab5_class_mean.png?direct&200|}}
-  - Generate a visualization of the expected value of each class, where each class-conditional density is estimated using a kernel density estimator (KDE).  The data for these KDEs should come from the MNIST training data (see below).  Your notebook should generate 10 images, arranged neatly, one per digit class.  Each image might look something like the one on the right.
-  - Build a simple classifier using **only** the class means, and test it using the MNIST test data. (Note: this couldn't possibly be a good classifier!)  That is, for each test data point $x_j$, you should compute the probability that $x_j$ came from a Gaussian centered at $\mu_k$, where $\mu_k$ is the expected value of class $k$ you computed in Part (1).  Classify $x_j$ as coming from the most likely class.
-  - Build a more complex classifier using a full kernel density estimator.  For each test data point $x_j$, you should calculate the probability that it belongs to class $k$.
- 
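For Parts (1) and (2), one fact may help: when the kernel is symmetric about zero, the expected value of a KDE is just the sample mean of the data it was built from. The sketch below is one possible way to compute class means and classify by the nearest mean. It assumes an isotropic Gaussian with the same variance for every class and equal class priors, so that "most likely class" reduces to "closest mean". The helper names (''class_means'', ''classify_by_mean'') and the toy data are made up for illustration and are not part of the lab materials.

```python
import numpy as np

def class_means(data, labels, num_classes=10):
    """Return a (D, num_classes) array whose column k is the mean of class k.

    data   : (D, N) array, one example per column (the layout used in this lab)
    labels : (N,) integer array of class ids (already flattened)
    """
    means = np.zeros((data.shape[0], num_classes))
    for k in range(num_classes):
        means[:, k] = data[:, labels == k].mean(axis=1)
    return means

def classify_by_mean(test_data, means):
    """Assign each test column to the class whose mean is nearest.

    Assumption: isotropic Gaussians with a shared variance and equal priors,
    so the most likely class has the smallest squared Euclidean distance.
    """
    # Squared distances, shape (num_classes, M): ||mu||^2 - 2 mu.x + ||x||^2
    d2 = ((means ** 2).sum(axis=0)[:, None]
          - 2.0 * means.T @ test_data
          + (test_data ** 2).sum(axis=0)[None, :])
    return np.argmin(d2, axis=0)

# Tiny synthetic stand-in for MNIST: 3 well-separated classes in 5 dimensions.
rng = np.random.default_rng(0)
true_centers = 10.0 * np.eye(5)[:, :3]
toy_labels = np.repeat(np.arange(3), 50)
toy_data = true_centers[:, toy_labels] + rng.normal(size=(5, 150))

toy_means = class_means(toy_data, toy_labels, num_classes=3)
toy_pred = classify_by_mean(toy_data, toy_means)
```

On the real data, each column of the means array can be reshaped to 28x28 and displayed to produce the ten class images.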
-For Part (2) and Part (3) your notebook should report two things: 
- 
-{{ :cs401r_w2016:lab5_confmat2.png?direct&300|}}
-
-  - The overall classification error rate.  For example, when I coded up Part (2), my classification error rate was 17.97%.  When I coded up Part (3), my error rate was 3.80%.
-  - A confusion matrix (see MLAPP pg. 183 or [[https://en.wikipedia.org/wiki/Confusion_matrix|this Wikipedia article]]).  A confusion matrix is a complete report of all of the different ways your classifier was wrong, and is much more informative than a single error rate; for example, a confusion matrix will report the number of times your classifier reported "3" when the true class was "8".  You can report this confusion matrix either as a text table or as an image.  My confusion matrix is shown to the right; you can see that my classifier generally gets things right (the strong diagonal), but sometimes predicts "9" when the true class is "4" (for example).
- 
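A confusion matrix and the error rate can both be computed from two label vectors. The following is a minimal sketch, not the required implementation. It assumes the convention that rows are true classes and columns are predicted classes (the opposite convention is also common), and the function names are hypothetical.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes=10):
    """Return C where C[i, j] counts examples of true class i predicted as j."""
    true_labels = np.asarray(true_labels).ravel().astype(np.int64)
    predicted_labels = np.asarray(predicted_labels).ravel().astype(np.int64)
    # Encode each (true, predicted) pair as a single integer, then count them.
    pair_ids = true_labels * num_classes + predicted_labels
    counts = np.bincount(pair_ids, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def error_rate(conf):
    """Fraction of examples that fall off the diagonal."""
    return 1.0 - np.trace(conf) / conf.sum()

# Toy example with 3 classes and 6 test points.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
conf = confusion_matrix(y_true, y_pred, num_classes=3)
# conf is [[1, 1, 0], [0, 2, 0], [1, 0, 1]]; two of six points are wrong.
```

Casting to a wide integer type first guards against overflow if the labels are stored as small unsigned integers.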
- 
-//What errors do you think are most likely for this lab?// 
- 
-====Description:====
-
-For this lab, you will be experimenting with kernel density estimators (see MLAPP 14.7.2).  These are a simple, nonparametric alternative to Gaussian mixture models and form an important part of the machine learning toolkit.
-
-At several points during this lab, you will need to construct density estimates that are "class-conditional".  For example, in order to classify a test point $x_j$, you need to compute
-
-$p( \mathrm{class}=k | x_j, \mathrm{data} ) \propto p( x_j | \mathrm{class}=k, \mathrm{data} ) p(\mathrm{class}=k | \mathrm{data} ) $
-
-where
-
-$p( x_j | \mathrm{class}=k, \mathrm{data} )$
- 
-is given by a kernel density estimator derived from all data of class $k$. 
- 
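The proportionality above can be evaluated in log space to avoid underflow in 784 dimensions. The sketch below is one way to do it, under several assumptions that the lab does not fix: a Gaussian kernel with a single bandwidth ''sigma'' shared by all classes, class priors estimated from class frequencies in the training data, and every class having at least one training example. Because the bandwidth is shared, the Gaussian normalizing constant is the same for every class and is dropped. The function names are hypothetical.

```python
import numpy as np

def log_sum_exp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = a.max(axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.exp(a - a_max).sum(axis=axis))

def kde_log_posterior(test_data, train_data, train_labels, sigma, num_classes=10):
    """Unnormalized log p(class=k | x_j, data), shape (num_classes, M).

    Data arrays hold one example per column; train_labels is a flat (N,) array.
    The shared Gaussian normalizing constant is omitted.
    """
    n_total = train_data.shape[1]
    test_sq = (test_data ** 2).sum(axis=0)
    scores = np.empty((num_classes, test_data.shape[1]))
    for k in range(num_classes):
        xk = train_data[:, train_labels == k]            # (D, N_k)
        n_k = xk.shape[1]
        # Squared distance between every class-k point and every test point.
        d2 = (xk ** 2).sum(axis=0)[:, None] - 2.0 * xk.T @ test_data + test_sq[None, :]
        # log of the average kernel value: the class-conditional KDE.
        log_lik = log_sum_exp(-d2 / (2.0 * sigma ** 2), axis=0) - np.log(n_k)
        log_prior = np.log(n_k / n_total)
        scores[k] = log_lik + log_prior
    return scores

# Tiny synthetic stand-in: 3 well-separated classes in 5 dimensions.
rng = np.random.default_rng(0)
centers = 10.0 * np.eye(5)[:, :3]
toy_labels = np.repeat(np.arange(3), 50)
toy_data = centers[:, toy_labels] + rng.normal(size=(5, 150))

toy_scores = kde_log_posterior(toy_data, toy_data, toy_labels, sigma=1.0, num_classes=3)
toy_pred = np.argmax(toy_scores, axis=0)
```

On the full MNIST arrays the distance matrix for one class is large, so processing the test points in batches is probably necessary. The bandwidth is a free parameter you will need to choose.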
- 
-The data that you will be analyzing is the famous [[http://yann.lecun.com/exdb/mnist/|MNIST handwritten digits dataset]].  You can download some pre-processed MATLAB data files below:
-
-[[http://hatch.cs.byu.edu/courses/stat_ml/mnist_train.mat|MNIST training data vectors and labels]]
-
-[[http://hatch.cs.byu.edu/courses/stat_ml/mnist_test.mat|MNIST test data vectors and labels]]
- 
-These can be loaded using the scipy.io.loadmat function, as follows: 
- 
-<code python>
-import scipy.io
-
-# Training set: image vectors and their labels
-train_mat = scipy.io.loadmat('mnist_train.mat')
-train_data = train_mat['images']
-train_labels = train_mat['labels']
-
-# Test set: note the different key names in this file
-test_mat = scipy.io.loadmat('mnist_test.mat')
-test_data = test_mat['t10k_images']
-test_labels = test_mat['t10k_labels']
-</code>
- 
-The training data vectors are now in ''train_data'', a numpy array of size 784x60000, with corresponding labels in ''train_labels'', a numpy array of size 60000x1.
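Because the labels arrive as a column (N x 1) and the images as columns of the data array, a little reshaping makes per-class selection easier. Here is a small sketch using stand-in arrays with the same layout but only 100 examples; the variable names and the random contents are placeholders, not the real data.

```python
import numpy as np

# Stand-in arrays with the layout described above (784 x N data, N x 1 labels).
rng = np.random.default_rng(0)
fake_data = rng.random((784, 100))
fake_labels = rng.integers(0, 10, size=(100, 1))

# Flatten the labels so they can be used as a boolean mask over columns.
flat_labels = fake_labels.ravel().astype(np.int64)   # (100, 1) -> (100,)

# Select every column whose label is 3.
threes = fake_data[:, flat_labels == 3]

# A single example is one column; reshape it to 28x28 to treat it as an image.
first_image = fake_data[:, 0].reshape(28, 28)
```

If the loaded arrays turn out to have an integer dtype, converting them to floating point before doing arithmetic avoids overflow surprises.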
- 
-====Hints:====
-
-Here is a simple way to visualize a digit.  Suppose our digit is in variable ''X'', which has dimensions 784x1:
-
-<code python>
-import matplotlib.pyplot as plt
-# Reshape to 28x28, transpose, and invert the intensities before displaying
-plt.imshow( 1-X.reshape(28,28).T, interpolation='nearest' )
-</code>
- 
-Here are some functions that may be helpful to you: 
- 
-<code python>
-numpy.argmax
-numpy.bincount
-</code>
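As a quick illustration of how these two functions might be used here (the arrays below are made-up examples): ''numpy.argmax'' picks the highest-scoring class for each test point, and ''numpy.bincount'' counts how many examples fall in each class, which is one way to estimate class priors.

```python
import numpy as np

# Scores for 3 classes (rows) on 3 test points (columns).
scores = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.2, 0.3],
                   [0.1, 0.1, 0.5]])
predictions = np.argmax(scores, axis=0)     # best class per column: [1, 0, 2]

# Count examples per class, then normalize to get empirical priors.
labels = np.array([0, 2, 2, 1, 2])
counts = np.bincount(labels, minlength=3)   # [1, 1, 3]
priors = counts / counts.sum()
```

Passing ''minlength'' keeps the result the same length even when the highest-numbered class is absent from the labels.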
  
cs401r_w2016/lab5.txt · Last modified: 2021/06/30 23:42 (external edit)