====Objective:====

To understand the relationship between a prior, a likelihood, a posterior, and the posterior predictive distribution.  To understand that distributions can be placed over arbitrary objects, including things like abstract sequences of numbers.

----
====Deliverable:====

For this lab, you will turn in an ipython notebook that implements the "Bayesian Concept Learning" model from Chapter 3 of MLAPP.

Your notebook should perform the following functions:

  - Prompt the user for a set of numbers.  (What happens if they only enter one number?  See the parsing sketch after this list.)
  - Display the prior, likelihood, and posterior for each concept
  - Print the most likely concept
  - Print the posterior predictive distribution over numbers

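A minimal sketch of the input-handling step, assuming comma-separated input (''parse_numbers'' is a hypothetical helper name, not a required interface):

<code python>
# a minimal sketch of prompting for and parsing the data;
# parse_numbers is a hypothetical helper, not a required name
def parse_numbers(text):
    # split on commas and keep the integer tokens; int() tolerates whitespace
    return [int(tok) for tok in text.split(',') if tok.strip()]

data = parse_numbers(input('Please enter a set of numbers: '))
# a single entry like "16" still yields a one-element list,
# so the rest of the notebook can treat all inputs uniformly
</code>
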
When you display your prior, likelihood, and posterior, your figure should look something like the ones in the book; my version is shown here:

{{:cs401r_w2016:lab2_bayesian_concepts.png?direct&800|}}

Similarly, when you display the posterior predictive, your figure should look something like this:

{{:cs401r_w2016:lab2_pp.png?direct&800|}}

----
====Description:====

Following the Bayesian Concept Learning example in Chapter 3 of MLAPP, we're interested in reasoning about the origin of a set of numbers.  We'll do this by placing a prior over a set of possible //concepts// (or "candidate origins"), and then using Bayes' law to construct a posterior distribution over concepts given some data.

For this lab, we will only consider numbers between 0 and 100.

<del>Your notebook should construct a set of possible number-game concepts (such as "even" or "odd").  These can be any set of concepts you want, but should include at least all of the concepts in the book (see, for example, Fig. 3.2).  You must assign a prior probability to each concept; the prior can be anything you want.</del>

To make grading easier on our incredible TA, your notebook should construct a set of possible number-game concepts that matches the concepts in the book (see Fig. 3.2).  You must assign a prior probability to each concept; again for ease of grading, your prior should be:

This prior distribution is

$$p(h)$$

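One possible way to represent the concepts is as Python sets over the integers 0 to 100.  The concepts and the uniform prior below are illustrative placeholders only; substitute the full concept list from Fig. 3.2 and the prescribed prior:

<code python>
# illustrative only: each concept is a set of integers in [0, 100];
# replace with the full list from Fig. 3.2 and the prescribed prior
concepts = {
    'even':            set(range(0, 101, 2)),
    'odd':             set(range(1, 101, 2)),
    'multiples of 10': set(range(0, 101, 10)),
    'powers of 2':     {2**k for k in range(7)},   # 1, 2, 4, ..., 64
    'all':             set(range(0, 101)),
}

# placeholder uniform prior p(h) -- use the prior specified above instead
prior = {name: 1.0 / len(concepts) for name in concepts}
</code>
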
You must then prompt the user for some data.  This will just be a sequence of numbers, like ''16'', ''2,4,6'', or ''4,9,25''.  This is $\mathrm{data}$.  You must then compute the likelihood of the $\mathrm{data}$, given the hypothesis:

$$p(\mathrm{data} | h )$$

**Important:** you can assume that each number in the data was sampled independently, and that each number was sampled uniformly from the set of all possible numbers //in that concept//.

//Hint: what does that imply about the probability of sampling a given number from a concept with lots of possibilities, such as the ''all'' concept, vs. a concept with few possibilities, such as ''multiples of 10''?//

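Under that assumption, each of the $N$ data points contributes a factor of $1/|h|$, so $p(\mathrm{data}|h) = (1/|h|)^N$ when every number lies in the concept, and $0$ otherwise.  A minimal sketch, reusing the hypothetical ''concepts'' representation from above:

<code python>
# likelihood under independent, uniform sampling from the concept:
# p(data | h) = (1 / |h|)^N if every datum is in h, else 0
def likelihood(data, h):
    if all(x in h for x in data):
        return (1.0 / len(h)) ** len(data)
    return 0.0
</code>

Note how a small concept that contains all the data receives a much higher likelihood than a large one such as ''all''.
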
Prepare a figure, as described in the Deliverable, that illustrates your prior, the likelihood of the data for each concept, and the posterior.  **Note:** distributions should be properly normalized.

You must also prepare a figure showing the //posterior predictive distribution//.  This distribution describes the probability that a number $\tilde{x}$ is in the target concept (which we'll call $C$), given the data.  (Note that we're drawing a subtle distinction between the true //concept// and a //hypothesis//.)  The book is somewhat unclear on this point; to compute it, we marginalize out the specific hypothesis:

$$p(\tilde{x} \in C | \mathrm{data} ) = \sum_h p(\tilde{x} \in C , h | \mathrm{data} )$$

$$p(\tilde{x} \in C | \mathrm{data} ) = \sum_h p(\tilde{x} \in C | h) p( h | \mathrm{data} )$$

We've already computed the posterior $p( h | \mathrm{data} )$, so we're only left with the term $p(\tilde{x} \in C | h)$.  For this, just use an //indicator// function that returns 1 if $\tilde{x}$ is in $h$, and 0 otherwise.

//Hint: just like any other distribution, the posterior predictive is normalized - but it is not normalized as a function of $\tilde{x}$.  So what is it normalized over?//

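A minimal sketch of this marginalization, assuming a ''posterior'' dictionary mapping each concept name to $p(h|\mathrm{data})$ and the hypothetical ''concepts'' dictionary from earlier:

<code python>
# posterior predictive: p(x in C | data) = sum_h 1[x in h] * p(h | data)
def posterior_predictive(x, concepts, posterior):
    # the indicator p(x in C | h) is 1 when x is in h, 0 otherwise
    return sum(posterior[name] for name, h in concepts.items() if x in h)

pred = [posterior_predictive(x, concepts, posterior) for x in range(101)]
</code>
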
----
====Hints:====

When using an ipython notebook, it's nice to make your plots show up inline.  To do this, add the following lines to the first cell of your notebook:

<code python>
# this tells seaborn and matplotlib to generate plots inline in the notebook
%matplotlib inline

# these two lines allow you to control the figure size
%pylab inline
pylab.rcParams['figure.figsize'] = (16.0, 8.0)
</code>

You may find the following functions useful:

<code python>
input('Please enter a set of numbers: ')

len
range
filter
map
all

import matplotlib.pyplot as plt
import seaborn

plt.figure( 42 )
plt.clf()
plt.subplot
plt.barh
plt.title
plt.xlabel
</code>
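
For example, the posterior over concepts could be drawn as a horizontal bar chart (a sketch only, assuming the ''concepts'' and ''posterior'' names from the earlier sketches):

<code python>
import matplotlib.pyplot as plt

# one horizontal bar per concept, labeled with the concept names;
# the same pattern works for the prior and likelihood panels
names = list(concepts.keys())
plt.figure(42)
plt.clf()
plt.barh(range(len(names)), [posterior[n] for n in names])
plt.yticks(range(len(names)), names)
plt.title('posterior')
plt.xlabel('p(h | data)')
plt.show()
</code>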