  
{{ :cs401r_w2016:lab8_pdtm.png?direct&500 |}}

{{ :cs401r_w2016:gibbs_sampler_results.png?direct&500|}}

Here, you can see how documents that are strongly correlated with Topic #3 appear every six months; these are the sustainings of church officers and statistical reports.
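
If you would like to produce a similar visualization, a heat map of the per-document topic matrix is easy to generate with matplotlib.  The following is only a minimal sketch, not the official helper code; it assumes a ''K x D'' array of per-document topic proportions called ''pis'' (the random matrix is fabricated just so the snippet runs on its own).

<code python>
import numpy as np
import matplotlib.pyplot as plt

K, D = 10, 50
# stand-in for your sampled per-document topic mixtures (K x D);
# replace this with the pis array from your sampler
pis = np.random.dirichlet(np.ones(K), size=D).T

plt.figure(figsize=(10, 4))
plt.imshow(pis, aspect='auto', interpolation='nearest', cmap='hot')
plt.xlabel('document')
plt.ylabel('topic')
plt.colorbar(label='topic proportion')
plt.title('distribution of topics over documents')
plt.savefig('pdtm_heatmap.png')
</code>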

Your notebook must also produce a plot of the log posterior of the data over time, as your sampler progresses.  You should produce a single plot comparing the regular Gibbs sampler and the collapsed Gibbs sampler.

To the right is an example of my log posterior plots.
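
A single comparison plot is easy to produce if you record the log posterior once per sweep for each sampler.  Here is a minimal sketch; the names ''lps_gibbs'' and ''lps_collapsed'' are just placeholders for whatever lists you accumulate during sampling (random data is used only so the snippet runs).

<code python>
import numpy as np
import matplotlib.pyplot as plt

# replace these with the log posterior values you record at each sweep
lps_gibbs = np.cumsum(np.random.rand(100))
lps_collapsed = np.cumsum(np.random.rand(100))

plt.figure()
plt.plot(lps_gibbs, label='Gibbs sampler')
plt.plot(lps_collapsed, label='collapsed Gibbs sampler')
plt.xlabel('sweep')
plt.ylabel('log posterior')
plt.legend(loc='lower right')
plt.title('log posterior over time')
plt.savefig('log_posterior_comparison.png')
</code>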
  
----
  * 40% Correct implementation of Gibbs sampler
  * 40% Correct implementation of collapsed Gibbs sampler
  * 20% Final plots are tidy and legible (at least two plots: the log posterior over time for both samplers, and a heat map of the distribution of topics over documents)
  
----
For this lab, you will code two different inference algorithms on the Latent Dirichlet Allocation (LDA) model.
  
You will use [[https://www.dropbox.com/s/yr3n9w61ifon04h/files.tar.gz?dl=0|a dataset of general conference talks]].  Download and untar these files; there is helper code in the ''Hints'' section to help you process them.
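
The helper code in the ''Hints'' section shows the intended way to process the files, but as a rough sketch (and assuming the tarball simply expands into a directory of plain-text talk files, which you should verify), loading the documents could look something like this:

<code python>
import glob
import tarfile

# assumes files.tar.gz is in the working directory; the directory layout
# and file extension below are guesses -- adjust to match the real data
with tarfile.open('files.tar.gz', 'r:gz') as tar:
    tar.extractall('talks')

docs = []
for fn in sorted(glob.glob('talks/**/*.txt', recursive=True)):
    with open(fn, encoding='utf-8', errors='ignore') as f:
        docs.append(f.read())

print('loaded %d documents' % len(docs))
</code>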
  
**Part 1: Basic Gibbs Sampler**
  
The LDA model consists of several variables: the per-word topic assignments, the per-document topic mixtures, and the overall topic distributions.  Your Gibbs sampler should alternate between sampling each variable, conditioned on all of the others.  Equations for this were given in the lecture slides.
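
To make the structure concrete, here is a rough sketch of what one sweep might look like.  It is not the official solution, and the data layout is assumed: ''docs[d]'' is a list of word ids, ''qs[d]'' a list of topic assignments of the same length, ''bs'' a ''V x K'' array of topic-word distributions, ''pis'' a ''K x D'' array of per-document mixtures, and ''alpha'' and ''beta'' symmetric Dirichlet hyperparameters.

<code python>
import numpy as np

def gibbs_sweep(docs, qs, bs, pis, alpha, beta):
    V, K = bs.shape
    D = len(docs)

    # resample per-word topic assignments qs
    for d in range(D):
        for n, w in enumerate(docs[d]):
            probs = pis[:, d] * bs[w, :]      # p(topic) * p(word | topic)
            probs = probs / probs.sum()
            qs[d][n] = np.random.choice(K, p=probs)

    # resample per-document topic mixtures pis
    for d in range(D):
        counts = np.bincount(qs[d], minlength=K)
        pis[:, d] = np.random.dirichlet(alpha + counts)

    # resample topic-word distributions bs
    wcounts = np.zeros((V, K))
    for d in range(D):
        for n, w in enumerate(docs[d]):
            wcounts[w, qs[d][n]] += 1
    for k in range(K):
        bs[:, k] = np.random.dirichlet(beta + wcounts[:, k])
</code>

The exact conditional distributions should come from the lecture slides; the sketch above only shows the overall alternation.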
  
//Hint: it may be helpful to define a data structure, perhaps called ''civk'', that contains counts of how many words are assigned to which topics in which documents.  My ''civk'' data structure was simply a 3-dimensional array.//
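
For example, one way (among many) to build such a count array from the current assignments, using the same assumed layout as in the sketch above:

<code python>
import numpy as np

def build_civk(docs, qs, D, V, K):
    # civk[i, v, k] = number of times word v in document i is currently
    # assigned to topic k
    civk = np.zeros((D, V, K), dtype=int)
    for i in range(D):
        for w, k in zip(docs[i], qs[i]):
            civk[i, w, k] += 1
    return civk
</code>

Summing over the word axis gives per-document topic counts, and summing over the document axis gives per-topic word counts.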
  
You should run 100 sweeps through all of the variables.
  
Note: this can be quite slow.  As a result, you are welcome to run your code off-line and simply include the final images / visualizations in your notebook (along with your code, of course!).
  
**Part 2: Collapsed Gibbs Sampler**
  
The collapsed Gibbs sampler integrates out the topic distributions and the per-document topic mixtures.  The only variables left, therefore, are the per-word topic assignments.  A collapsed Gibbs sampler must therefore iterate over every word in every document and resample its topic assignment.
  
Again, computing and maintaining a data structure that tracks counts will be very helpful.  This will again be somewhat slow.
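
As a sketch of the inner loop (assuming symmetric priors ''alpha'' and ''beta'' and a set of count arrays that you could derive from a ''civk''-style structure; the function and argument names here are only illustrative, and the exact conditional you implement should match the lecture slides):

<code python>
import numpy as np

def resample_word(d, w, old_k, ndk, nkv, nk, alpha, beta, V):
    # ndk[d, k]: topic counts per document, nkv[k, v]: word counts per topic,
    # nk[k]: total words per topic; all counts include the current word on
    # entry and are updated in place
    ndk[d, old_k] -= 1
    nkv[old_k, w] -= 1
    nk[old_k] -= 1

    # collapsed conditional: p(z = k) is proportional to
    # (nkv[k, w] + beta) / (nk[k] + V*beta) * (ndk[d, k] + alpha)
    probs = (nkv[:, w] + beta) / (nk + V * beta) * (ndk[d, :] + alpha)
    probs = probs / probs.sum()
    new_k = np.random.choice(len(nk), p=probs)

    ndk[d, new_k] += 1
    nkv[new_k, w] += 1
    nk[new_k] += 1
    return new_k
</code>

One sweep then calls this for every word in every document; after your 100 sweeps, point estimates of the topics and mixtures can be recovered from the counts.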
  
----
  
# topic distributions
bs = np.zeros((V,K)) + (1/V)
# how should this be initialized?
  
# per-document-topic distributions
pis = np.zeros((K,D)) + (1/K)
# how should this be initialized?
  
for iters in range(0,100):
    p = compute_data_likelihood( docs_i, qs, bs, pis )
    print("Iter %d, p=%.2f" % (iters,p))
  
    # resample per-word topic assignments qs
  
    # resample per-document topic mixtures pis
  
    # resample topics bs
  
  
</code>