Your notebook should implement two different inference algorithms on the LDA model: (1) a standard Gibbs sampler, and (2) a collapsed Gibbs sampler.

Your notebook should produce a visualization of the topics it discovers, as well as how those topics are distributed across documents.  You should highlight anything interesting you may find!  For example, when I ran LDA with 10 topics, I found the following topic:
  
  GS Topic 3:
                  it    0.005515
            manifest    0.005188
               first    0.005022
              please    0.004463
          presidency    0.004170
            proposed    0.003997
                   m    0.003619
               favor    0.003428
             sustain    0.003165
             opposed    0.003133
  
To see how documents were assigned to topics, I visualized the per-document topic mixtures as a single matrix, where color indicates probability:
  
{{ :cs401r_w2016:lab8_pdtm.png?direct&500 |}}
  
{{ :cs401r_w2016:gibbs_sampler_results.png?direct&500|}}

Here, you can see how documents that are strongly correlated with Topic #3 appear every six months; these are the sustainings of church officers and statistical reports.
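If you keep the per-document topic mixtures in a single K x D array (they are called ''pis'' in the skeleton code at the bottom of this page), one easy way to get this kind of heat-map is ''matplotlib''.  This is only a sketch of one option, not a required format:

<code python>
import matplotlib.pyplot as plt

def plot_topic_mixtures(pis):
    # pis is assumed to be a K x D array: pis[k,d] = probability of topic k in document d
    plt.figure(figsize=(10, 4))
    plt.imshow(pis, aspect='auto', interpolation='nearest')
    plt.xlabel('document')
    plt.ylabel('topic')
    plt.colorbar(label='probability')
    plt.tight_layout()
    plt.show()
</code>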
  
Your notebook must also produce a plot of the log posterior of the data over time, as your sampler progresses.  You should produce a single plot comparing the regular Gibbs sampler and the collapsed Gibbs sampler.
  
To the right is an example of my log pdfs.
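One way to produce that comparison is to record the log posterior after every sweep of each sampler and plot the two traces together; the argument names below are placeholders for whatever you store:

<code python>
import matplotlib.pyplot as plt

def plot_log_posteriors(logps_gibbs, logps_collapsed):
    # each argument: a list of log-posterior values, one per sweep
    plt.plot(logps_gibbs, label='standard Gibbs')
    plt.plot(logps_collapsed, label='collapsed Gibbs')
    plt.xlabel('sweep')
    plt.ylabel('log posterior')
    plt.legend()
    plt.show()
</code>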
  
----
====Grading standards:====
  
Your notebook will be graded on the following elements:
  
  * 40% Correct implementation of Gibbs sampler
  * 40% Correct implementation of collapsed Gibbs sampler
  * 20% Final plots are tidy and legible (at least 2 plots: posterior over time for both samplers, and heat-map of distribution of topics over documents)
  
----
====Description:====
  
For this lab, you will code two different inference algorithms on the Latent Dirichlet Allocation (LDA) model.
  
You will use [[https://www.dropbox.com/s/yr3n9w61ifon04h/files.tar.gz?dl=0|a dataset of general conference talks]].  Download and untar these files; there is helper code in the ''Hints'' section to help you process them.
  
**Part 1: Basic Gibbs Sampler**
  
The LDA model consists of several variables: the per-word topic assignments, the per-document topic mixtures, and the overall topic distributions.  Your Gibbs sampler should alternate between sampling each variable, conditioned on all of the others.  Equations for this were given in the lecture slides.
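As a rough sketch only (not a reference solution), one sweep of this sampler could be organized as below, reusing the variable names from the skeleton code at the bottom of this page (''bs'' is a V x K array of topic distributions, ''pis'' is a K x D array of per-document mixtures, ''qs'' holds the per-word assignments) and assuming symmetric Dirichlet hyperparameters ''alpha'' and ''beta'' and documents given as lists of word indices:

<code python>
import numpy as np

def gibbs_sweep(docs, qs, bs, pis, alpha, beta):
    V, K = bs.shape
    # 1) resample each word's topic assignment, given bs and pis
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            p = pis[:, d] * bs[w, :]               # unnormalized p(z = k)
            qs[d][i] = np.random.choice(K, p=p / p.sum())
    # 2) resample each document's topic mixture, given the assignments
    for d in range(len(docs)):
        ndk = np.bincount(qs[d], minlength=K)      # topic counts in document d
        pis[:, d] = np.random.dirichlet(alpha + ndk)
    # 3) resample each topic's word distribution, given the assignments
    nkw = np.zeros((K, V))                         # word counts per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            nkw[qs[d][i], w] += 1
    for k in range(K):
        bs[:, k] = np.random.dirichlet(beta + nkw[k, :])
</code>

In practice you will want to maintain these counts incrementally rather than recomputing them on every sweep; see the hint below.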
  
//Hint: it may be helpful to define a datastructure, perhaps called ''civk'', that contains counts of how many words are assigned to which topics in which documents.  My ''civk'' datastructure was simply a 3 dimensional array.//
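For instance, a hypothetical ''civk'' could be built with numpy as below, assuming documents are given as lists of word indices and ''qs'' holds the current per-word topic assignments; its marginal sums then give the document-topic and topic-word counts that the conditional distributions need:

<code python>
import numpy as np

def build_civk(docs, qs, D, V, K):
    # civk[i, v, k] = number of times vocab word v in document i is assigned to topic k
    civk = np.zeros((D, V, K), dtype=int)
    for i, doc in enumerate(docs):
        for n, v in enumerate(doc):
            civk[i, v, qs[i][n]] += 1
    return civk

# useful marginals:
#   civk.sum(axis=1)  -> D x K matrix of topic counts per document
#   civk.sum(axis=0)  -> V x K matrix of word counts per topic
</code>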
  
You should run 100 sweeps through all the variables.
  
Note: this can be quite slow.  As a result, you are welcome to run your code off-line, and simply include the final images / visualizations in your notebook (along with your code, of course!).
  
**Part 2: Collapsed Gibbs Sampler**
  
The collapsed Gibbs sampler integrates out the topic distributions and the per-document topic mixtures.  The only variables left, therefore, are the per-word topic assignments.  The sampler must therefore iterate over every word in every document and resample its topic assignment.
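As a sketch of that inner step (under the usual symmetric priors ''alpha'' and ''beta''), the conditional distribution over topics for one word can be computed from count arrays; here ''ndk'', ''nkw'' and ''nk'' are hypothetical document-topic counts, topic-word counts, and topic totals from which the word currently being resampled has already been removed:

<code python>
import numpy as np

def resample_assignment(d, w, ndk, nkw, nk, alpha, beta, V):
    # nk[k] = total number of words assigned to topic k (i.e. nkw.sum(axis=1))
    # unnormalized conditional p(z = k | all other assignments)
    p = (ndk[d, :] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
    p /= p.sum()
    return np.random.choice(len(p), p=p)
</code>

Remember to subtract the word's current assignment from the counts before computing this distribution, and to add the newly sampled assignment back in afterwards.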
  
Again, computing and maintaining a datastructure that tracks counts will be very helpful.  This will again be somewhat slow.
  
----
  
<code python>
# topic distributions
bs = np.zeros((V,K)) + (1/V)
# how should this be initialized?
  
# per-document-topic distributions
pis = np.zeros((K,D)) + (1/K)
# how should this be initialized?
  
for iters in range(0,100):
    p = compute_data_likelihood( docs_i, qs, bs, pis )
    print("Iter %d, p=%.2f" % (iters,p))
  
    # resample per-word topic assignments qs
  
    # resample per-document topic mixtures pis
  
    # resample topics
  
  
</code>