{{ :cs401r_w2016:lab8_pdtm.png?direct&500 |}}

{{ :cs401r_w2016:gibbs_sampler_results.png?direct&500|}}

Here, you can see how documents that are strongly correlated with Topic #3 appear every six months; these are the sustainings of church officers and statistical reports.

Your notebook must also produce a plot of the log posterior of the data over time, as your sampler progresses. You should produce a single plot comparing the regular Gibbs sampler and the collapsed Gibbs sampler.

To the right is an example of my log pdfs.

----
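As a sketch of how that comparison plot might be produced (assuming two per-iteration log-posterior traces; the names `lp_gibbs` and `lp_collapsed` are hypothetical, and random data stands in for your samplers' output):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Stand-in traces; in the lab these come from your two samplers.
rng = np.random.default_rng(0)
lp_gibbs = np.cumsum(rng.random(100))
lp_collapsed = np.cumsum(rng.random(100))

plt.plot(lp_gibbs, label="Gibbs sampler")
plt.plot(lp_collapsed, label="Collapsed Gibbs sampler")
plt.xlabel("Iteration")
plt.ylabel("Log posterior")
plt.legend()
plt.savefig("gibbs_comparison.png")
```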
  * 40% Correct implementation of Gibbs sampler
  * 40% Correct implementation of collapsed Gibbs sampler
  * 20% Final plots are tidy and legible (at least two plots: the log posterior over time for both samplers, and a heat map of the distribution of topics over documents)

----
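One way to produce the topics-over-documents heat map is a simple `imshow` of the K x D mixture matrix (a minimal sketch; the toy `pis` matrix here is random stand-in data, not output from a real sampler):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

K, D = 10, 50  # topics x documents
# Toy per-document topic mixtures; each column is a distribution over topics.
pis = np.random.default_rng(1).dirichlet(np.ones(K), size=D).T

plt.imshow(pis, aspect="auto", interpolation="nearest")
plt.xlabel("Document")
plt.ylabel("Topic")
plt.colorbar(label="P(topic | document)")
plt.savefig("pdtm_heatmap.png")
```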
# topic distributions
bs = np.zeros((V,K)) + (1/V)
# how should this be initialized?

# per-document-topic distributions
pis = np.zeros((K,D)) + (1/K)
# how should this be initialized?

for iters in range(0,100):
    p = compute_data_likelihood( docs_i, qs, bs, pis )
    print("Iter %d, p=%.2f" % (iters,p))
    # resample per-word topic assignments qs
    # resample per-document topic mixtures pis
    # resample topics bs
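As one concrete illustration of a resampling step, the per-document topic mixtures have a conjugate update: conditioned on the assignments, each column of `pis` is a draw from a Dirichlet whose parameters are the prior plus the per-document topic counts. A minimal self-contained sketch on toy data (the shapes and names are assumptions, not the required implementation):

```python
import numpy as np

K, D, N = 3, 4, 6  # topics, documents, words per document (toy sizes)
rng = np.random.default_rng(0)

# Toy per-word topic assignments qs: one row of N assignments per document.
qs = rng.integers(0, K, size=(D, N))
alpha = np.ones(K)  # symmetric Dirichlet prior over topics

# Resample pi_d | qs ~ Dirichlet(alpha + counts of each topic in document d)
pis = np.zeros((K, D))
for d in range(D):
    counts = np.bincount(qs[d], minlength=K)
    pis[:, d] = rng.dirichlet(alpha + counts)
```

Each column of `pis` is now a valid probability distribution over the K topics; the resampling of the topics `bs` follows the same conjugate pattern, with per-topic word counts and a Dirichlet over the vocabulary.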