Differences

This shows you the differences between two versions of the page.

--- cs501r_f2018:lab7 [2018/10/15 15:52] carr
+++ cs501r_f2018:lab7 [2021/06/30 23:42] (current)
@@ Line 23: / Line 23: @@
   * 50% Read and Study the annotation found in the Harvard notebook (on your honor)
-  * 20% Clean, transform, load, and train on provided General Conference NMT dataset
+  * 40% Clean, transform, load, and train on the provided General Conference NMT dataset
-  * 20% Try 1, 2, 4, 6 layers for both encoder and decoder pieces, report results in a few short paragraphs
   * 10% Good coding style, readable output
+  * 20% EXTRA CREDIT: Try 1, 2, 4, and 6 layers for both the encoder and decoder, and report your results in a few short paragraphs
 ----
@@ Line 31: / Line 31: @@
 For this lab, you will modify the
-[[https://github.com/harvardnlp/annotated-transformer/blob/master/The%20Annotated%20Transformer.ipynb|Annotated Transformer]]. There is link to a coloab notebook in the jupyter notebook that you can use. The code is slightly different between the notebook linked above, and the colab like provided by Harvard. Both will work, you may very likely need to mix and match pieces from each to get a working implementation. While this may feel slightly frustrating, it is good practice for deep learning research strategies.
+[[https://github.com/harvardnlp/annotated-transformer/blob/master/The%20Annotated%20Transformer.ipynb|Annotated Transformer]].
-Often when implementing a novel deep learning method, you will start by using someone's implementation as a reference. This is an extremely valuable, and potentially time-saving skill, for producing workable solutions to many problems solved by deep learning methods.
+**There is a link to a Colab notebook in the Jupyter notebook that you can use.**
-There are 3 main parts to this lab
-----
-**Part 1: Reading and Study**
-A large portion of your time spent in this lab will be reading and understanding the topology, attention, dot products, etc introduced in this paper. Since this will be time consuming, and potentially new to some of you, there is no report or grading scheme. Simply do your best to read and understand the material (it will make part 2 and 3 easier if you have a good understanding).
-----
+The code is slightly different between the notebook linked above and the Colab link provided by Harvard. Both will work, but you will very likely need to mix and match pieces from each to get a working implementation. While this may feel slightly frustrating, it is good practice for deep learning research.
-**Part 2: Extend their work to build a General Conference machine translation system **
+If you are experiencing difficulties, you may need to install specific versions of the necessary packages.  One student contributed this:
-----
-**Part 2: Sample text and Training information**
-We now want to be able to train our network, and sample text after training.
-This function outlines how training a sequence style network goes. Fill in the pieces.
 <code python>
-def train(inp, target):
+!pip install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl numpy matplotlib spacy torchtext==0.2.3 seaborn
-    ## initialize hidden layers, set up gradient and loss
-      # your code here
-    ## /
-    loss = 0
-    for c in range(chunk_len):
-        output, hidden = # run the forward pass of your rnn with proper input
-        loss += criterion(output, target[c].unsqueeze(0))
-    ## calculate backwards loss and step the optimizer (globally)
-      # your code here
-    ## /
-    return loss.item() / chunk_len
 </code>
-You can at this time, if you choose, also write out your train loop boilerplate that samples random sequences and trains your RNN. This will be helpful to have working before writing your own GRU class.
-If you are finished training, or during training, and you want to sample from the network you may consider using the following function. If your RNN model is instantiated as `decoder`then this will probabilistically sample a sequence of length `predict_len`
+Often when implementing a novel deep learning method, you will start by using someone else's implementation as a reference. This is an extremely valuable and potentially time-saving skill for producing workable solutions to many problems solved by deep learning methods.
-<code python>
+There are 3 main parts to this lab:
-def evaluate(prime_str='A', predict_len=100, temperature=0.8):
-    ## initialize hidden variable, initialize other useful variables
-      # your code here
-    ## /
-    prime_input = char_tensor(prime_str)
-    # Use priming string to "build up" hidden state
-    for p in range(len(prime_str) - 1):
-        _, hidden = decoder(prime_input[p], hidden)
-    inp = prime_input[-1]
-    for p in range(predict_len):
-        output, hidden = #run your RNN/decoder forward on the input
-        # Sample from the network as a multinomial distribution
-        output_dist = output.data.view(-1).div(temperature).exp()
-        top_i = torch.multinomial(output_dist, 1)[0]
-        ## get character from your list of all characters, add it to your output str sequence, set input
-        ## for the next pass through the model
-         # your code here
-        ## /
-    return predicted
-</code>
 ----
-**Part 3: Creating your own GRU cell**
+**Part 1: Reading and Study**
-The cell that you used in Part 1 was a pre-defined Pytorch layer. Now, write your own GRU class using the same parameters as the built-in Pytorch class does.
+A large portion of your time in this lab will be spent reading and understanding the topology, attention, dot products, etc. introduced in this paper. Since this will be time-consuming, and potentially new to some of you, there is no report or grading scheme. Simply do your best to read and understand the material (a good understanding will make Parts 2 and 3 easier).
-**Please try not to look at the GRU cell definition.**
-The answer is right there in the code, and in theory, you could
-just cut-and-paste it.  This bit is on your honor!
 ----
-**Part 4: Run it and generate your final text!**
+**Part 2: Extend their work to build a General Conference machine translation system **
-Assuming everything has gone well, you should be able to run the main
+Included in the
-function in the scaffold code, using either your custom GRU cell or the built in layer, and see
+[[http://liftothers.org/dokuwiki/lib/exe/fetch.php?media=cs501r_f2018:es-en-general-conference.tar.gz|zipped files]] are two text files and one CSV. The CSV contains each text file in a column (es, en); this is simply for convenience if you prefer working with CSV files. Note: we make no guarantee of data quality or cleanliness, so please ensure that the data you are working with is properly cleaned and formatted.
-output something like this.  I trained on the "lotr.txt" dataset,
-using chunk_length=200, hidden_size=100 for 2000 epochs gave.
-<code>
+We will be translating from Spanish to English, which means Spanish is our source language and English is the target language.
-[0m 9s (100 5%) 2.2169]
-Whaiss Mainde
-'
+The annotated Jupyter notebook has an example application translating from German to English. This may be useful to copy; however, you will need to do some work to get our data into a usable form for the model. This can be tricky, so make sure you understand the various torch functions being called.
-he and the
+You may find the torchtext and data modules useful, but feel free to load the data as you see fit.
+Train the transformer network using their implementation and print out 5-10 translated sentences with your translation and the ground truth. Feel free to comment on why you think it did well or underperformed.
+You should expect to see reasonable results after 2-4 hours of training in Colab.
-'od and roulll and Are say the
+[[http://mlexplained.com/2018/02/08/a-comprehensive-tutorial-to-torchtext/|Here is a good tutorial on torchtext.]]
-rere.
-'Wor
-'Iow anond wes ou
-'Yi
-[0m 19s (200 10%) 2.0371]
+----
-Whimbe.
+**Part 3: EXTRA CREDIT: Experiment with a different number of stacked layers in the encoder and decoder**
-'Thhe
+Now it's time to put on your scientist hats. Try stacking a different number of layers (e.g., 1, 2, 4, 6) for the encoder and decoder. This will require you to understand their implementation and be able to work with it reliably.
-on not of they was thou hit of
-sil ubat thith hy the seare
-as sower and of len beda
-[0m 29s (300 15%) 2.0051]
+Give a qualitative evaluation of the training and results (maybe a plot, or a paragraph, something to demonstrate your thought process and results). This should show us that you are thinking about why you got the results you did with a different number of stacked layers.
-Whis the cart. Whe courn!' 'Bu't of they aid dou giter of fintard of the not you ous,
-'Thas orntie it
-[0m 38s (400 20%) 1.8617]
+Since this lab requires a bit more training time, consider getting started earlier in the week to give yourself plenty of time.
-Wh win took be to the know the gost bing to kno wide dought, and he as of they thin.
-The Gonhis gura
+Good luck!
-[0m 48s (500 25%) 1.9821]
-When of they singly call the and thave thing
-they the nowly we'tly by ands, of less be grarmines of t
-[0m 58s (600 30%) 1.8170]
-Whinds to mass of I
-not ken we ting and dour
-and they.
-'Wat res swe Ring set shat scmaid. The
-ha
-[1m 7s (700 35%) 2.0367]
-Whad ded troud wanty agy. Ve tanle gour the gone veart on hear, as dent far of the Ridgees.'
-'The Ri
-[1m 17s (800 40%) 1.9458]
-Whis is brouch Heared this lack and was weself, for on't
-abothom my and go staid it
-they curse arsh
-[1m 27s (900 45%) 1.7522]
-Whout bear the
-Evening
-the pace spood, Arright the spaines beren the and Wish was was on the more yo
-[1m 37s (1000 50%) 1.6444]
-Whe Swarn. at colk. N(r)rce or they he
-wearing. And the on the he was are he said Pipin.
-'Yes and i
-[1m 47s (1100 55%) 1.8770]
-Whing at they and thins the Wil might
-happened you dlack rusting and thousting fy them, there lifted
-[1m 57s (1200 60%) 1.9401]
-Wh the said Frodo eary him that the herremans!
-'I the Lager into came and broveener he sanly
-for
-s
-[2m 7s (1300 65%) 1.8095]
-When lest
-- in sound fair, and
-the Did dark he in the gose cilling the stand I in the sight. Frodo y
-[2m 16s (1400 70%) 1.9229]
-Whing in a shade and Mowarse round and parse could pass not a have partainly. ' for as I come of I
-le
-[2m 26s (1500 75%) 1.8169]
-Whese one her of in a lief that,
-but. 'We repagessed,
-wandere in these fair of long one have here my
-[2m 36s (1600 80%) 1.6635]
-Where fread in thougraned in woohis, on the the green the
-pohered alked tore becaming was seen what c
-[2m 46s (1700 85%) 1.7868]
-Whil neat
-came to
-is laked,
-and fourst on him grey now they as pass away aren have in the border sw
-[2m 56s (1800 90%) 1.6343]
-Wh magered.
-Then tell some tame had bear that
-came as it nome in
-to houbbirnen and to heardy.
-'
-[3m 6s (1900 95%) 1.8191]
-Who expey to must away be to the master felkly and for, what shours was alons? I had be the long to fo
-[3m 16s (2000 100%) 1.8725]
-White, and his of his in before that for brown before can then took on the fainter smass about rifall
-</code>
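The current revision's Part 2 asks you to clean, transform, and load the Spanish-English General Conference data before training. One possible starting point, sketched below with only the Python standard library, does the following:

  * reads the CSV and normalizes whitespace
  * tokenizes each sentence and drops empty or overlong pairs
  * maps tokens to integer ids

Several details are assumptions rather than part of the lab:

  * The CSV has a header row naming the `es` and `en` columns. Pass `fieldnames=["es", "en"]` if it does not.
  * The tokenizer is a simple regex. spaCy or torchtext tokenizers are reasonable substitutes.
  * The `max_len=100` and `min_freq=2` cutoffs roughly follow the filtering in the Annotated Transformer's IWSLT example.
  * All function names are hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Special tokens (padding, unknown, begin/end of sentence). The strings follow
# the Annotated Transformer's torchtext example; any consistent choice works.
BLANK, UNK, BOS, EOS = "<blank>", "<unk>", "<s>", "</s>"

# Assumption: a simple regex tokenizer that separates words from punctuation,
# including Spanish inverted marks such as "¿" and "¡".
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def clean(text):
    """Collapse internal whitespace and strip leading/trailing spaces."""
    return " ".join(text.split())


def tokenize(text):
    """Lowercase and split into word and punctuation tokens."""
    return TOKEN_RE.findall(text.lower())


def load_pairs(csv_file, max_len=100, fieldnames=None):
    """Return (source, target) token lists from a CSV with 'es' and 'en' columns.

    Rows with an empty side, or with either side longer than max_len tokens,
    are dropped. Pass fieldnames=["es", "en"] if the file has no header row.
    """
    pairs = []
    for row in csv.DictReader(csv_file, fieldnames=fieldnames):
        src = tokenize(clean(row.get("es") or ""))  # Spanish is the source
        tgt = tokenize(clean(row.get("en") or ""))  # English is the target
        if not src or not tgt:
            continue
        if len(src) > max_len or len(tgt) > max_len:
            continue
        pairs.append((src, tgt))
    return pairs


def build_vocab(sentences, min_freq=2):
    """Map each token seen at least min_freq times to an id; specials come first."""
    counts = Counter(tok for sent in sentences for tok in sent)
    itos = [BLANK, UNK, BOS, EOS]
    itos += sorted(tok for tok, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}


def numericalize(tokens, vocab, add_bos_eos=False):
    """Convert tokens to ids, mapping unseen tokens to <unk>."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens]
    if add_bos_eos:
        ids = [vocab[BOS]] + ids + [vocab[EOS]]
    return ids


# Usage with a tiny in-memory CSV (the second data row has no Spanish side).
sample = (
    "es,en\n"
    "¿Qué es la fe?,What is faith?\n"
    ",Empty source line\n"
    "La fe es un principio.,Faith is a principle.\n"
)
pairs = load_pairs(io.StringIO(sample))
src_vocab = build_vocab([src for src, _ in pairs], min_freq=2)
print(len(pairs))  # 2
print(numericalize(["la", "fe", "es", "nueva"], src_vocab, add_bos_eos=True))  # [2, 6, 5, 4, 1, 3]
```

With torchtext 0.2.3 (the version pinned in the install command), you would more likely express the same steps through `data.Field` and `build_vocab`. The plain-Python version simply makes each cleaning step explicit so you can inspect what gets dropped.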

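For the extra-credit layer experiment in Part 3, it can help to know how much capacity each additional layer adds before comparing training runs. The sketch below counts the parameters in the encoder and decoder stacks as a function of the number of layers N. It assumes the Annotated Transformer's layer structure:

  * four d_model x d_model projections per attention block
  * a two-layer position-wise feed-forward network
  * a LayerNorm with a gain and bias around each sublayer
  * one final LayerNorm per stack

It also assumes the default sizes d_model=512 and d_ff=2048. Embeddings and the output generator are left out, and the helper names are hypothetical. The number of attention heads does not change the count, because the heads split the d_model-sized projections rather than adding new ones.

```python
# Assumption: layer structure of the Annotated Transformer. Embeddings and the
# output generator are not counted; the number of heads does not matter because
# the heads split the d_model-sized projections.

def linear_params(n_in, n_out):
    """Weight matrix plus bias vector of one fully connected layer."""
    return n_in * n_out + n_out


def attention_params(d_model):
    """Query, key, value, and output projections of one attention block."""
    return 4 * linear_params(d_model, d_model)


def feed_forward_params(d_model, d_ff):
    """Two-layer position-wise feed-forward network."""
    return linear_params(d_model, d_ff) + linear_params(d_ff, d_model)


def layer_norm_params(d_model):
    """Gain and bias vectors of one LayerNorm."""
    return 2 * d_model


def encoder_params(n_layers, d_model=512, d_ff=2048):
    """Each layer: self-attention, feed-forward, two LayerNorms; plus a final norm."""
    per_layer = (attention_params(d_model)
                 + feed_forward_params(d_model, d_ff)
                 + 2 * layer_norm_params(d_model))
    return n_layers * per_layer + layer_norm_params(d_model)


def decoder_params(n_layers, d_model=512, d_ff=2048):
    """Each layer: self-attention, source attention, feed-forward, three LayerNorms."""
    per_layer = (2 * attention_params(d_model)
                 + feed_forward_params(d_model, d_ff)
                 + 3 * layer_norm_params(d_model))
    return n_layers * per_layer + layer_norm_params(d_model)


for n in (1, 2, 4, 6):
    total = encoder_params(n) + decoder_params(n)
    print(f"N={n}: {total:,} parameters in the encoder and decoder stacks")
```

Under these assumptions the two stacks hold 7,358,464 parameters for N=1 and 44,140,544 for N=6. Because the count grows linearly in N, a 6-layer model carries six times the layer parameters of a 1-layer model. That is one factor to weigh, alongside Colab training time and the size of the dataset, when explaining your results.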
BYU CS classes
