In this lab you will use PyTorch to implement a vanilla encoder / decoder neural machine translation system from Spanish to English.
  
Some of the resources for this lab include [[https://arxiv.org/pdf/1409.3215.pdf|Sequence to Sequence Learning with Neural Networks]] and [[https://arxiv.org/pdf/1409.0473.pdf|D Bahdanau, 2015]]. The former will be of more use in implementing the lab. State of the art NMT systems use Bahdanau's attention mechanism, but context alone should be enough for our dataset.
  
Seq2seq and encoder/decoder are nearly synonymous architectures and represent the first major breakthrough using RNNs to map between source and target sequences of differing lengths. The encoder will map input sequences to a fixed length context vector and the decoder will then map that to the output sequence. Standard softmax / cross entropy is used on the scores output by the decoder and compared against the reference sequence.
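
To make the encoder half concrete, here is a minimal single-layer sketch of what "map input sequences to a fixed length context vector" could look like with GRUCell. The class name, dimensions, and structure are assumptions for illustration, not the required interface.

<code python>
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Hypothetical encoder: embed each source word index and step a GRUCell
    # over the sequence; the final hidden state is the context vector.
    def __init__(self, src_vocab_size, embed_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(src_vocab_size, embed_dim)
        self.gru = nn.GRUCell(embed_dim, hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, src_indices):
        # src_indices: 1-D LongTensor of word indices for one sentence
        h = torch.zeros(1, self.hidden_dim)      # initial hidden state
        for idx in src_indices:
            x = self.embed(idx.view(1))          # (1, embed_dim)
            h = self.gru(x, h)                   # (1, hidden_dim)
        return h                                 # fixed-length context vector
</code>
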
**Part 3:** Implementing the Decoder
  
Again implement a standard GRU using GRUCell, with the exception that for the first timestep you embed a tensor containing the SOS index. That embedding and the context vector will serve as the initial input and initial hidden state, respectively.
  
Unlike the encoder, for each time step take the output (GRUCell calls it h') and run it through a linear layer and then a softmax to get probabilities over the English corpus. Use the word with the highest probability as the input for the next timestep.
  
You may want to consider using a method called teacher forcing to begin connecting source/reference words together. If you decide to use this, then with some set probability at each iteration, input the embedding of the correct word it should translate instead of the prediction from the previous time step.
  
Compute and return the prediction probabilities in either case to be used by the loss function.

Continue running the decoder GRU until the max sentence length is reached or EOS is first predicted. Return the probabilities at each time step regardless of whether teacher forcing was used.
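
Putting Part 3 together, a minimal sketch of the decoder loop might look like the following. The class name, dimensions, index values, and the teacher_forcing_ratio parameter are assumptions for illustration; this is one possible shape, not the required implementation.

<code python>
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    # Hypothetical decoder: a GRUCell unrolled one step at a time, with a
    # linear layer + softmax over the English vocabulary at every step.
    def __init__(self, tgt_vocab_size, embed_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab_size, embed_dim)
        self.gru = nn.GRUCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, tgt_vocab_size)

    def forward(self, context, reference=None, max_length=50,
                sos_idx=0, eos_idx=1, teacher_forcing_ratio=0.5):
        h = context                               # (1, hidden_dim) from the encoder
        word = torch.tensor([sos_idx])            # first input: the SOS index
        step_probs = []
        for t in range(max_length):
            x = self.embed(word)                  # (1, embed_dim)
            h = self.gru(x, h)                    # h' in the GRUCell docs
            probs = F.softmax(self.out(h), dim=1) # probabilities over target vocab
            step_probs.append(probs)              # keep every step for the loss
            pred = probs.argmax(dim=1)            # highest-probability word index
            teacher = (reference is not None and t < len(reference)
                       and random.random() < teacher_forcing_ratio)
            word = reference[t].view(1) if teacher else pred
            if pred.item() == eos_idx:            # stop once EOS is predicted
                break
        # if you use nn.NLLLoss, return log-probabilities (F.log_softmax) instead
        return torch.cat(step_probs, dim=0)       # (steps, tgt_vocab_size)
</code>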
  
**Part 4:** Loss, test metrics
Calculate accuracy by something similar to (target==reference).data.numpy(), but make sure to compensate for when the target and reference sequences are of different lengths.
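
One hedged way to handle the length mismatch, assuming the predicted and reference sequences are 1-D LongTensors of word indices (the truncate-and-penalize choice here is just one option):

<code python>
import torch

def sequence_accuracy(predicted, reference):
    # Compare token-by-token up to the shorter length, then count any extra
    # or missing tokens as wrong so length mismatches are penalized.
    n = min(len(predicted), len(reference))
    correct = (predicted[:n] == reference[:n]).sum().item()
    return correct / max(len(predicted), len(reference))
</code>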
  
Consider using perplexity in addition to cross entropy as a test metric. It's standard practice for NMT and Language Modelling and is equal to 2^cross_entropy.
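
For example (a sketch): 2^cross_entropy assumes the cross entropy is measured in bits; if you compute it with natural logarithms, as PyTorch's nn.CrossEntropyLoss does, the equivalent quantity is exp(cross_entropy).

<code python>
import math

def perplexity(mean_cross_entropy, natural_log=True):
    # Perplexity is the exponentiated average per-token cross entropy.
    return math.exp(mean_cross_entropy) if natural_log else 2 ** mean_cross_entropy
</code>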
  
**Part 5:** Optimizer
  
<code python>
loss.backward()                  # gradients from this example accumulate in p.grad

if j % batch_size == 0:
    for p in all_parameters:
        p.grad.div_(n)           # in-place divide by n (average over the accumulated examples)
</code>
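
For context, one way this snippet could be wired into a training loop is sketched below. The names encoder, decoder, loss_fn, train_pairs, all_parameters, batch_size, and n are assumed to be defined elsewhere; the forward pass is abbreviated and the optimizer choice is just an example.

<code python>
import torch

optimizer = torch.optim.Adam(all_parameters, lr=1e-3)  # hypothetical optimizer

for j, (src, ref) in enumerate(train_pairs, start=1):
    probs = decoder(encoder(src), reference=ref)  # forward pass (sketch)
    loss = loss_fn(probs, ref)                    # e.g. NLL / cross entropy
    loss.backward()                               # gradients accumulate into p.grad

    if j % batch_size == 0:
        for p in all_parameters:
            p.grad.div_(n)                        # in-place average of accumulated grads
        optimizer.step()                          # apply the averaged update
        optimizer.zero_grad()                     # clear grads before the next window
</code>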