cs501r_f2016:lab14

Revisions of 2017/11/20 20:08 and 2017/11/20 22:49 by jszendre.
Some of the resources for this lab include [[https://arxiv.org/pdf/1409.3215.pdf|Sequence to Sequence Learning with Neural Networks]] and [[https://arxiv.org/pdf/1409.0473.pdf|D Bahdanau, 2015]]. The former will be of more use in implementing the lab. State-of-the-art NMT systems use Bahdanau's attention mechanism, but context alone should be enough for our dataset.
  
Seq2seq and encoder/decoder are nearly synonymous architectures and represent the first major breakthrough in using RNNs to map between source and target sequences of differing lengths. The encoder maps the input sequence to a fixed-length context vector, and the decoder then maps that vector to the output sequence. The loss is standard cross entropy between the scores output by the decoder and the reference sentence.
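The loss computation can be sketched as follows; the tensor shapes and vocabulary size here are illustrative assumptions, not values prescribed by the lab:

```python
import torch
import torch.nn as nn

# Illustrative shapes (assumptions): the decoder emits unnormalized scores
# (logits) of shape (target_len, vocab_size); the reference sentence is a
# LongTensor of word indices of shape (target_len,).
vocab_size, target_len = 1000, 7
scores = torch.randn(target_len, vocab_size)             # decoder outputs
reference = torch.randint(0, vocab_size, (target_len,))  # indexed reference

# nn.CrossEntropyLoss applies log-softmax to the scores and then computes
# the negative log-likelihood of the reference indices.
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(scores, reference)  # scalar, averaged over time steps
```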
  
The hyperparameters used are given below.
Create an Encoder class that encapsulates all of the graph operations necessary for embedding and returns the context vector. Initialize both nn.GRUCell and nn.Embedding class members to embed the indexed source input sequence.
  
Implement a GRU using nn.GRUCell, feeding the embedding of the source sentence as the input at each time step. Use a zero tensor as the initial hidden state. Return the last hidden state.

You will probably want to use several layers for your GRU.
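One way to unroll a multi-layer GRUCell over the source sentence can be sketched as below. This is a minimal sketch, not the reference solution; the vocabulary size, hidden dimension, and layer count are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Hyperparameter values here are assumptions for illustration only.
    def __init__(self, vocab_size=100, hidden_dim=32, num_layers=2):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        # One GRUCell per layer; layer i consumes layer i-1's output.
        self.cells = nn.ModuleList(
            [nn.GRUCell(hidden_dim, hidden_dim) for _ in range(num_layers)])

    def forward(self, source):
        # source: LongTensor of word indices, shape (seq_len,)
        embedded = self.embedding(source)  # (seq_len, hidden_dim)
        # Zero tensors as the initial hidden states.
        hiddens = [torch.zeros(1, cell.hidden_size) for cell in self.cells]
        for t in range(source.size(0)):
            inp = embedded[t].unsqueeze(0)  # (1, hidden_dim)
            for i, cell in enumerate(self.cells):
                hiddens[i] = cell(inp, hiddens[i])
                inp = hiddens[i]            # feed upward to the next layer
        # Context vector: the last layer's hidden state at the last step.
        return hiddens[-1]

encoder = Encoder()
context = encoder(torch.tensor([5, 17, 42]))  # shape (1, 32)
```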
  
<code python>
        super(Encoder, self).__init__()
        # Instantiate nn.Embedding and nn.GRUCell

    def run_timestep(self, input, hidden):
        # implement gru here for the nth timestep
cs501r_f2016/lab14.txt · Last modified: 2021/06/30 23:42 (external edit)