cs501r_f2016:lab14 [2017/11/20 20:47] jszendre [Deliverable:]
Create an Encoder class that encapsulates all of the graph operations necessary for embedding and returns the context vector. Initialize both nn.GRUCell and nn.Embedding class members to embed the indexed source input sequence.
Implement a GRU using nn.GRUCell, using the embedding of the source sentence as the input at each time step. Use a zero tensor as the initial hidden state. Return the hidden state at the last time step.
You will probably want to use several layers for your GRU.
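The time-step loop described above might be sketched as follows (a minimal single-layer sketch; ''EncoderSketch'', ''vocab_size'', ''embed_dim'', and ''hidden_dim'' are illustrative names, not part of the lab skeleton):

<code python>
import torch
import torch.nn as nn

class EncoderSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super(EncoderSketch, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru_cell = nn.GRUCell(embed_dim, hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, src_indices):
        # src_indices: LongTensor of token ids, shape (seq_len,)
        embedded = self.embedding(src_indices)    # (seq_len, embed_dim)
        hidden = torch.zeros(1, self.hidden_dim)  # zero initial hidden state
        for t in range(embedded.size(0)):
            # one GRU step: current word embedding plus previous hidden state
            hidden = self.gru_cell(embedded[t].unsqueeze(0), hidden)
        return hidden  # context vector: hidden state at the last time step
</code>

A multi-layer version would keep one GRUCell (and one hidden state) per layer and feed each layer's output upward at every time step.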
<code python>
        super(Encoder, self).__init__()
        # Instantiate nn.Embedding and nn.GRUCell

    def run_timestep(self, input, hidden):
        # implement gru here for the nth timestep
Debugging in PyTorch is significantly more straightforward than in TensorFlow. Tensors are available at any time to print or log.
Better hyperparameters to come. Training started to converge after two hours on a K80 using Adam.
<code python>
learning_rate = .01  # decayed
batch_size = 40      # effective batch size
max_seq_length = 30
hidden_dim = 1024
</code>
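One way to wire up the Adam-plus-decay training mentioned above (a sketch under assumptions: ''model'' here is a stand-in for the real encoder/decoder pair, the dummy loss stands in for cross-entropy on target tokens, and the ''StepLR'' halving schedule is just one possible choice of decay):

<code python>
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1024, 1024)  # stand-in for the actual seq2seq model
optimizer = optim.Adam(model.parameters(), lr=0.01)
# Decay the learning rate by half every 1000 steps (illustrative schedule).
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.5)

def train_step(batch):
    optimizer.zero_grad()
    output = model(batch)
    loss = output.pow(2).mean()  # dummy loss for illustration only
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the learning-rate schedule
    return loss.item()
</code>

With batch_size = 40, each call would receive a tensor of 40 examples; gradient accumulation over smaller chunks is another way to reach that "effective batch size."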