Gain exposure to state-of-the-art attention mechanisms.
For this lab, you will need to read, modify, and extend the Transformer model from this paper. You will use this new network to build a Spanish-to-English translation system.
This lab is again different from those seen previously. You will be working with an annotated Jupyter notebook from Harvard. This annotated notebook contains the paper text, along with code snippets that explain certain aspects of the network.
You should turn in your completed Jupyter notebook (built upon the annotated version from Harvard), including several example Spanish-to-English translations.
Please turn in your samples inside the Jupyter notebook, not in a separate file.
Since much of this lab is reading someone else's code, the grading standard is slightly different.
Your code/image will be graded on the following:
For this lab, you will modify the Annotated Transformer.
There is a link to a Colab notebook in the Jupyter notebook that you can use.
The code differs slightly between the notebook linked above and the Colab link provided by Harvard. Both will work, but you will very likely need to mix and match pieces from each to get a working implementation. While this may feel slightly frustrating, it is good practice for deep learning research.
If you are experiencing difficulties, you may need to install specific versions of the necessary packages. One student contributed this:
!pip install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl numpy matplotlib spacy torchtext==0.2.3 seaborn
Often when implementing a novel deep learning method, you will start by using someone else's implementation as a reference. This is an extremely valuable and potentially time-saving skill for producing workable solutions to many problems solved by deep learning methods.
There are 3 main parts to this lab:
Part 1: Reading and Study
A large portion of your time in this lab will be spent reading and understanding the topology, attention, dot products, etc. introduced in this paper. Since this will be time-consuming, and potentially new to some of you, there is no report or grading scheme for this part. Simply do your best to read and understand the material (Parts 2 and 3 will be easier if you have a good understanding).
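As a companion to the reading, the paper's scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, can be sketched in a few lines of plain Python. This is only a minimal illustration and not the notebook's implementation, which works on PyTorch tensors and also handles masking and dropout; the function names and toy vectors below are made up for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy inputs (illustrative only): two queries, two keys, two values.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
outputs, weights = scaled_dot_product_attention(queries, keys, values)
```

Each row of `weights` sums to 1, and each query puts the most weight on the key it is most similar to, so its output leans toward the matching value vector.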
Part 2: Extend their work to build a General Conference machine translation system
Included in the zipped files are two text files and one CSV. The CSV contains the contents of each text file in its own column (es, en). This is simply for convenience if you prefer working with CSV files. Note: we make no guarantee of data quality or cleanliness, so please ensure that the data you are working with is properly cleaned and formatted.
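One possible way to load and lightly clean the sentence pairs is sketched below using only the standard library. It assumes the CSV has a header row with the column names `es` and `en`; the cleaning rules (Unicode normalization, whitespace collapsing, dropping empty or very long pairs) and the `max_tokens` threshold are illustrative choices, not requirements of the lab.

```python
import csv
import io
import unicodedata


def clean_text(text):
    """Normalize Unicode and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def load_pairs(handle, max_tokens=100):
    """Read (es, en) rows, skipping empty or overly long pairs."""
    pairs = []
    for row in csv.DictReader(handle):
        es = clean_text(row.get("es") or "")
        en = clean_text(row.get("en") or "")
        if not es or not en:
            continue  # one side is missing
        if len(es.split()) > max_tokens or len(en.split()) > max_tokens:
            continue  # very long pairs slow training and may be misaligned
        pairs.append((es, en))
    return pairs


# Small in-memory stand-in for the real file (illustrative rows only).
sample = io.StringIO(
    "es,en\n"
    "  Hola   mundo ,Hello world\n"
    ",Missing source\n"
    "Buenos días,Good morning\n"
)
pairs = load_pairs(sample)
```

With the real data, the same function can be given a file opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever you unzipped the CSV.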
We will be translating from Spanish to English, which means Spanish is our source language and English is the target language.
The annotated Jupyter notebook has an example application translating from Dutch to English. This may be useful to copy; however, you will need to do some work to get our data into a usable form that fits the model. This can be tricky, so make sure you understand the various torch functions being called.
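To make "a usable form" concrete, the sketch below shows the usual steps in plain Python: tokenize, build a vocabulary, map tokens to integer ids, and pad each batch to a common length. It is a simplified stand-in for what torchtext does, under several assumptions: whitespace tokenization instead of spaCy, and special-token strings chosen for this example (check the notebook for the ones it actually uses). All function names here are hypothetical.

```python
from collections import Counter

# Special tokens chosen for this example; the notebook defines its own.
PAD, UNK, BOS, EOS = "<blank>", "<unk>", "<s>", "</s>"


def build_vocab(sentences, min_freq=1):
    """Map each sufficiently frequent token to an integer id."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    kept = sorted(tok for tok, count in counts.items() if count >= min_freq)
    return {tok: i for i, tok in enumerate([PAD, UNK, BOS, EOS] + kept)}


def encode(sentence, vocab):
    """Convert a sentence to ids, wrapped in start and end markers."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in sentence.lower().split()]
    return [vocab[BOS]] + ids + [vocab[EOS]]


def pad_batch(sequences, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (width - len(seq)) for seq in sequences]


english = ["good morning", "good morning to you all"]
vocab = build_vocab(english)
batch = pad_batch([encode(s, vocab) for s in english], vocab[PAD])
```

The padded rows in `batch` all have the same length, so they could be turned into a single tensor with `torch.tensor(batch)`; the padding id is what the model's masks need to ignore.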
You may find the torchtext and data modules useful, but feel free to load the data as you see fit.
Train the Transformer network using their implementation and print out 5-10 translated sentences, showing your translation alongside the ground truth. Feel free to comment on why you think it did well or underperformed.
You should expect to see reasonable results after 2-4 hours of training in Colab.
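A small helper like the following is one way to print each source sentence with the model's translation and the ground truth. It assumes you have already decoded the model output into plain strings; the helper name and the sample strings are placeholders for illustration, not real model outputs.

```python
def format_samples(sources, predictions, references, limit=10):
    """Build a readable report of source, model output, and reference."""
    blocks = []
    triples = list(zip(sources, predictions, references))[:limit]
    for index, (src, pred, ref) in enumerate(triples, start=1):
        blocks.append(
            f"[{index}] Source (es):    {src}\n"
            f"    Model (en):     {pred}\n"
            f"    Reference (en): {ref}"
        )
    return "\n\n".join(blocks)


# Placeholder strings for illustration only; these are not real model outputs.
sources = ["Buenos días", "Muchas gracias"]
predictions = ["Good morning", "Many thanks"]
references = ["Good morning", "Thank you very much"]
report = format_samples(sources, predictions, references)
print(report)
```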
Here is a good tutorial on torchtext.
Part 3: EXTRA CREDIT: Experiment with a different number of stacked layers in the encoder and decoder
Now it's time to put on your scientist hats. Try stacking a different number of layers (e.g., 1, 2, 4) for the encoder and decoder. This will require you to understand their implementation and be able to work with it reliably.
Give a qualitative evaluation of the training and results (maybe a plot, or a paragraph, something to demonstrate your thought process and results). This should show us that you are thinking about why you got the results you did with a different number of stacked layers.
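One way to organize the experiment is a small sweep that trains once per layer count and records a single number for each run. In the sketch below, `train_and_evaluate` is a hypothetical function you would write yourself: it should build the model with the given number of layers (in the notebook this appears to be controlled by the `N` argument of `make_model`; check your copy), train it, and return a validation loss. The stand-in function returns made-up numbers purely so the sketch runs.

```python
def run_layer_sweep(layer_counts, train_and_evaluate):
    """Train once per layer count and collect the returned metric."""
    results = {}
    for n_layers in layer_counts:
        results[n_layers] = train_and_evaluate(n_layers)
    return results


def summarize(results):
    """Format the sweep results as a small plain-text table."""
    lines = ["layers  final_val_loss"]
    for n_layers, loss in sorted(results.items()):
        lines.append(f"{n_layers:>6}  {loss:.3f}")
    return "\n".join(lines)


def fake_train_and_evaluate(n_layers):
    # Placeholder ONLY: returns a made-up number, not a real result.
    # Replace with code that builds, trains, and evaluates your model.
    return 1.0 / n_layers


results = run_layer_sweep([1, 2, 4], fake_train_and_evaluate)
print(summarize(results))
```

Keeping everything else fixed across runs (data, batch size, training time) makes the comparison between layer counts easier to interpret.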
Since this lab requires a bit more training time, consider getting started earlier in the week to give yourself plenty of time.
Good luck!