Model Architecture

The Transformer follows an encoder-decoder structure; the encoder and decoder are shown in the left and right halves of Figure 1, respectively. We describe multi-head attention in Section 3.2. The results are shown in Table 1.
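To make the encoder-decoder data flow concrete, the following is a minimal sketch rather than the model itself. It assumes the usual auto-regressive setup, which this section does not spell out: the encoder runs once over the source sequence, and the decoder then emits one output token per step, conditioned on the encoder output and on the tokens generated so far. All names (`encode`, `decode_step`, `greedy_generate`, `toy_embed`) are hypothetical, and the placeholder arithmetic merely stands in for the actual layers, which are described elsewhere in the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def toy_embed(token: int) -> Vector:
    """Hypothetical embedding: a fixed 2-dimensional vector per token id."""
    return [float(token), 1.0]


def encode(source: Sequence[int], embed: Callable[[int], Vector]) -> List[Vector]:
    """Toy encoder: produce one representation per source token.

    Stand-in for the encoder half of the architecture (an assumption,
    not the real layer stack).
    """
    return [embed(tok) for tok in source]


def decode_step(memory: List[Vector], generated: Sequence[int], vocab_size: int) -> int:
    """Toy decoder step: choose the next token from the encoder output
    (`memory`) and the tokens generated so far.

    The scoring rule is placeholder arithmetic chosen only to be deterministic.
    """
    score = sum(sum(vec) for vec in memory) + len(generated)
    return int(score) % vocab_size


def greedy_generate(
    source: Sequence[int],
    embed: Callable[[int], Vector],
    vocab_size: int,
    eos: int,
    max_len: int,
) -> List[int]:
    """Run the encoder once, then call the decoder step by step until
    the end-of-sequence token `eos` appears or `max_len` is reached."""
    memory = encode(source, embed)  # encoder: a single pass over the source
    output: List[int] = []
    for _ in range(max_len):  # decoder: one step per output token
        next_token = decode_step(memory, output, vocab_size)
        output.append(next_token)
        if next_token == eos:
            break
    return output


if __name__ == "__main__":
    print(greedy_generate([3, 1, 4], toy_embed, vocab_size=5, eos=0, max_len=10))
```

The point of the sketch is the control flow: `encode` is called once, while `decode_step` is called repeatedly and sees both the encoder output and its own previous outputs.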