update readme
README.md CHANGED

@@ -19,4 +19,17 @@ tags:
 - encoder-decoder
 ---
 
-
+This app aims to help users better understand the behavior of the attention layers in transformer models by visualizing the cross-attention and self-attention weights of an encoder-decoder model, showing the alignment between and within the source and target tokens.
+
+The app uses the `Helsinki-NLP/opus-mt-en-zh` model to translate English to Chinese; with `output_attentions=True`, the attention weights are returned as follows:
+
+Attention Type | Shape | Role
+--- | --- | ---
+`encoder_attentions` | (layers, B, heads, src_len, src_len) | Encoder self-attention on source tokens
+`decoder_attentions` | (layers, B, heads, tgt_len, tgt_len) | Decoder self-attention on generated tokens
+`cross_attentions` | (layers, B, heads, tgt_len, src_len) | Decoder attention over source tokens (encoder outputs)
+
+The weights from the last encoder and decoder layers are averaged over the 8 attention heads, and these head-averaged weights are used to build the attention visualizations.
+
+**Note:**
+* `attn_weights = softmax(Q @ K.T / sqrt(d_k))`
+* `(layers, B, heads, src_len, src_len)` - e.g. `(6, 1, 8, 24, 18)`
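For reference, the snippet below is a minimal sketch, not taken from the app's code, of how the three attention tuples described in the README can be obtained from `Helsinki-NLP/opus-mt-en-zh` with `output_attentions=True` (the example sentence is arbitrary):

```python
# Minimal sketch (illustration only): obtain encoder, decoder, and cross attentions.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Helsinki-NLP/opus-mt-en-zh"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Attention is all you need.", return_tensors="pt")  # arbitrary example
generated_ids = model.generate(**inputs)  # English -> Chinese translation ids

# Re-run a forward pass with the generated ids as decoder input so that all
# three attention types are returned in a single output object.
with torch.no_grad():
    outputs = model(**inputs, decoder_input_ids=generated_ids, output_attentions=True)

# Each field is a tuple with one tensor per layer, shaped (B, heads, query_len, key_len).
encoder_attn = outputs.encoder_attentions  # (B, heads, src_len, src_len) per layer
decoder_attn = outputs.decoder_attentions  # (B, heads, tgt_len, tgt_len) per layer
cross_attn = outputs.cross_attentions      # (B, heads, tgt_len, src_len) per layer
```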
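The head-averaging step described in the README might look like the following sketch (an assumption about the implementation, not the app's code):

```python
import torch

def head_averaged_last_layer(attentions) -> torch.Tensor:
    """Average the last layer of a per-layer attention tuple over its heads.

    Each element of `attentions` has shape (B, heads, query_len, key_len);
    the result is a (query_len, key_len) map for the first batch item.
    """
    last_layer = attentions[-1]        # (B, heads, query_len, key_len)
    return last_layer.mean(dim=1)[0]   # mean over heads, drop the batch dim

# Example (using the tuples from the sketch above):
#   head_averaged_last_layer(cross_attn)    -> (tgt_len, src_len)
#   head_averaged_last_layer(encoder_attn)  -> (src_len, src_len)
```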
app.py CHANGED

@@ -323,6 +323,8 @@ function showCrossAttFun(attn_scores, decoder_attn, encoder_attn) {
 with gr.Blocks(css=css) as demo:
     gr.Markdown("""
     ## 🕸️ Visualize Attentions in Translated Text (English to Chinese)
+    This app aims to help users better understand the behavior of the attention layers in transformer models by visualizing the cross-attention and self-attention weights of an encoder-decoder model, showing the alignment between and within the source and target tokens.
+
     After translating your English input to Chinese, you can check the cross attentions and self-attentions of the translation in the lower section of the page.
     """)
 