In my current project, I am working on training encoder-decoder models (BART, T5, etc.), and the Transformers library has been absolutely invaluable! After seeing several BERTology analyses (i.e., studies of what a model's attention heads learn to attend to), I would like to know whether a similar analysis is possible with the BART and T5 models in the Hugging Face library. Any recommendations are certainly appreciated!
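For context, here is a minimal sketch of what I have in mind, assuming `facebook/bart-base` as a stand-in checkpoint: passing `output_attentions=True` to the model's forward call should return the encoder self-attention, decoder self-attention, and cross-attention weights for every layer.

```python
# Sketch only: pull per-layer attention matrices out of a BART-style
# encoder-decoder model by requesting output_attentions=True.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-base"  # example checkpoint; a T5 checkpoint should work the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

enc = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
dec = tokenizer("A fox jumps over a dog.", return_tensors="pt")

with torch.no_grad():
    outputs = model(
        input_ids=enc["input_ids"],
        attention_mask=enc["attention_mask"],
        decoder_input_ids=dec["input_ids"],
        output_attentions=True,
    )

# Each attribute is a tuple with one tensor per layer, shaped
# (batch_size, num_heads, query_len, key_len).
print(len(outputs.encoder_attentions), outputs.encoder_attentions[0].shape)
print(len(outputs.decoder_attentions), outputs.decoder_attentions[0].shape)
print(len(outputs.cross_attentions), outputs.cross_attentions[0].shape)
```

My assumption is that these per-layer, per-head matrices could then feed the same kind of head-level visualizations used in the BERTology papers, but I would love pointers to tooling or analyses that already do this for encoder-decoder models.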