Multi-head Attention Block
Attention Block
Each of Q, K, and V is a linear projection of the token embeddings. The block then computes scaled dot-product attention: Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V, where d_k is the dimension of the key vectors.
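A minimal NumPy sketch of this attention block, assuming learned weight matrices W_q, W_k, and W_v (the names and shapes here are illustrative, not from the original):

```python
import numpy as np

def scaled_dot_product_attention(x, W_q, W_k, W_v):
    """One attention block: project token embeddings x to Q, K, V,
    then compute softmax(Q K^T / sqrt(d_k)) V."""
    Q = x @ W_q                                  # (seq_len, d_k) queries
    K = x @ W_k                                  # (seq_len, d_k) keys
    V = x @ W_v                                  # (seq_len, d_v) values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # (seq_len, seq_len) similarities
    # Row-wise softmax (shifted by the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                           # weighted sum of value vectors
```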
Multi-head Attention
Now the same attention block is applied multiple times in parallel (one instance per head, each with its own projection weights), and the results are concatenated; typically a final linear projection then maps the concatenation back to the model dimension.
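Continuing the sketch above (reusing scaled_dot_product_attention), a hedged illustration of the multi-head step; the per-head weight triples and the output projection W_o are assumptions for demonstration:

```python
def multi_head_attention(x, heads, W_o):
    """Run several attention blocks in parallel and concatenate the results.
    `heads` is a list of (W_q, W_k, W_v) weight triples, one per head;
    W_o is the final output projection (names are illustrative)."""
    outputs = [scaled_dot_product_attention(x, W_q, W_k, W_v)
               for (W_q, W_k, W_v) in heads]
    concat = np.concatenate(outputs, axis=-1)    # (seq_len, n_heads * d_v)
    return concat @ W_o                          # back to the model dimension

# Example: 4 heads over 10 tokens of model dimension 32, with d_k = d_v = 8
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 32))
heads = [tuple(rng.normal(size=(32, 8)) for _ in range(3)) for _ in range(4)]
W_o = rng.normal(size=(4 * 8, 32))
out = multi_head_attention(x, heads, W_o)        # shape (10, 32)
```

Because each head has its own projections, different heads can attend to different relationships between tokens before their outputs are merged.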