Word Embedding
Embeddings are basically vector (numerical) representations of words (tokens). You input a token (a number representing a word or sub-word), and the embedding layer outputs a vector representing that token. Embeddings let you turn words into vectors that can be used in your model.
Word embeddings generally exhibit some structure; for example, the embeddings of similar words are close to each other in the vector space.
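To make "close to each other" concrete, here is a minimal NumPy sketch of how closeness is usually measured, with cosine similarity. The words and the 3-dimensional vectors below are hand-picked for illustration, not taken from a real trained model.

```python
# Minimal sketch: measuring closeness of word vectors with cosine similarity.
# The vectors are hypothetical, hand-picked so that "king" and "queen" point
# in similar directions while "apple" points elsewhere.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

embeddings = {
    "king":  np.array([0.80, 0.65, 0.10]),
    "queen": np.array([0.75, 0.70, 0.12]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # close to 1
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # much smaller
```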
There are different methods to train embeddings. Depending on the loss and the task, embeddings can be learned differently.
One-hot encoding - Each word is a one-hot vector, so to represent all the words in your corpus, the vector length must equal the number of unique words in the corpus. That is far too high an input dimension to process.
Word2Vec - A neural-network-based approach that learns a network to output a dense vector for each input token (contrasted with one-hot encoding in the sketch after this list).
BERT Embeddings - Contextual embeddings produced by a pre-trained Transformer; the same word can get different vectors depending on its surrounding context.
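As a rough illustration of the first two approaches, the sketch below (assuming gensim >= 4.0 is installed; the toy corpus is made up) contrasts a one-hot vector, whose length grows with the vocabulary, against a dense Word2Vec embedding of a small fixed dimension.

```python
# Sketch: one-hot vectors vs. dense Word2Vec embeddings on a toy corpus.
import numpy as np
from gensim.models import Word2Vec

# Toy corpus: a list of tokenised sentences (hypothetical data).
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# One-hot encoding: vector length == number of unique words in the corpus.
vocab = sorted({word for sentence in corpus for word in sentence})
one_hot_cat = np.zeros(len(vocab))
one_hot_cat[vocab.index("cat")] = 1.0
print(len(one_hot_cat))           # grows with the vocabulary size

# Word2Vec: a neural-network-trained dense vector of small, fixed dimension.
model = Word2Vec(corpus, vector_size=16, window=2, min_count=1, epochs=50)
print(model.wv["cat"].shape)      # (16,) regardless of vocabulary size
print(model.wv.most_similar("cat", topn=2))
```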
Let's say that each token is represented as a $d$-dimensional vector.

Then we can think of the embedding layer as a weight matrix or lookup table, denoted by $E$, of dimensions $V \times d$, where $V$ is the vocabulary size. Now whenever we have a token $i$, we can simply look up the $i$-th row of the embedding table/weight matrix $E$, and that row is the $d$-dimensional vector embedding of the token.

This can also be viewed as $e_i = x_i E$, where $x_i \in \{0, 1\}^{1 \times V}$ is the one-hot vector representing token $i$, i.e. it has a $1$ at position $i$ and $0$ everywhere else. This operation basically gives you the $i$-th row from the embedding matrix $E$.
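Here is a minimal NumPy sketch of this lookup, with assumed sizes $V = 5$ and $d = 3$, showing that taking row $i$ of $E$ and multiplying the one-hot vector $x_i$ by $E$ give the same result.

```python
# Sketch: the embedding layer as a lookup table E of shape (V, d), and the
# equivalence between "take row i of E" and "multiply one-hot x_i by E".
import numpy as np

V, d = 5, 3                       # vocabulary size and embedding dimension (assumed values)
rng = np.random.default_rng(0)
E = rng.normal(size=(V, d))       # the embedding weight matrix / lookup table

i = 2                             # some token id
lookup = E[i]                     # direct row lookup: d-dimensional embedding of token i

x_i = np.zeros(V)                 # one-hot vector representing token i
x_i[i] = 1.0
matmul = x_i @ E                  # same vector obtained via matrix multiplication

print(np.allclose(lookup, matmul))  # True: both give row i of E
```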