Word Embedding

What are Embeddings?

Embeddings are vector (numerical) representations of words (tokens). You input a token (the numerical ID of a word or sub-word) and get back a vector that represents that token. Embeddings let you represent words as vectors that your model can work with.

Word embeddings generally follow some structure; for example, the embeddings of similar words are close to each other in the vector space.
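
For intuition, here is a minimal sketch of that closeness using made-up 3-dimensional vectors (not learned embeddings) and cosine similarity:

```python
import numpy as np

# Toy illustration: hand-picked 3-dimensional vectors, not learned from data.
emb = {
    "cat": np.array([0.9, 0.1, 0.2]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.1, 0.9, 0.7]),
}

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: close to 1 means "similar direction".
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(emb["cat"], emb["dog"]))  # high: related words
print(cosine_similarity(emb["cat"], emb["car"]))  # lower: unrelated words
```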

There are different methods to train embeddings; depending on the loss and the task, different embeddings will be learnt.

Examples of embeddings:

  • One-hot encoding - Each word is a one-hot vector, so to represent all the words in your corpus the vector length must equal the number of unique words in the corpus, which is far too high an input dimension to process (see the sketch after this list).

  • Word2Vec - A neural-network-based approach that learns to output a vector for a given input word.

  • BERT Embeddings
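
To see why one-hot encoding blows up the input dimension, here is a minimal sketch with an assumed toy vocabulary of five words:

```python
import numpy as np

corpus_vocab = ["the", "cat", "sat", "on", "mat"]  # assumed toy vocabulary
word_to_idx = {w: i for i, w in enumerate(corpus_vocab)}

def one_hot(word):
    # Vector length equals the vocabulary size; a single 1 marks the word's index.
    vec = np.zeros(len(corpus_vocab))
    vec[word_to_idx[word]] = 1.0
    return vec

print(one_hot("cat"))  # [0. 1. 0. 0. 0.]
# With a realistic vocabulary (tens of thousands of words) these vectors become
# huge and sparse, which is why dense embeddings are preferred.
```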

Embedding Layer

Let's say that each token is represented as a $d$-dimensional vector.

Then we can imagine the embedding layer as a weight matrix or lookup table, denoted by $W$, of dimensions $T \times d$, where $T$ is the vocabulary size. Whenever we have a token $t_i$, we can simply look up row $t_i$ of the embedding table/weight matrix; that row is the $d$-dimensional vector embedding of the token.
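
A minimal numpy sketch of the lookup, with a vocabulary size and embedding dimension assumed just for illustration (in practice $W$ is learned during training, e.g. via a framework's embedding layer such as PyTorch's `nn.Embedding`):

```python
import numpy as np

T, d = 10_000, 300          # assumed vocabulary size and embedding dimension
W = np.random.randn(T, d)   # embedding table; normally learned during training

t_i = 42                    # a token id produced by the tokenizer
embedding = W[t_i]          # lookup: row t_i is the token's d-dimensional embedding
print(embedding.shape)      # (300,)
```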

This can also be viewed as $x^T W$, where $x$ is the one-hot vector representing $t_i$, i.e. $x[idx] = 1$ if $idx = t_i$, else $0$. This operation simply gives you row $t_i$ of the embedding table $W$.
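
As a sanity check, here is a small sketch of that equivalence (using the same placeholder $W$ and $t_i$ as above):

```python
import numpy as np

T, d = 10_000, 300
W = np.random.randn(T, d)

t_i = 42
x = np.zeros(T)
x[t_i] = 1.0                      # one-hot vector for token t_i

# Multiplying the one-hot row vector by W selects row t_i of W.
assert np.allclose(x @ W, W[t_i])
```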

Tokenization to Embeddings pipeline
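
Putting the pieces together, the pipeline is roughly: raw text → tokenizer → token ids → embedding lookup. A minimal sketch with a toy whitespace tokenizer and a placeholder embedding table (both are assumptions for illustration, not a real tokenizer or trained weights):

```python
import numpy as np

# Toy vocabulary and embedding table (placeholders for a real tokenizer and trained weights).
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
d = 8
W = np.random.randn(len(vocab), d)

def tokenize(text):
    # Simplistic whitespace tokenizer: text -> token ids.
    return [vocab[w] for w in text.lower().split()]

token_ids = tokenize("the cat sat on the mat")   # [0, 1, 2, 3, 0, 4]
embeddings = W[token_ids]                        # one d-dimensional vector per token
print(embeddings.shape)                          # (6, 8)
```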