Metrics

BLEU

Used to evaluate generated translations against reference translations.

Compares the n-grams of the generated translation to the n-grams of the ground-truth reference translation. An n-gram is a contiguous chunk of n words.

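The score is the geometric mean of the clipped n-gram precisions over different values of n (typically n = 1 to 4), multiplied by a brevity penalty that punishes translations shorter than the reference.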
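As a rough illustration of BLEU, the sketch below computes clipped n-gram precision for n = 1 to 4, combines the values with a geometric mean, and applies a brevity penalty. It assumes a single reference, whitespace-tokenized input, and no smoothing; the function names (`ngrams`, `modified_precision`, `bleu`) are made up for this example and are not taken from any library.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: single reference, no smoothing (assumptions)."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, one zero precision zeroes the geometric mean
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # The brevity penalty only applies when the candidate is not longer than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
print(bleu(candidate, reference))
```

Real evaluations usually report corpus-level BLEU with standardized tokenization, so scores from a sketch like this are not comparable to published numbers.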

ROUGE

Used to evaluate text summarization.

Compares the n-grams of the generated summary to the n-grams of the ground-truth reference summary.

Calculate the recall and precision of the n-grams, then combine them into an F1 score.

This can be done for different values of n (for example ROUGE-1 for unigrams and ROUGE-2 for bigrams).
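A minimal sketch of the ROUGE-N computation described above, assuming whitespace-tokenized input and a single reference summary. The helper names (`ngram_counts`, `rouge_n`) are hypothetical and chosen for this example only.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the contiguous n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of n-gram overlap with one reference."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


reference = "the cat sat on the mat".split()
summary = "the cat sat".split()
for n in (1, 2):
    print(n, rouge_n(summary, reference, n))
```

In this example the short summary has perfect precision but low recall, which is why the F1 score is reported alongside the two.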

Perplexity

Metric to evaluate language models.

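Perplexity is the exponential of the average per-token cross-entropy loss on the evaluated text. Lower perplexity means the model assigns higher probability to that text.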
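The relationship between cross-entropy and perplexity can be sketched as follows. The input is assumed to be the list of probabilities a model assigned to each actual token of the evaluated text, with natural logarithms used throughout; `perplexity` is a hypothetical helper written for this note.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities assigned to each true token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token is the cross-entropy loss.
    cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(cross_entropy)


# A model that is always choosing uniformly among 4 options has perplexity of about 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

One way to read the result: a perplexity of k means the model is, on average, as uncertain as if it were picking uniformly among k tokens.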
