BLEU-4 is a standard automatic metric for evaluating the quality of generated text, such as machine translations or image captions. It measures the precision of n-gram overlaps (from single words up to four-word sequences) between a candidate text and one or more human-written reference texts, and applies a brevity penalty to discourage overly short outputs. The result is a single score that reflects both fluency and adequacy.
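The computation described above can be sketched in pure Python. This is a minimal illustrative implementation (function and variable names are my own, not from any particular library): for each n from 1 to 4 it computes the clipped n-gram precision, takes the geometric mean, and multiplies by the brevity penalty.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(candidate, references):
    """Sentence-level BLEU-4: geometric mean of clipped 1- to 4-gram
    precisions, multiplied by a brevity penalty (a simplified sketch)."""
    log_precisions = []
    for n in range(1, 5):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its maximum count
        # in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            # Any zero precision drives the geometric mean to zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the
    # closest-length reference.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / 4)
```

For example, a candidate identical to its reference scores 1.0, while a candidate sharing only some n-grams with the reference scores strictly between 0 and 1. Production use typically relies on established implementations (e.g. sacreBLEU or NLTK), which add smoothing and standardized tokenization that this sketch omits.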
Also known as: BLEU, BLEU score, BLEU-N (general), n-gram overlap metric