🧪

Knowledge Challenge

A friend thinks you can answer this question about AI Translation Quality

Your localization team is choosing between two MT vendors. On your test set, Vendor A scores 42 BLEU and Vendor B scores 39 BLEU. On COMET, Vendor B scores 0.82 and Vendor A scores 0.75. Human LQA (linguistic quality assurance) shows that Vendor B's output has 30% fewer major errors per 1,000 words than Vendor A's. Which vendor should you choose, and why?
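
One way to lay the three signals side by side before answering, as a minimal sketch: the figures come from the question, while the `Signal` type and the `favored_vendor` helper are hypothetical names introduced here for illustration. The LQA figure is expressed relative to Vendor A (A = 1.0, B = 0.7), which is an assumption about how to encode "30% fewer". The sketch only tabulates which vendor each signal favors; how much weight each signal deserves is the judgment the question asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One quality signal measured for both vendors (hypothetical helper type)."""
    name: str
    vendor_a: float
    vendor_b: float
    higher_is_better: bool = True


def favored_vendor(signal: Signal) -> str:
    """Return "A", "B", or "tie" depending on which vendor the signal favors."""
    if signal.vendor_a == signal.vendor_b:
        return "tie"
    a_wins = (signal.vendor_a > signal.vendor_b) == signal.higher_is_better
    return "A" if a_wins else "B"


signals = [
    Signal("BLEU", 42.0, 39.0),
    Signal("COMET", 0.75, 0.82),
    # Assumption: major errors per 1,000 words, normalized so Vendor A = 1.0.
    # "30% fewer" for Vendor B then becomes 0.7; lower is better here.
    Signal("LQA major errors (relative to A)", 1.0, 0.7, higher_is_better=False),
]

summary = {s.name: favored_vendor(s) for s in signals}

for name, vendor in summary.items():
    print(f"{name}: favors Vendor {vendor}")
```

The table makes the disagreement explicit: the metrics do not all point the same way, so the answer depends on which signal you trust most for your content.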