Offline
Over the years, we have been tracking the progress of cascaded and end-to-end approaches in a variety of settings, covering diverse languages, domains, speaking styles, and recording conditions. We will continue this tradition and challenge the community to test their SLT solutions, including those using LLMs, within our evaluation framework.
The SLT systems will be evaluated on their capability to produce translations similar to the target-language references. Such similarity will be measured with multiple automatic metrics: COMET, BLEURT, BLEU, TER, and characTER. The submitted runs will be ranked based on the COMET score calculated on the test set, using automatic resegmentation of the hypotheses against the reference translations with mwerSegmenter. The detailed evaluation script can be found in the SLT.KIT. Moreover, a human evaluation will be performed on each participant’s best-performing submission.
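As an illustration of how such scoring can be carried out, the sketch below computes corpus-level BLEU, TER, and COMET for a hypothesis file that has already been resegmented with mwerSegmenter. It is a minimal example using the sacrebleu and unbabel-comet Python packages, not the official SLT.KIT evaluation script; the file paths and the COMET checkpoint name are illustrative placeholders, and BLEURT and characTER would require their own toolkits.

```python
# Illustrative scoring sketch (not the official SLT.KIT script).
# Assumes hypotheses were already resegmented with mwerSegmenter so that
# the hypothesis, reference, and source files are line-aligned.
from sacrebleu.metrics import BLEU, TER
from comet import download_model, load_from_checkpoint


def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def score_run(src_path, hyp_path, ref_path):
    src = read_lines(src_path)
    hyp = read_lines(hyp_path)
    ref = read_lines(ref_path)

    # Corpus-level BLEU and TER via sacrebleu.
    bleu = BLEU().corpus_score(hyp, [ref]).score
    ter = TER().corpus_score(hyp, [ref]).score

    # Reference-based COMET; the checkpoint name is a placeholder choice.
    comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
    comet_data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(src, hyp, ref)]
    comet = comet_model.predict(comet_data, batch_size=8, gpus=0).system_score

    return {"BLEU": bleu, "TER": ter, "COMET": comet}


if __name__ == "__main__":
    # Hypothetical file names; substitute the actual test set and system output.
    scores = score_run("test.src", "hyp.resegmented", "test.ref")
    for metric, value in scores.items():
        print(f"{metric}: {value:.3f}")
```

In this sketch, ranking multiple submissions would simply compare their COMET values, mirroring the ranking criterion described above.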