NYA_IWSLT25_Offline_en-ar_unconstrained_primary

Our system is a cascaded speech translation model made up of two separate but connected components. The first stage is automatic speech recognition (ASR), for which we use the Whisper model. The second stage is machine translation (MT), performed by an in-house model based on the Transformer architecture with 18 encoder layers and 6 decoder layers.
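As a minimal sketch of how a cascade composes its two stages, the code below passes the ASR transcript straight into the MT model. The `MTConfig` record, `cascade_translate`, and the toy `fake_asr`/`fake_mt` functions are hypothetical illustrations. Only the layer counts (18 encoder, 6 decoder) come from the description above; the real Whisper and in-house MT models are replaced by stand-ins so the sketch runs without any weights.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MTConfig:
    """Hypothetical config record; only the layer counts come from the system description."""
    encoder_layers: int = 18
    decoder_layers: int = 6


def cascade_translate(
    audio_path: str,
    asr: Callable[[str], str],
    mt: Callable[[str], str],
) -> str:
    """Run ASR first, then feed its transcript to the MT model (a cascade)."""
    transcript = asr(audio_path)  # stage 1: English speech -> English text (Whisper in the real system)
    return mt(transcript)         # stage 2: English text -> target-language text (in-house MT)


# Toy stand-ins (hypothetical) so the sketch runs without any model weights.
def fake_asr(path: str) -> str:
    return "hello world"


def fake_mt(text: str) -> str:
    return f"<ar> {text}"


if __name__ == "__main__":
    cfg = MTConfig()
    print(cfg.encoder_layers, cfg.decoder_layers)
    print(cascade_translate("talk.wav", fake_asr, fake_mt))
```

Keeping each stage behind a plain callable means either component can be swapped independently, which is the main practical appeal of a cascade. With the open-source `openai-whisper` package, for example, `asr` could wrap `model.transcribe(path)["text"]` for a model obtained from `whisper.load_model(name)`; which Whisper checkpoint the system actually uses is not stated.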
For training, we use the data permitted in the constrained condition together with data obtained through web crawling and our own internal data.
A key feature of our approach is hypothesis selection. Rather than producing a single translation, our system generates multiple candidate translations, which lets us explore a range of possible outputs. We then apply COMET-MBR, that is, minimum Bayes risk (MBR) decoding with COMET as the utility metric, to select the best hypothesis.
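A minimal sketch of MBR selection over a candidate list follows. Each candidate h is scored by its average utility against every other candidate used as a pseudo-reference, score(h) = mean over r ≠ h of u(src, h, r), and the highest-scoring candidate is returned. In COMET-MBR the utility u is a COMET model; here `token_f1` (a unigram-overlap F1) is a stand-in chosen only so the example runs, and `mbr_select` and `token_f1` are hypothetical names, not the authors' code. Uniform weighting of pseudo-references is also an assumption.

```python
from typing import Callable, Sequence

# utility(source, hypothesis, pseudo_reference) -> score.
# In COMET-MBR this role is played by COMET; token_f1 below is only a runnable stand-in.
Utility = Callable[[str, str, str], float]


def mbr_select(source: str, candidates: Sequence[str], utility: Utility) -> str:
    """Return the candidate with the highest average utility against all other candidates."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]
    best, best_score = candidates[0], float("-inf")
    for i, hyp in enumerate(candidates):
        # Every other candidate acts as a pseudo-reference, weighted uniformly.
        scores = [utility(source, hyp, ref) for j, ref in enumerate(candidates) if j != i]
        expected = sum(scores) / len(scores)
        if expected > best_score:  # strict '>' keeps the earliest candidate on ties
            best, best_score = hyp, expected
    return best


def token_f1(source: str, hyp: str, ref: str) -> float:
    """Stand-in utility: unigram F1 between hypothesis and pseudo-reference (ignores source)."""
    h, r = hyp.split(), ref.split()
    if not h or not r:
        return 0.0
    common = sum(min(h.count(w), r.count(w)) for w in set(h))
    if common == 0:
        return 0.0
    precision, recall = common / len(h), common / len(r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    cands = ["the cat sat on the mat", "the cat sat on a mat", "a dog ran away"]
    print(mbr_select("source sentence", cands, token_f1))  # the candidate closest to the consensus
```

The candidate that agrees most with the rest of the list wins, which is the intuition behind MBR: consensus among samples approximates the expected quality under the model. Note that this needs n(n-1) utility calls for n candidates, so with a neural metric such as COMET the pairs are normally scored in batches; with the `unbabel-comet` package, for instance, a loaded model's `predict` method accepts a list of `{"src", "mt", "ref"}` dictionaries.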