Moore Threads Open-sources Audio Understanding Model MooER

TapTechNews August 23rd news: Moore Threads has open-sourced its audio understanding large model, MooER (Moore Ear), the industry's first large-scale open-source speech model trained and run for inference on a domestic full-featured GPU.

Built on the Moore Threads KUAE (Kua E) intelligent computing platform, the MooER large model completed training on 5,000 hours of audio data and pseudo-labels in just 38 hours.

MooER not only supports speech recognition in Chinese and English, but also performs Chinese-to-English speech translation. On the CoVoST2 Chinese-to-English translation test set, MooER-5K achieved a BLEU score of 25.2, approaching industrial-grade results.
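BLEU, the metric cited above, scores a translation by the overlap of its n-grams with a reference translation, scaled by a brevity penalty. The following is a minimal illustrative sketch of sentence-level BLEU (unsmoothed, single reference); production evaluations such as the CoVoST2 result typically use corpus-level tooling like sacrebleu.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    1..max_n-gram precisions times a brevity penalty (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if clipped == 0:
            return 0.0  # no overlap at this order: unsmoothed BLEU is 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return 100 * bp * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 1))  # → 100.0
```

A perfect match scores 100; partial n-gram overlap yields intermediate scores like the 25.2 reported for MooER-5K.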

In this release, the Moore Threads AI team has open-sourced the inference code and the model trained on 5,000 hours of data, and plans to further open-source the training code and a model trained on 80,000 hours of data.


The MooER model consists of three parts: an Encoder, an Adapter, and a Decoder (a large language model, LLM). The specific parameter scales are as follows:

[Image: MooER model parameter table]
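The three-stage Encoder→Adapter→Decoder pipeline can be sketched as a shape-level data flow. This is a hypothetical NumPy illustration, not MooER's implementation: all dimensions, the 4x frame downsampling, and the linear stand-ins for each stage are assumptions for clarity; the real parameter scales are in the table above.

```python
import numpy as np

# Hypothetical dimensions for illustration only; real MooER sizes
# are listed in the paper's parameter table.
AUDIO_DIM, ENC_DIM, LLM_DIM, VOCAB = 80, 512, 4096, 32000

rng = np.random.default_rng(0)

def encoder(features):
    """Speech encoder stand-in: stack every 4 frames (4x downsampling)
    and project to the encoder dimension."""
    t = (features.shape[0] // 4) * 4
    pooled = features[:t].reshape(-1, 4 * AUDIO_DIM)
    w = rng.standard_normal((4 * AUDIO_DIM, ENC_DIM)) * 0.01
    return pooled @ w

def adapter(enc_out):
    """Adapter stand-in: linear projection from encoder space into the
    LLM's embedding space, so speech tokens can be fed to the decoder."""
    w = rng.standard_normal((ENC_DIM, LLM_DIM)) * 0.01
    return enc_out @ w

def decoder_logits(llm_inputs):
    """LLM decoder stand-in: maps embeddings to vocabulary logits."""
    w = rng.standard_normal((LLM_DIM, VOCAB)) * 0.01
    return llm_inputs @ w

frames = rng.standard_normal((100, AUDIO_DIM))   # 100 frames of 80-dim features
logits = decoder_logits(adapter(encoder(frames)))
print(logits.shape)  # → (25, 32000): 100 frames downsampled 4x
```

The key design point this illustrates is that the adapter bridges the dimensional gap between the speech encoder and the frozen LLM, letting the LLM consume audio features as if they were text embeddings.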

TapTechNews attached relevant links:

Github address: https://github.com/MooreThreads/MooER

Technical documentation: https://arxiv.org/pdf/2408.05101

Technical demonstration: https://mooer-speech.mthreads.com:10077/
