Yuanxiang XVERSE Releases Largest MoE Open-Source Model in China

TapTechNews, September 13 news: Yuanxiang XVERSE has released XVERSE-MoE-A36B, the largest open-source MoE model in China.

The model has 255B total parameters, with 36B activated per token. The company claims a cross-level performance leap, with results roughly exceeding those of 100B+ dense models, while cutting training time by 30% and doubling inference performance, greatly reducing the cost per token.

MoE (Mixture of Experts) is an architecture that combines multiple specialized expert models into one larger model. It expands model scale while preserving performance, and can even reduce the computational cost of training and inference. Large models such as Google's Gemini-1.5, OpenAI's GPT-4, and Grok from Musk's xAI company all use MoE.
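To illustrate why only a fraction of parameters is activated per token, here is a minimal sketch of a sparse MoE layer with top-2 routing in PyTorch. The layer sizes, expert count, and routing scheme are illustrative assumptions, not the actual XVERSE-MoE-A36B implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy sparse Mixture-of-Experts layer: only top_k experts run per token."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # A small router network scores every expert for each token.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts are evaluated, so the "activated"
        # parameter count per token is much smaller than the total.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

In this sketch, each token passes through only 2 of the 8 experts, which is the same principle behind XVERSE-MoE-A36B activating 36B of its 255B parameters per token.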

In multiple benchmark evaluations, Yuanxiang's MoE outperforms many comparable models, including the domestic 100-billion-parameter-class MoE model Skywork-MoE, the established MoE leader Mixtral-8x22B, and the 314-billion-parameter open-source model Grok-1-A86B.


TapTechNews attaches the relevant links:

HuggingFace: https://huggingface.co/xverse/XVERSE-MoE-A36B

ModelScope: https://modelscope.cn/models/xverse/XVERSE-MoE-A36B

Github: https://github.com/xverse-ai/XVERSE-MoE-A36B
