Moore Threads and WISECORE Complete 3B-Scale Model Training

TapTechNews May 27th news. Today, Moore Threads and WISECORE jointly announced that they have officially completed training of the 3B-scale large model "MT-infini-3B" on a thousand-card cluster of domestic full-featured GPUs. The model was trained on a cluster built from Moore Threads' domestic full-featured MTT S4000 GPUs together with WISECORE's AIStudio PaaS platform.

It is understood that training the MT-infini-3B model took 13.2 days in total and ran stably throughout without interruption: cluster training stability reached 100%, and the scaling efficiency of thousand-card training relative to a single machine exceeded 90%. The companies claim this "fully verified the reliability of the KUAE thousand-card intelligent computing cluster in the large model training scenario, and at the same time pioneered a new industry paradigm of deep cooperation between a domestic large language model and a domestic GPU thousand-card intelligent computing cluster".
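The announcement does not define how the over-90% figure was measured, but scaling efficiency is conventionally computed as actual cluster throughput divided by ideal linear scaling of single-machine throughput. Below is a minimal sketch under that assumption; the throughput numbers and node count are illustrative placeholders, not figures from the announcement.

```python
# Illustrative only: scaling efficiency of multi-node training vs. a single machine.
# The throughput values and node count below are hypothetical placeholders, not
# numbers from the Moore Threads / WISECORE announcement.

def scaling_efficiency(single_node_tps: float, cluster_tps: float, num_nodes: int) -> float:
    """Ratio of actual cluster throughput to ideal linear scaling.

    Ideal linear scaling would deliver num_nodes * single_node_tps;
    efficiency is the fraction of that ideal the cluster actually achieves.
    """
    ideal_tps = num_nodes * single_node_tps
    return cluster_tps / ideal_tps

# Hypothetical example: 125 machines of 8 GPUs each (~1,000 cards total)
single = 1_000.0      # tokens/sec on one machine (placeholder)
cluster = 115_000.0   # tokens/sec on the whole cluster (placeholder)
eff = scaling_efficiency(single, cluster, num_nodes=125)
print(f"Scaling efficiency: {eff:.1%}")  # -> 92.0%, i.e. above the reported 90%
```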

TapTechNews learned that the trained MT-infini-3B performs at the forefront of models of the same scale. Compared with models trained on mainstream international hardware, it leads on the C-Eval, MMLU, and CMMLU test sets.

Xia Lixue, co-founder and CEO of WISECORE, said that WISECORE is building an "M×N" middle-layer product between "M" models and "N" chips to achieve efficient, unified deployment of multiple large model algorithms on diverse chips. WISECORE has reached a deep strategic cooperation with Moore Threads, and the "MT-infini-3B" training result is the industry's first case of end-to-end large model training from 0 to 1 on domestic GPU chips.
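The announcement does not describe how the "M×N" layer is implemented. As a purely hypothetical illustration of the idea it names (every class and function below is invented for this sketch), a middle layer of this kind typically exposes one uniform backend interface so that M models and N chips need one integration path each, instead of M × N bespoke ports.

```python
# Purely hypothetical sketch of an "MxN" middle layer: M model algorithms
# deployed uniformly across N chip backends. All names here are invented;
# the announcement does not describe WISECORE's actual design.
from abc import ABC, abstractmethod

class ChipBackend(ABC):
    """Uniform interface each chip vendor's runtime would implement."""

    @abstractmethod
    def compile(self, model_graph: str) -> str: ...

    @abstractmethod
    def run(self, compiled: str, prompt: str) -> str: ...

class MTTS4000Backend(ChipBackend):
    # Placeholder standing in for a real Moore Threads runtime integration.
    def compile(self, model_graph: str) -> str:
        return f"[mtt-s4000 binary of {model_graph}]"

    def run(self, compiled: str, prompt: str) -> str:
        return f"output of {compiled} on prompt {prompt!r}"

class GenericGPUBackend(ChipBackend):
    # Placeholder for any other chip plugged into the same interface.
    def compile(self, model_graph: str) -> str:
        return f"[generic binary of {model_graph}]"

    def run(self, compiled: str, prompt: str) -> str:
        return f"output of {compiled} on prompt {prompt!r}"

def deploy(model_graph: str, backend: ChipBackend) -> str:
    """One deployment path for any (model, chip) pair, so M models times
    N chips collapse into a single interface rather than M*N ports."""
    return backend.compile(model_graph)

if __name__ == "__main__":
    for backend in (MTTS4000Backend(), GenericGPUBackend()):
        compiled = deploy("MT-infini-3B", backend)
        print(backend.run(compiled, "Hello"))
```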
