Inspur Information Releases 4-bit and 8-bit Quantized Versions of Its Source 2.0-M32 Large Model

By: Maxwell. Published 2024-08-23T03:45:51Z

TapTechNews, August 23: Inspur Information today released 4-bit and 8-bit quantized versions of its Source 2.0-M32 large model (published as Yuan2-M32 in the download repositories), claiming performance comparable to that of the 70-billion-parameter open-source LLaMA3 model.

The 4-bit quantized version needs only 23.27 GB of video memory for inference and about 1.9 GFLOPs of compute per token, a compute cost stated to be only 1/80 of that of the comparable large model LLaMA3-70B. LLaMA3-70B, by contrast, requires 160 GB of video memory and 140 GFLOPs of compute.
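As a quick check of the figures quoted above, the short Python sketch below computes the memory and compute ratios directly from the article's numbers. The dictionary and variable names are illustrative only, and the calculation assumes the 140 GFLOPs figure for LLaMA3-70B is also a per-token value.

```python
# Figures as quoted in the article: video memory in GB, compute in GFLOPs.
yuan_int4 = {"memory_gb": 23.27, "gflops_per_token": 1.9}
llama3_70b = {"memory_gb": 160.0, "gflops_per_token": 140.0}

# How many times larger the LLaMA3-70B requirement is for each resource.
memory_ratio = llama3_70b["memory_gb"] / yuan_int4["memory_gb"]
compute_ratio = llama3_70b["gflops_per_token"] / yuan_int4["gflops_per_token"]

print(f"Video memory: LLaMA3-70B needs about {memory_ratio:.1f}x as much")
print(f"Compute:      LLaMA3-70B needs about {compute_ratio:.1f}x as much")
```

Dividing the quoted figures gives a compute ratio of roughly 74, so the 1/80 figure reads as an approximate value rather than an exact quotient of the two numbers.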

According to Inspur Information, the quantized Source 2.0-M32 was developed by the Source large model team to further improve the model's computational efficiency and to reduce the computing resources needed to deploy and run large models. The model's original precision is quantized to int4 and int8 while its performance remains essentially unchanged.
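To illustrate what quantizing weights to int8 or int4 means in general, here is a minimal Python sketch of symmetric per-tensor quantization. This is a generic textbook scheme, not the method Inspur Information used (the article does not describe it); the function names and the sample weights are made up for illustration.

```python
def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for int8, 7 for int4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integers and the scale."""
    return [v * scale for v in q]


# Made-up weights, purely for illustration.
weights = [0.12, -0.50, 0.33, 0.02, -0.27]

for bits in (8, 4):
    q, scale = quantize_symmetric(weights, bits)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"int{bits}: values={q} max_error={max_err:.4f}")
```

Fewer bits mean a coarser grid of representable values, so int4 saves more memory than int8 at the cost of larger rounding error per weight; production schemes typically use per-group scales and other refinements to limit that error.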

Source 2.0-M32 is the latest model in Inspur Information's Source 2.0 series. It is a mixture-of-experts (MoE) model with 32 experts, and 3.7 billion parameters are activated when the model runs.
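The reason an MoE model activates only a fraction of its parameters is that a gating step picks a few experts per token and skips the rest. The Python sketch below shows generic top-k gating over 32 experts. It is not Source 2.0-M32's actual router, which the article does not describe; `TOP_K = 2`, the toy experts, and all function names are assumptions for illustration.

```python
import math

NUM_EXPERTS = 32   # from the article
TOP_K = 2          # assumption: the article does not say how many experts fire


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_scores, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest scores."""
    order = sorted(range(len(gate_scores)),
                   key=lambda i: gate_scores[i], reverse=True)
    top = order[:top_k]
    weights = softmax([gate_scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, gate_scores, experts):
    """Evaluate only the selected experts and mix their outputs."""
    return sum(w * experts[i](x) for i, w in route(gate_scores))


# Toy experts: expert k multiplies its input by k + 1.
experts = [lambda x, k=k: x * (k + 1) for k in range(NUM_EXPERTS)]

# Hypothetical gate scores that favour experts 3 and 17 equally.
gate_scores = [0.0] * NUM_EXPERTS
gate_scores[3] = 2.0
gate_scores[17] = 2.0

selected = route(gate_scores)
output = moe_forward(1.0, gate_scores, experts)
print("selected experts:", [i for i, _ in selected])
print("output:", output)
```

Only the selected experts are evaluated for a given input, which is why the activated parameter count can be far smaller than the model's total size.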

Evaluation results show that the quantized Source 2.0-M32 outperforms the 70-billion-parameter LLaMA3 model on tasks such as MATH (competition mathematics) and ARC-C (scientific reasoning).


The quantized Source 2.0-M32 models have been open-sourced; TapTechNews lists the download links below:

Download links on the Hugging Face platform

https://huggingface.co/IEITYuan/Yuan2-M32-gguf-int4

https://huggingface.co/IEITYuan/Yuan2-M32-hf-int4

https://huggingface.co/IEITYuan/Yuan2-M32-hf-int8

Download links on the ModelScope platform

https://modelscope.cn/models/IEITYuan/Yuan2-M32-gguf-int4

https://modelscope.cn/models/IEITYuan/Yuan2-M32-HF-INT4

https://modelscope.cn/models/IEITYuan/Yuan2-M32-hf-int8
