Zhipu AI's GLM-4-Flash Model Freely Available with Advanced Features

TapTechNews, August 27th news: Zhipu AI announced today that its GLM-4-Flash large model is now free to use, and can be invoked through the Zhipu AI large model open platform.

GLM-4-Flash is suited to simple, vertical, cost-sensitive tasks that require fast responses. Its generation speed reaches 72.14 tokens/s, roughly 115 characters per second.

GLM-4-Flash supports multi-round dialogue, web browsing, FunctionCall (function invocation), and long-text reasoning with a context window of up to 128K tokens. It also supports 26 languages, including Chinese, English, Japanese, Korean, and German.
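As a sketch of how a multi-round dialogue with a function declaration might be assembled, assuming an OpenAI-style chat-completions request shape (the field names, the `get_weather` function, and the request layout here are illustrative assumptions, not taken from the official Zhipu AI documentation):

```python
import json

def build_request(history, user_msg, tools=None):
    """Assemble a multi-round chat request with optional FunctionCall
    declarations. The payload shape follows the common OpenAI-style
    convention and is an assumption, not the official API spec."""
    messages = list(history) + [{"role": "user", "content": user_msg}]
    payload = {"model": "glm-4-flash", "messages": messages}
    if tools:
        payload["tools"] = tools  # function declarations for FunctionCall
    return payload

# Prior turns of the conversation (multi-round dialogue).
history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
]

# Hypothetical function declaration the model may choose to invoke.
weather_tool = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
}]

req = build_request(history, "What's the weather in Beijing?", weather_tool)
print(json.dumps(req, indent=2))
```

The assembled payload would then be sent to the open platform endpoint with the caller's API key; only the request construction is shown here.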

According to the official announcement, latency reductions and speed improvements at the inference level were achieved through techniques such as adaptive weight quantization, multiple parallelization schemes, batch-processing strategies, and speculative sampling. The resulting gains in concurrency and throughput not only improve efficiency but also significantly reduce inference cost, which is why the model is being offered for free.

On the pre-training side, the team introduced a large language model into the data-screening process, yielding 10T tokens of high-quality multilingual data, more than three times the data volume used for the ChatGLM3-6B model. FP8 was also used during pre-training to improve training efficiency and make better use of compute.
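The LLM-in-the-loop screening step might be sketched as follows, with a stub scorer standing in for the language model; the heuristic scorer and the 0.5 threshold are illustrative assumptions, not details from the announcement:

```python
def llm_quality_score(doc):
    """Stub quality scorer. A real pipeline would prompt a language model
    to rate each document; here a crude heuristic (longer, properly
    punctuated text scores higher) stands in for that model."""
    score = min(len(doc) / 100, 1.0)
    if doc.strip().endswith((".", "!", "?")):
        score += 0.2
    return score

def screen_corpus(docs, threshold=0.5):
    """Keep only documents whose quality score clears the threshold."""
    return [d for d in docs if llm_quality_score(d) >= threshold]

corpus = [
    "asdf qwer",  # short junk, filtered out
    "The transformer architecture underlies most modern language models.",
]
kept = screen_corpus(corpus)
print(len(kept))  # 1
```

At 10T-token scale the scoring model's cost dominates, which is why such pipelines typically score a sample or use a small distilled classifier; the sketch shows only the filtering logic.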

Relevant links attached by TapTechNews:

Experience address: https://bigmodel.cn/console/trialcenter?modelCode=glm-4-flash

Specification document: https://open.bigmodel.cn/dev/api#glm-4
