Russian Tech Giant Yandex Launches YaFSDP, Outperforming Existing Tools

TapTechNews, June 11 - Russian tech giant Yandex has released YaFSDP, an open-source tool for training large language models, claiming speed gains of up to 26% over existing tools.


According to Yandex, YaFSDP beats the traditional FSDP (Fully Sharded Data Parallel) method in training speed and is especially well suited to large models. In LLM pre-training, YaFSDP delivers speedups of around 20% and holds up better under high memory pressure.
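For context, the "traditional FSDP" baseline here is PyTorch's Fully Sharded Data Parallel, which shards parameters, gradients, and optimizer state across GPUs. Below is a minimal sketch of that baseline setup; build_llm() and train_loader are hypothetical placeholders, and YaFSDP's own API may differ, so consult its repository for actual usage.

```python
# Minimal sketch of the PyTorch FSDP baseline that YaFSDP is benchmarked against.
# build_llm() and train_loader are hypothetical placeholders; launch with torchrun.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")          # one process per GPU
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = build_llm()                              # hypothetical: returns a transformer LM
model = FSDP(model, device_id=torch.cuda.current_device())  # shard params, grads, optimizer state

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
for batch in train_loader:                       # hypothetical dataloader of token batches
    loss = model(**batch).loss                   # assumes a HF-style model returning .loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

dist.destroy_process_group()
```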

For example, YaFSDP achieves roughly a 21% efficiency gain on the 70-billion-parameter Llama2 and about a 26% gain on Llama3 at the same parameter scale. The official figures are summarized below:

Model       GPU count  Seq. length  Ckpt. layers  Speedup
Llama2 7B   64         2048         0             9.92%
Llama2 7B   64         4096         0             3.43%
Llama2 7B   64         8192         0             2.68%
Llama2 7B   128        2048         0             9.57%
Llama2 7B   128        4096         0             2.42%
Llama2 7B   128        8192         0             2.32%
Llama2 13B  128        2048         0             12.10%
Llama2 13B  128        4096         0             3.49%
Llama2 34B  128        2048         0             20.70%
Llama2 34B  256        2048         0             21.99%
Llama2 34B  256        4096         5             8.35%
Llama2 70B  256        2048         10            21.48%
Llama2 70B  256        4096         5             7.17%
Llama3 8B   64         2048         0             11.91%
Llama3 8B   64         4096         0             7.86%
Llama3 70B  256        2048         20            26.60%

(Ckpt. layers = number of layers trained with activation checkpointing enabled.)

Yandex stated that by making better use of GPUs, YaFSDP can save developers and companies substantial sums, potentially hundreds of thousands of dollars per month.
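As a rough sanity check of that claim: the savings depend on cluster size, GPU pricing, and the speedup achieved. The hourly rate in the sketch below is an assumed figure for illustration only, not from Yandex; the speedup comes from the Llama3 70B row of the table above.

```python
# Back-of-the-envelope estimate of monthly savings from a training speedup.
# The GPU price is an assumed illustrative rate, not an official figure.
gpus = 256                       # cluster size from the benchmark table
usd_per_gpu_hour = 2.0           # assumed cloud rate (illustrative only)
hours_per_month = 24 * 30
monthly_cost = gpus * usd_per_gpu_hour * hours_per_month   # $368,640

speedup = 0.2660                 # Llama3 70B row: 26.60% faster training
# A job that runs (1 + s)x faster needs 1/(1 + s) of the original GPU-hours.
savings = monthly_cost * (1 - 1 / (1 + speedup))
print(f"~${savings:,.0f} saved per month")       # roughly $77,000 under these assumptions
```

On larger clusters or pricier GPUs, the same percentage quickly reaches the hundreds of thousands of dollars per month that Yandex cites.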

Mikhail Khruschev, a senior developer at Yandex and a member of the YaFSDP team, added: "At present, we are actively experimenting with various model architectures and parameter sizes to expand the versatility of YaFSDP."
