New Method for Running Large Language Models with Low Power

TapTechNews June 27 news: a research team at the University of California, Santa Cruz has developed a new method that can run a billion-parameter large language model on just 13 watts of power, roughly the draw of a modern LED light bulb. By contrast, a data-center-class GPU used for large language model tasks consumes about 700 watts.


Amid the AI wave, many companies and institutions focus mainly on applications and inference, rarely considering efficiency and similar metrics. To address this, the researchers eliminated matrix multiplication, the most computation-intensive operation in a neural network, and replaced it with a ternary scheme in which each weight takes only one of three values: minus one, zero, or plus one.
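The idea can be illustrated with a minimal sketch (my own example, not the team's code): when every weight is -1, 0, or +1, a matrix-vector product reduces to selective additions and subtractions, so no multiplications are needed at all.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))  # ternary weight matrix in {-1, 0, +1}
x = rng.standard_normal(8)            # input activations

# Standard dense product (uses multiplications)
y_dense = W @ x

# Multiplication-free equivalent: add inputs where the weight is +1,
# subtract them where it is -1, and skip zeros entirely
y_ternary = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

assert np.allclose(y_dense, y_ternary)
```

Because additions are far cheaper than multiplications in hardware, this substitution is what makes the large power savings possible, especially on circuits designed around it.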

The team also built custom hardware on a field-programmable gate array (FPGA), a highly customizable circuit, which let them exploit the full energy-saving potential of the neural network.

When run on this custom hardware, the network matches the performance of top models such as Meta's Llama while consuming about one-fiftieth the power of a regular configuration.

The neural network design can also run on the standard GPUs widely used in the AI industry. Test results show that, compared with a network based on matrix multiplication, it occupies only about one-tenth the GPU memory.
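A rough back-of-the-envelope calculation (my own illustration, under the assumption of fp16 baseline weights and 2-bit packed ternary weights) shows where savings of this order come from:

```python
# Hypothetical sketch: fp16 weights take 16 bits each, while a ternary value
# in {-1, 0, +1} fits in 2 bits, so four weights pack into one byte.
n_params = 1_000_000_000  # a 1-billion-parameter model, as in the article

fp16_bytes = n_params * 2          # 16 bits per weight
ternary_bytes = n_params * 2 // 8  # 2 bits per weight, packed 4 per byte

print(fp16_bytes // ternary_bytes)  # → 8
```

An 8x reduction in weight storage, plus savings on the intermediate buffers that dense matrix multiplication requires, is consistent with the order of magnitude of the one-tenth figure reported.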

