Google Releases Gemma 2 Large Language Model with Enhanced Performance

TapTechNews June 28th news, Google yesterday released a press release announcing the global release of the Gemma 2 large language model to researchers and developers, in two sizes: 9 billion parameters (9B) and 27 billion parameters (27B).


Compared to the first generation, the Gemma 2 large language model delivers higher inference performance and efficiency, and makes significant progress in safety.

Google stated in the press release that the Gemma 2 27B model performs comparably to mainstream models twice its size, and that this performance can be achieved on a single Nvidia H100 Tensor Core GPU or TPU host, greatly reducing deployment costs.

The Gemma 2 9B model outperforms Llama 3 8B and other open-source models of similar size. Google also plans to release a 2.6-billion-parameter Gemma 2 model in the coming months, better suited to artificial intelligence applications on smartphones.

Google said it has redesigned the overall architecture of Gemma 2 to achieve excellent performance and inference efficiency. TapTechNews attaches the main features of Gemma 2 as follows:

Outstanding performance:

The 27B version delivers the best performance in its size class and is competitive even with models twice its size. The 9B version also leads among similar products, surpassing Llama 3 8B and other open models of the same size.


Efficiency and cost:

The Gemma 2 27B model can efficiently run inference at full precision on a single Google Cloud TPU host, Nvidia A100 80GB Tensor Core GPU, or Nvidia H100 Tensor Core GPU, greatly reducing cost while maintaining high performance. This makes artificial intelligence deployment more accessible and easier to budget for.
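A rough sanity check makes the single-GPU claim plausible. Assuming bfloat16 weights at 2 bytes per parameter (an assumption, not a figure from the press release; activations and the KV cache need additional headroom on top of this):

```python
# Back-of-the-envelope memory estimate for the Gemma 2 27B weights.
params = 27e9            # 27 billion parameters
bytes_per_param = 2      # bfloat16, assumed precision
weight_gib = params * bytes_per_param / 2**30
print(f"Weights alone: ~{weight_gib:.1f} GiB")  # ~50.3 GiB
```

At roughly 50 GiB, the weights alone fit within the 80 GB of an A100 80GB or H100, which is consistent with Google's single-accelerator deployment claim.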

Fast inference across hardware:

Gemma 2 has been optimized to run at amazing speeds on various hardware, from powerful gaming laptops and high-end desktops to cloud-based setups.

Try Gemma 2 at full precision in Google AI Studio, unlock local performance with the quantized version via Gemma.cpp on CPU, or try it on a home PC with an NVIDIA RTX or GeForce RTX GPU through Hugging Face Transformers.
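For the Hugging Face Transformers route, a minimal sketch might look like the following. It assumes the `transformers` and `torch` packages are installed and that you have accepted the Gemma license on the Hugging Face Hub; the model ID and generation settings are illustrative, not from the article:

```python
MODEL_ID = "google/gemma-2-9b-it"  # assumed Hub ID for the instruction-tuned 9B variant

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Sketch: load Gemma 2 9B and generate a completion for `prompt`."""
    # Imports are deferred so the sketch can be inspected without the
    # (large) libraries installed; they are needed only when this runs.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # half-size weights; see memory estimate above
        device_map="auto",           # place layers on available GPU(s), else CPU
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Explain what a language model is in one sentence."))
```

On a consumer RTX card the bfloat16 9B weights may still be tight; the quantized Gemma.cpp path the article mentions is the lighter-weight alternative for CPU-only machines.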
