Google's New Selective Attention Method Improves Transformer Model Performance

TapTechNews, October 9th news: the tech media outlet marktechpost published a blog post yesterday (October 8th) reporting that Google has introduced the Selective Attention method, which improves the performance of Transformer architecture models.

Introduction to the Transformer Architecture

The Transformer is a revolutionary neural network architecture proposed by Google in 2017, mainly used to process sequential data, especially in the field of natural language processing (NLP).

At the core of the Transformer is the self-attention mechanism, which lets the model capture the relationships between words as it processes the input sequence, so it can attend to all parts of the sequence rather than only local information.

The Transformer consists of multiple encoders and decoders. The encoder is responsible for understanding the input data, while the decoder generates the output. The multi-head self-attention mechanism enables the model to process information in parallel, improving efficiency and accuracy.
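
As an illustration of how self-attention works, here is a minimal sketch of single-head scaled dot-product attention in NumPy. The shapes, variable names, and toy inputs are illustrative assumptions, not code from any specific library or from Google's work.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ v                              # weighted sum of value vectors

# Toy usage: a sequence of 4 tokens with model width 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)
```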

Challenges of the Transformer Architecture Model

One of the major challenges of the Transformer architecture is that it is inefficient when processing long text sequences. Because every token interacts with every other token in the sequence, computing and memory requirements grow quadratically as the context length increases.
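
A quick back-of-the-envelope calculation makes this concrete: the attention score matrix holds one entry for every (query token, key token) pair, so its size grows with the square of the context length.

```python
# Size of the full attention score matrix at a few context lengths:
# one entry per (query token, key token) pair, i.e. n * n entries.
for n in (512, 1024, 2048):
    print(f"context {n:>4}: {n * n:,} score entries")
# Doubling the context length quadruples the matrix (and the memory needed to hold it).
```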

Current solutions to this problem include sparse attention mechanisms, which limit the number of interactions between tokens, and context compression techniques, which reduce the sequence length by summarizing past information.
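
For a sense of what a sparse attention mechanism looks like in practice, here is a minimal sketch of one common variant, a causal sliding window, in which each token may attend only to its most recent predecessors. The window size and masking scheme are illustrative assumptions, not the specific technique discussed in the blog post.

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Additive mask: 0 where attention is allowed, -inf where it is blocked."""
    mask = np.full((seq_len, seq_len), -np.inf)
    for i in range(seq_len):
        start = max(0, i - window + 1)
        mask[i, start:i + 1] = 0.0   # each token sees only its last `window` tokens
    return mask                      # added to the attention scores before the softmax

print(sliding_window_mask(6, 3))
```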

However, these approaches work by reducing the number of tokens the attention mechanism considers, so they usually come at the expense of performance and may lose key contextual information.

Google's New Method

Researchers at Google Research have proposed a new method called Selective Attention, which dynamically ignores tokens that are no longer relevant, thereby improving the efficiency of Transformer models.

Selective Attention uses a soft mask matrix to determine how important each token is to future tokens, reducing the attention paid to unimportant ones.
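
The blog post does not spell out the exact formulation, but the general idea can be sketched as follows: a per-token relevance score is turned into a soft penalty that is subtracted from the attention logits before the softmax, so tokens judged no longer relevant receive less attention. The way the relevance scores are produced and applied below is a hypothetical stand-in for illustration, not Google's published method.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_masked_attention(q, k, v, relevance):
    """relevance: (seq_len,) learned scores; lower values mean 'less relevant'.
    NOTE: the penalty below is an assumed illustration, not the paper's exact rule."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    penalty = np.maximum(0.0, -relevance)   # soft mask: only down-weights, never boosts
    scores = scores - penalty[None, :]      # reduce attention paid to "stale" tokens
    return softmax(scores) @ v
```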

The research shows that Transformer models equipped with Selective Attention perform strongly on multiple natural language processing tasks while significantly reducing memory usage and compute costs.

For example, in a Transformer model with 100 million parameters, the memory requirement of the attention module is reduced to 1/16, 1/25, and 1/47 of the original when the context size is 512, 1024, and 2048 tokens, respectively. The proposed method also outperforms the traditional Transformer on the HellaSwag benchmark, achieving up to a 5% accuracy improvement for larger model sizes.

Selective Attention allows smaller, more efficient models to be built, significantly reducing memory requirements without sacrificing accuracy.

TapTechNews has attached the reference address.
