Researcher Introduces xLSTM Architecture to Challenge Transformer

On May 13, TapTechNews reported that researchers Sepp Hochreiter and Jürgen Schmidhuber jointly proposed the Long Short-Term Memory (LSTM) neural network architecture in 1997 to address the limited ability of recurrent neural networks (RNNs) to retain information over long sequences.

Recently, Sepp Hochreiter released a paper on arXiv introducing a new architecture called xLSTM (Extended LSTM), which claims to overcome the LSTM's long-standing limitation of only being able to process information sequentially, thereby "challenging" the currently dominant Transformer architecture.

According to the paper, the xLSTM architecture adds exponential gating to the recurrent network and introduces two new memory cell designs, "sLSTM" and "mLSTM". These changes allow the network to use its memory more effectively and, in the mLSTM case, to be parallelized in a way comparable to the Transformer's simultaneous processing of all tokens.
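
For illustration, below is a minimal NumPy sketch of an mLSTM-style recurrent step with exponential gating and a matrix memory, loosely following the update rules described in the paper. The weight names, dimensions, scalar output gate, and the simplified normalization (the paper additionally stabilizes the exponential gates) are assumptions made for this sketch, not the authors' reference implementation.

```python
# Minimal sketch of an mLSTM-style step: exponential input gate plus a
# matrix (outer-product) memory. Shapes and gate details are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden / key dimension (illustrative)

# Projection weights for query, key, value and the three gates (assumed shapes).
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
w_i, w_f, w_o = (rng.standard_normal(d) * 0.1 for _ in range(3))

def mlstm_step(x, C, n):
    """One recurrent step: returns the hidden output and updated (C, n) state."""
    q = W_q @ x
    k = (W_k @ x) / np.sqrt(d)
    v = W_v @ x

    i = np.exp(w_i @ x)                    # exponential input gate
    f = 1.0 / (1.0 + np.exp(-(w_f @ x)))   # sigmoid forget gate
    o = 1.0 / (1.0 + np.exp(-(w_o @ x)))   # sigmoid output gate (scalar here)

    C = f * C + i * np.outer(v, k)         # matrix memory, covariance-style update
    n = f * n + i * k                      # normalizer state

    h_tilde = (C @ q) / max(abs(n @ q), 1.0)  # normalized memory retrieval
    return o * h_tilde, C, n

# Run the cell over a short random input sequence.
C, n = np.zeros((d, d)), np.zeros(d)
for x in rng.standard_normal((5, d)):
    h, C, n = mlstm_step(x, C, n)
print(h.shape)  # (8,)
```

Because each step's memory update depends only on that token's gates, keys, and values, the recurrence can be unrolled and computed over a whole sequence at once, which is the property the article compares to the Transformer's simultaneous processing of all tokens.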

The team trained models based on the xLSTM and Transformer architectures on 15 billion tokens and found that xLSTM performed best, particularly in language modeling ability. Based on this, the researchers believe that xLSTM has the potential to "compete" with the Transformer in the future.

Reference:

xLSTM: Extended Long Short-Term Memory
