AI Large Language Models' Performance in Tracking Mental States

TapTechNews, May 25 news: A research paper on AI, published in the latter half of this month in the journal Nature Human Behaviour, reports that in tasks testing the ability to track the mental states of others, two types of AI large language models performed, under certain conditions, on par with or even better than humans.


Theory of mind, the ability to track the mental states of others, is key to human communication and empathy and is essential for social interaction. The paper's first author, James W.A. Strachan of the University Medical Center Hamburg-Eppendorf in Germany, together with colleagues and collaborators, selected tasks that test different aspects of theory of mind, including detecting false beliefs, understanding indirect speech, and recognizing rudeness.

TapTechNews note: The team chose the GPT and LLaMA2 models for the experiment and compared their performance with that of 1,907 human participants.

The results showed that the GPT models matched, and sometimes exceeded, the average human level at recognizing indirect requests, false beliefs, and misdirection, while LLaMA2 performed below the human level; at recognizing rudeness, however, LLaMA2 outperformed humans, whereas GPT performed poorly.

According to China News Service, the authors said that LLaMA2's apparent success turned out to stem from a low 'bias' in its answers rather than genuine sensitivity to rudeness, and that GPT's 'poor performance' reflected an 'ultra-conservative' reluctance to commit to conclusions rather than reasoning errors.

TapTechNews has attached the paper link: 'Testing theory of mind in large language models and humans'
