UK Report AI Systems May Not Be as Safe as Thought

By:Nathan Published 2024-05-20T14:17:19Z

TapTechNews May 20th news, the UK government's affiliated Artificial Intelligence Safety Institute (AISI) today released a new report, revealing a fact worthy of attention - the current AI systems may not be as'safe' as the creators said.

The report pointed out that the four large language models participating in the test (TapTechNews note: the specific names of these models are not mentioned in the report) are 'extremely vulnerable to the influence of basic jailbreak attacks', and some models even actively generated 'harmful' content before being jailbroken.

UK Report AI Systems May Not Be as Safe as Thought_0

Currently, most publicly available language models have built-in some protection measures to prevent them from generating harmful or illegal content responses. And 'jailbreaking' means using technical means to 'deceive' the model to ignore the above measures.

The UK AI Safety Institute used recently standardized evaluated prompts and internally self-developed prompts for testing. The results showed that in the absence of attempting jailbreak, all models responded to at least some harmful questions; and after attempting a'relatively simple attack', all models responded to 98% to 100% of harmful questions.

The report pointed out that the safety measures taken by the current large language models on the market are still insufficient, and it is planned to conduct further tests on other models in the future.

Reference

AI safety UK language models jailbreak report