Apple Releases Paper on Apple Intelligence Models with Strong Performance

TapTechNews, July 31 news: Apple has released a new paper [PDF] sharing details about the Apple Intelligence models, some of whose results already exceed those of OpenAI's GPT-4.

Model Introduction

In the paper, Apple introduces the Apple Foundation Model (hereinafter AFM), which comes in two variants:

AFM-on-device: runs locally, has 3 billion parameters, and can operate efficiently on devices such as iPhone and iPad (a rough memory sketch follows this list);

AFM-server: Apple has not yet disclosed its parameter count or other details.
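To give a sense of why the 3-billion-parameter figure matters for on-device use, here is a minimal back-of-the-envelope sketch in Python of raw weight storage at a few common precisions. The precisions shown are illustrative assumptions, not details from Apple's paper:

```python
# Back-of-the-envelope weight-memory estimate for a 3-billion-parameter model.
# The precisions below are illustrative assumptions, not figures from Apple's paper.

PARAMS = 3e9  # AFM-on-device parameter count, per the paper

for label, bits in [("float32", 32), ("float16", 16), ("int8", 8), ("int4", 4)]:
    gib = PARAMS * bits / 8 / 2**30  # parameters * bytes-per-weight -> GiB
    print(f"{label:>8}: ~{gib:.1f} GiB for the weights alone")
```

Even at 16-bit precision the weights alone come to roughly 5.6 GiB, which suggests why aggressive quantization is generally needed to run a model of this size comfortably alongside apps on a phone.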

Training Data Sources

Apple says the training data set consists of licensed data obtained from publishers, curated public and open-source data sets, and public information crawled by its web crawler, Applebot.

Apple emphasizes its focus on protecting user privacy: the data mixture contains no private data from Apple users.

According to a report in The New York Times, at the end of 2023 Apple reached multi-year agreements worth at least 50 million US dollars with several publishers, including NBC, Condé Nast, and IAC, to train its models on the publishers' news archives.

Apple's AFM models were also trained on open-source code hosted on GitHub, in particular Swift, Python, C, Objective-C, C++, JavaScript, Java, and Go code.

The paper states that, to improve the AFM models' mathematical skills, Apple specifically added mathematical questions and answers from web pages, math forums, blogs, tutorials, and seminars to the training set.

Apple also utilized high-quality, publicly available data sets (not named in the paper) with licenses that permit use for training... models, filtered to remove sensitive information.

The AFM training data set contains approximately 6.3 trillion tokens (a token is a small unit of data that generative AI models can ingest more easily than raw text). For comparison, that is less than half the 15 trillion tokens Meta used to train its flagship text-generation model, Llama 3.1 405B.
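A quick arithmetic check of that comparison:

```python
# Quick check of the token-count comparison cited above.
afm_tokens = 6.3e12    # AFM training mixture, per Apple's paper
llama_tokens = 15e12   # Llama 3.1 405B, per Meta

print(f"{afm_tokens / llama_tokens:.0%}")  # -> 42%, i.e. less than half
```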

Training Hardware

According to the paper, Apple used 8,192 TPUv4 chips to train the AFM-server model and 2,048 TPUv5p chips to train the AFM-on-device model.


Each TPU v5p pod consists of 8,960 chips. Per Google, v5p delivers twice the floating-point throughput (FLOPS) and three times the high-bandwidth memory of TPUv4, and trains models nearly three times faster.
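To put these cluster sizes in perspective, here is a rough sketch of aggregate peak compute. The per-chip bf16 figures are taken from Google's published TPU specs (an assumption layered on the article, not a number from Apple's paper), and real-world utilization is ignored:

```python
# Rough peak-compute comparison of the two training clusters described above.
# Per-chip peak bf16 throughput comes from Google's public TPU specs, not
# from Apple's paper, and real-world utilization is ignored.

TFLOPS_V4 = 275   # TPU v4 peak bf16 TFLOPS per chip
TFLOPS_V5P = 459  # TPU v5p peak bf16 TFLOPS per chip

# 1 EFLOPS = 1e6 TFLOPS
print(f"AFM-server cluster (8192 x v4):     ~{8192 * TFLOPS_V4 / 1e6:.2f} peak bf16 EFLOPS")
print(f"AFM-on-device cluster (2048 x v5p): ~{2048 * TFLOPS_V5P / 1e6:.2f} peak bf16 EFLOPS")
```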


Model Performance

According to the paper, Apple's self-developed large models outperform GPT-4 on instruction following and text summarization.

Apple's data show that AFM-server's harmful-output violation rate is 6.3%, significantly lower than GPT-4's 28.8%. Similarly, AFM-on-device's 7.5% violation rate is lower than the 21.8% scored by Llama-3-8B (trained by Meta, the parent company of Facebook).

For email, message, and notification summarization, AFM-on-device achieves satisfaction rates of 71.3%, 63%, and 74.9% respectively; the paper also notes that on these three tasks AFM leads the Llama, Gemma, and Phi-3 models. TapTechNews attaches the relevant performance results below:

[Figures: AFM performance-comparison results from the paper]
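For readers unfamiliar with these metrics, here is a minimal, hypothetical sketch of how binary human-evaluation rates such as the violation and satisfaction percentages above are typically computed. The labels and sample data are invented for illustration; Apple's paper defines its own grading guidelines and does not publish raw grader outputs:

```python
# Illustrative sketch of binary human-evaluation rates (violation rate,
# satisfaction rate). All sample data here is hypothetical.

def rate(flags: list[bool]) -> float:
    """Fraction of responses carrying a given binary label."""
    return sum(flags) / len(flags)

# Hypothetical grader output over 8 adversarial prompts (True = harmful reply).
harmful = [False, False, True, False, False, False, False, False]
print(f"violation rate: {rate(harmful):.1%}")          # 12.5% on this toy sample

# Hypothetical grader output over 8 summaries (True = rated satisfactory).
satisfactory = [True, True, False, True, True, True, False, True]
print(f"satisfaction rate: {rate(satisfactory):.1%}")  # 75.0% on this toy sample
```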

Related Readings:

《Only for non-domestic models: Apple iPhone 15 Pro and Pro Max users can access Apple Intelligence》

《Apple: Once used Google hardware to train the Apple Intelligence model》

《Apple's AI version of iOS is hugely popular on day one: chats become highly emotional, the large model becomes the strongest mouthpiece, and Siri is magnificently transformed》
