Microsoft Releases Phi-3.5 Series of AI Models, Including Its First Mixture-of-Experts Model

TapTechNews, August 21 news: Microsoft today released the Phi-3.5 series of AI models. The most notable addition is Phi-3.5-MoE, the first Mixture of Experts (MoE) model in the series.

The Phi-3.5 series comprises three lightweight AI models: Phi-3.5-MoE, Phi-3.5-vision, and Phi-3.5-mini. All three are built on synthetic data and filtered public web content and support a 128K-token context window. The models are now available on Hugging Face under the MIT license. TapTechNews attaches an introduction to each model below:
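As a quick illustration, all three models can be loaded with the Hugging Face transformers library. A minimal sketch follows; the model ID, prompt, and generation settings are assumptions for demonstration, not part of the announcement:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model ID following Microsoft's published naming on Hugging Face.
model_id = "microsoft/Phi-3.5-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the checkpoint's native precision
    trust_remote_code=True,  # Phi models ship custom modeling code
)

inputs = tokenizer("Explain Mixture of Experts in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```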

Phi-3.5-MoE: The first Mixture of Experts model

Phi-3.5-MoE is the first model in the Phi series to use Mixture of Experts (MoE) technology. It is a 16x3.8B MoE model that activates 2 of its 16 experts per token, for roughly 6.6 billion active parameters, and was trained on 4.9T tokens using 512 H100 GPUs.
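In this naming convention the non-expert layers (such as attention) are typically shared across experts, which is presumably why two active 3.8B experts yield about 6.6B rather than 2 x 3.8B = 7.6B active parameters. The toy PyTorch layer below sketches the top-2 routing idea with small dimensions; it is an illustration, not Microsoft's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Toy sparse MoE feed-forward layer: 16 experts, 2 active per token."""
    def __init__(self, d_model=64, d_ff=128, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                     # x: (tokens, d_model)
        logits = self.router(x)               # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):           # only the 2 selected experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)
print(Top2MoELayer()(x).shape)  # torch.Size([4, 64])
```

Each token is routed only to its two highest-scoring experts, so per-token compute scales with the two active experts rather than with all 16.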

The Microsoft research team designed this model from scratch to further improve its performance. On standard AI benchmarks, Phi-3.5-MoE outperforms Llama-3.1-8B, Gemma-2-9B, and Gemini-1.5-Flash, and comes close to the current leader, GPT-4o-mini.

Phi-3.5-vision: Enhanced multi-frame image understanding

Phi-3.5-vision has a total of 4.2 billion parameters, was trained on 500B tokens using 256 A100 GPUs, and now supports multi-frame image understanding and reasoning.

Compared with its predecessor, Phi-3.5-vision improves on MMMU (from 40.2 to 43.0), MMBench (from 80.5 to 81.9), and the document understanding benchmark TextVQA (from 70.9 to 72.0).
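A hedged sketch of what multi-frame inference could look like with transformers follows; the model ID, the <|image_N|> placeholder convention, and the frame files are assumptions based on the published Phi vision model cards:

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Hypothetical model ID; placeholder tokens assume the Phi vision card convention.
model_id = "microsoft/Phi-3.5-vision-instruct"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", trust_remote_code=True
)

# Three hypothetical frames from the same clip.
frames = [Image.open(f"frame_{i}.jpg") for i in range(1, 4)]
prompt = (
    "<|user|>\n<|image_1|>\n<|image_2|>\n<|image_3|>\n"
    "Summarize what changes across these frames.<|end|>\n<|assistant|>\n"
)

inputs = processor(prompt, images=frames, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))
```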

Phi-3.5-mini: Lightweight and powerful

Phi-3.5-mini is a 3.8-billion-parameter model that outperforms Llama-3.1-8B and Mistral-7B and is even comparable to Mistral-NeMo-12B.

The model was trained on 3.4T tokens using 512 H100 GPUs. Despite having only 3.8B parameters, it is highly competitive on multilingual tasks against LLMs with far more parameters.

In addition, Phi-3.5-mini now supports a 128K context window, while its main competitor, the Gemma-2 series, supports only 8K.
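The window size should also be visible in the model's published configuration; a minimal check (model ID assumed as above) might look like:

```python
from transformers import AutoConfig

# Hypothetical model ID; a 128K-token window corresponds to 131072 positions.
cfg = AutoConfig.from_pretrained("microsoft/Phi-3.5-mini-instruct", trust_remote_code=True)
print(cfg.max_position_embeddings)  # expected: 131072
```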
