Microsoft's Phi-3-vision: A New Breakthrough in the SLM Field

TapTechNews May 28th news: at the Build 2024 conference, Microsoft released the latest member of the Phi-3 family, Phi-3-vision. The model focuses on visual capabilities, can understand both image and text content, and reportedly runs smoothly and efficiently on mobile platforms.


Phi-3-vision is a multimodal small language model (SLM) aimed mainly at local, on-device AI scenarios. It has 4.2 billion parameters and a 128K-token context length, and supports general visual reasoning tasks, among others.

So how powerful is Phi-3-vision? Microsoft released a new paper [PDF] today indicating that this SLM is on par with models such as Claude 3 Haiku and Gemini 1.0 Pro.


In the paper, Microsoft compared the models on benchmarks such as ScienceQA, MathVista, and ChartQA; despite its modest parameter count, Phi-3-vision performs extremely well.

TapTechNews previously reported that Microsoft provided a chart comparing Phi-3-vision with competing models such as ByteDance's Llama3-Llava-Next (8B), LlaVA-1.6 (7B) (developed by Microsoft Research together with the University of Wisconsin and Columbia University), and Alibaba's Tongyi Qianwen Qwen-VL-Chat. The chart shows Phi-3-vision performing outstandingly across multiple benchmarks.


Currently, Microsoft has uploaded the model to Hugging Face; interested readers can visit the project page: click here.
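
For readers who want to try the model locally, a minimal sketch of loading it with the Hugging Face transformers library might look like the following. The repository id microsoft/Phi-3-vision-128k-instruct, the processor-based chat flow, and the sample image URL are assumptions based on typical Hugging Face multimodal usage, not details taken from the article:

```python
# Minimal sketch (assumptions noted): running Phi-3-vision via Hugging Face transformers.
# The repo id below and the image URL are assumed for illustration, not from the article.
from PIL import Image
import requests
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed repository id

# trust_remote_code is typically required for Phi-3 family model definitions.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Build a single-turn prompt containing one image placeholder plus a question.
messages = [
    {"role": "user", "content": "<|image_1|>\nWhat does this chart show?"},
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Any PIL image works; here a chart image is fetched from a hypothetical URL.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens before decoding the model's answer.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

The half-precision weights and device_map="auto" setting are one reasonable choice for fitting a 4.2-billion-parameter model on consumer hardware; exact usage may differ from the official model card.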

Related reading:

Intel bets on SLM small language AI models, announces its software and hardware are adapted for Microsoft's Phi-3

With 4.2 billion parameters, Microsoft announces Phi-3-vision, the latest member of its SLM small language AI model family
