Microsoft's Phi-3-vision: A New Breakthrough in the SLM Field

TapTechNews May 28th news: at the Build 2024 conference, Microsoft released the latest member of the Phi-3 family, Phi-3-vision. The model focuses on visual capabilities, can understand both image and text content, and reportedly runs smoothly and efficiently on mobile platforms.


Phi-3-vision is a multimodal small language model (SLM) aimed mainly at local, on-device AI scenarios. It has 4.2 billion parameters and a 128K-token context length, and supports general visual reasoning tasks, among others.

So how powerful is Phi-3-vision? Microsoft released a new paper [PDF] today indicating that this SLM is on par with models such as Claude 3 Haiku and Gemini 1.0 Pro.


In the paper, Microsoft compared the models on benchmarks such as ScienceQA, MathVista, and ChartQA; despite its modest parameter count, Phi-3-vision performs extremely well.

TapTechNews previously reported that Microsoft provided a chart comparing Phi-3-vision with competing models such as ByteDance's Llama3-Llava-Next (8B), LlaVA-1.6 (7B) (developed by Microsoft Research together with the University of Wisconsin and Columbia University), and Alibaba's Tongyi Qianwen Qwen-VL-Chat. The chart shows Phi-3-vision performing outstandingly across multiple benchmarks.


Currently, Microsoft has uploaded the model to Hugging Face; interested readers can visit the project page: click here.
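
For readers who want to try the model locally, a minimal sketch of loading it with the Hugging Face transformers library might look like the following. The repository id microsoft/Phi-3-vision-128k-instruct, the processor-based chat flow, and the sample image URL are assumptions based on typical Hugging Face multimodal usage, not details taken from the article:

```python
# Minimal sketch (assumptions noted): running Phi-3-vision via Hugging Face transformers.
# The repo id below and the image URL are assumed for illustration, not from the article.
from PIL import Image
import requests
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed repository id

# trust_remote_code is typically required for Phi-3 family model definitions.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Build a single-turn prompt containing one image placeholder plus a question.
messages = [
    {"role": "user", "content": "<|image_1|>\nWhat does this chart show?"},
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Any PIL image works; here a chart image is fetched from a hypothetical URL.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens before decoding the model's answer.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

The half-precision weights and device_map="auto" setting are one reasonable choice for fitting a 4.2-billion-parameter model on consumer hardware; exact usage may differ from the official model card.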

Related reading:

Intel bets on SLM small language AI models, announces its software and hardware are adapted for Microsoft's Phi-3

With 4.2 billion parameters, Microsoft announces Phi-3-vision, the latest member of its SLM small language AI model family
