Microsoft Announces New Small Language AI Model Phi-3-vision

TapTechNews May 26th news, Microsoft recently announced the latest member of its small language model (SLM) family, Phi-3-vision. The model focuses on visual capabilities, understanding both image and text content, and is said to run smoothly and efficiently on mobile platforms.

Phi-3-vision is the first multimodal model in the Phi-3 family. Its text understanding is based on Phi-3-mini, and it inherits Phi-3-mini's lightweight design, allowing it to run on mobile platforms and embedded devices. The model has 4.2 billion parameters, more than Phi-3-mini (3.8B) but fewer than Phi-3-small (7B), with a context length of 128k tokens; it was trained from February to April 2024.


TapTechNews noted that, as its name suggests, the defining feature of Phi-3-vision is image and text recognition: the model is claimed to understand the meaning of real-world images and to quickly identify and extract text from pictures.

Microsoft said that Phi-3-vision is especially well suited to office scenarios. The developers specifically optimized its ability to recognize charts and block diagrams; the model is claimed to reason over user-provided information and draw a series of conclusions to offer strategic advice for enterprises, with results said to be comparable to those of large models.

In terms of training, Microsoft says Phi-3-vision was trained on a variety of image and text data, including carefully selected public content such as textbook-grade educational materials, code, image-text annotation data, real-world knowledge, chart images, and chat-format data, to ensure the diversity of the model's inputs. On privacy, Microsoft claims the training data it used is traceable and contains no personal information.

Regarding performance, Microsoft provided a chart comparing Phi-3-vision against competing models such as ByteDance's Llama3-LLaVA-NeXT (8B), LLaVA-1.6 (7B) (developed jointly by Microsoft Research, the University of Wisconsin, and Columbia University), and Alibaba's Tongyi Qianwen Qwen-VL-Chat; the chart shows Phi-3-vision performing strongly across multiple benchmarks.


Microsoft has uploaded the model to Hugging Face for interested readers.

Related Readings:

Intel bets big on SLM small language AI models, announcing that its software and hardware have been adapted for Microsoft Phi-3
