Mianbi Intelligent's New MiniCPM-Llama3-V2.5 Superior Multimodal Model

By: Nathan. Published 2024-05-21T08:11:14Z

TapTechNews, May 21: Mianbi Intelligent launched and open-sourced its latest edge-side multimodal model, MiniCPM-Llama3-V2.5, last night. The model supports more than 30 languages, and the company claims it achieves:

The strongest overall multimodal performance among edge-side models: surpassing GeminiPro and GPT-4V

State-of-the-art (SOTA) OCR ability: 9 times as many pixels for clearer images, with accurate recognition of difficult images, long images, and long text

Image encoding is 150 times faster: the first edge-side system-level multimodal acceleration

MiniCPM-Llama3-V2.5 has 8B parameters in total. Its overall multimodal performance surpasses commercial closed-source models such as GPT-4V-1106, GeminiPro, Claude3, and Qwen-VL-Max. Its OCR and instruction-following abilities have been further improved: it can accurately recognize difficult images, long images, and long text, and it supports multimodal interaction in more than 30 languages.

On the OpenCompass evaluation, the overall performance of MiniCPM-Llama3-V2.5 surpasses the multimodal heavyweights GPT-4V and GeminiPro; on OCRBench, it surpasses benchmark models such as GPT-4o, GPT-4V, Claude3VOpus, and GeminiPro.

In addition, for image encoding, MiniCPM-Llama3-V2.5 integrates an NPU and CPU acceleration framework for the first time and combines it with memory management and compilation optimization techniques, achieving a 150-fold speedup.

For language model inference, CPU, compilation, and memory management optimizations raise the language decoding speed of MiniCPM-Llama3-V2.5 on mobile phones from about 0.5 token/s for Llama3 to 3-4 token/s. The model supports more than 30 languages, including mainstream languages such as German, French, Spanish, Italian, and Russian, basically covering the countries along the Belt and Road Initiative.
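
To put the reported decoding figures in perspective, the arithmetic can be sketched in Python. This is only an illustration: the function names and the 100-token reply length are assumptions made for the example, and only the 0.5 token/s and 3-4 token/s figures come from the article.

```python
def speedup(baseline_tps: float, optimized_tps: float) -> float:
    """Ratio of optimized to baseline decoding speed (tokens per second)."""
    return optimized_tps / baseline_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Time in seconds to decode `tokens` tokens at `tps` tokens per second."""
    return tokens / tps


BASELINE_TPS = 0.5           # reported Llama3 speed on a phone
OPTIMIZED_TPS = (3.0, 4.0)   # reported range for MiniCPM-Llama3-V2.5
REPLY_TOKENS = 100           # hypothetical reply length, chosen for illustration

low, high = (speedup(BASELINE_TPS, tps) for tps in OPTIMIZED_TPS)
print(f"speedup: {low:.0f}x to {high:.0f}x")
print(f"{REPLY_TOKENS} tokens at baseline: {seconds_for(REPLY_TOKENS, BASELINE_TPS):.0f} s")
for tps in OPTIMIZED_TPS:
    print(f"{REPLY_TOKENS} tokens at {tps:.0f} token/s: {seconds_for(REPLY_TOKENS, tps):.0f} s")
```

With these figures the decoding speedup works out to roughly 6 to 8 times, so a 100-token reply would drop from about 200 seconds to roughly 25 to 33 seconds.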

For OCR, MiniCPM-Llama3-V2.5 achieves efficient encoding and lossless recognition of high-definition images with 1.8 million pixels, and supports images of any aspect ratio, even an extreme ratio of 1:9.
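
As a rough illustration of what 1.8 million pixels means at different aspect ratios, the Python sketch below computes the width and height that fit a given pixel budget. The function name and the idea of treating 1.8 million as a fixed budget are assumptions made for the example; the announcement does not describe how the model actually sizes or slices images.

```python
import math


def dims_for_budget(pixels: int, ratio_w: int, ratio_h: int) -> tuple[int, int]:
    """Return (width, height) with width:height = ratio_w:ratio_h
    and width * height approximately equal to `pixels`."""
    # Solve w * h = pixels with w / h = ratio_w / ratio_h.
    scale = math.sqrt(pixels / (ratio_w * ratio_h))
    return round(ratio_w * scale), round(ratio_h * scale)


PIXEL_BUDGET = 1_800_000  # figure quoted in the article

for ratio in [(1, 1), (16, 9), (1, 9)]:
    w, h = dims_for_budget(PIXEL_BUDGET, *ratio)
    print(f"{ratio[0]}:{ratio[1]} -> {w} x {h} ({w * h:,} pixels)")
```

For the 1:9 case this gives an image roughly 447 pixels wide and 4025 pixels tall, which is the kind of long screenshot or document scan the article has in mind.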