Mianbi Intelligent Open-sources MiniCPM-V 2.6 Model with Impressive Features

TapTechNews August 7th news: Mianbi Intelligent open-sourced the MiniCPM-V 2.6 model yesterday, stating that it raises end-side AI multi-modal capability to a level that fully matches GPT-4V.


According to the announcement, the MiniCPM-V 2.6 model, with only 8B parameters, achieves three SOTA results, in single-image, multi-image, and video understanding, among models under 20B parameters, and has the following characteristics:

Strongest three-in-one end-side multi-modality: For the first time on the end-side, core multi-modal capabilities such as single-image, multi-image, and video understanding fully surpass GPT-4V, and single-image understanding punches above its weight class against the multi-modal leader Gemini 1.5 Pro and the new popular GPT-4o mini.

Many capabilities arrive on the end-side for the first time: real-time video understanding, multi-image joint understanding, multi-image in-context learning (ICL) for visual analogy, multi-image OCR, and more.

Highest multi-modal pixel density: By analogy with knowledge density, MiniCPM-V 2.6 achieves twice the pixel density per encoded token (token density) of GPT-4o (see the worked example after this list).

End-side friendly: After quantization, the model runs in 6GB of end-side memory; end-side inference speed reaches 18 tokens/s, 33% faster than the previous-generation model. It supported llama.cpp, ollama, and vLLM inference immediately upon release, and supports multiple languages (a usage sketch follows the links below).

Unified high-resolution framework: OCR, the traditional strength of the MiniCPM series, maintains its SOTA performance level and now extends across single-image, multi-image, and video understanding.
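
For intuition, token density here means the number of image pixels encoded into each visual token. A minimal sketch of the arithmetic, using figures reported on the MiniCPM-V-2_6 model card as assumptions (a roughly 1.8-megapixel image, e.g. 1344x1344, encoded into 640 visual tokens):

```python
# Token density: image pixels encoded per visual token.
# The figures below are assumptions taken from the MiniCPM-V-2_6
# model card: a 1344x1344 (~1.8 MP) image encoded into 640 tokens.
width, height = 1344, 1344
visual_tokens = 640

token_density = (width * height) / visual_tokens
print(f"{token_density:.0f} pixels per token")  # ~2822 pixels/token
```

A model that spends more tokens on the same image has a lower token density; the claim above is that MiniCPM-V 2.6 packs roughly twice as many pixels into each token as GPT-4o does.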

TapTechNews has attached the open-source addresses below:

GitHub: https://github.com/OpenBMB/MiniCPM-V

HuggingFace: https://huggingface.co/openbmb/MiniCPM-V-2_6
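
As a minimal sketch of how inference can be invoked through Hugging Face Transformers, following the chat-style trust_remote_code interface shown on the model card (exact signatures may differ; consult the repositories above for authoritative usage):

```python
# Minimal single-image chat sketch for MiniCPM-V-2_6, based on the
# Hugging Face model card's custom trust_remote_code API; treat the
# method names and arguments as illustrative, not definitive.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6"
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # any local test image
msgs = [{"role": "user", "content": [image, "What is in the image?"]}]

# model.chat is the custom method exposed by the model's remote code.
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```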
