Tencent Upgrades MixDiT Model and Open Sources It for Commercial Use

By: Nathan. Published 2024-05-14T09:03:52Z

Thanks to TapTechNews user Xichuangjiushi for the tip!

Tencent has announced that its MixDiT model has been upgraded and open sourced. The release, available on Hugging Face and GitHub, includes the model weights, inference code, and model algorithms, and is free for commercial use by enterprises and individual developers.

MixDiT is a text-to-image generation model based on the Diffusion Transformer (DiT), the same architecture used by Sora. It has 1.5 billion parameters and offers fine-grained understanding of both Chinese and English. It also supports multi-turn conversations in which users generate and refine images based on context. Tencent describes it as the industry's first open source DiT model with native Chinese support, accepting and understanding input in both Chinese and English.

Running the model requires an NVIDIA GPU with CUDA support. MixDiT alone needs at least 11GB of VRAM, while running it together with DialogGen (Tencent's multimodal interactive dialogue system for text-to-image generation) needs at least 32GB. Tencent said it has tested the model on NVIDIA V100 and A100 GPUs under Linux.

As TapTechNews previously reported, when China's first official "Large Model Standard Compliance Evaluation" results were released, Tencent's MixDiT model was in the first batch of domestically developed large models to pass. That batch also included Alibaba's Tongyi Qianwen, 360 Zhinao, and Baidu's Wenxin Yiyan.
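For readers wondering whether their hardware qualifies, the VRAM figures quoted above can be turned into a rough pre-flight check. The sketch below is illustrative only: the function and constant names are hypothetical and are not part of Tencent's release, and the only facts it encodes are the 11GB and 32GB minimums reported in this article.

```python
from typing import List

# Minimum VRAM figures as reported in the article, in GB.
# The names below are hypothetical and not taken from Tencent's code.
MIXDIT_MIN_VRAM_GB = 11
DIALOGGEN_MIXDIT_MIN_VRAM_GB = 32


def supported_modes(vram_gb: float) -> List[str]:
    """Return the configurations whose reported minimum VRAM is met."""
    modes = []
    if vram_gb >= MIXDIT_MIN_VRAM_GB:
        modes.append("MixDiT")
    if vram_gb >= DIALOGGEN_MIXDIT_MIN_VRAM_GB:
        modes.append("DialogGen + MixDiT")
    return modes


if __name__ == "__main__":
    # Nominal memory sizes of a few common cards, in GB.
    for vram in (8, 16, 24, 32, 80):
        modes = supported_modes(vram)
        print(vram, "GB:", ", ".join(modes) if modes else "below the reported minimum")
```

The input here is a card's nominal memory size. The amount a driver actually reports is often slightly lower, so a card sitting exactly at a threshold may still fall short in practice.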
