Tencent Releases Hunyuan-DiT Acceleration Library

On June 6, TapTechNews reported that Tencent today released an acceleration library for its Hunyuan text-to-image open-source large model (Hunyuan-DiT), claiming a significant improvement in inference efficiency and a 75% reduction in image generation time.

Tencent also says the barrier to using the Hunyuan-DiT model has been greatly lowered. Users can access its text-to-image capabilities through the graphical interface of ComfyUI. The model has also been integrated into the HuggingFace Diffusers library, so it can be invoked with just three lines of code, without cloning the original repository.
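As a rough illustration of the Diffusers route described above, the sketch below loads the model and runs one text-to-image pass. It assumes a recent `diffusers` release that ships `HunyuanDiTPipeline`, a CUDA GPU, and the `Tencent-Hunyuan/HunyuanDiT-Diffusers` checkpoint on the HuggingFace Hub; exact ids and versions should be checked against the official model card.

```python
def generate(prompt: str,
             model_id: str = "Tencent-Hunyuan/HunyuanDiT-Diffusers"):
    """Run a single Hunyuan-DiT text-to-image inference pass.

    Imports are deferred so this sketch can be read and imported even
    on machines without torch/diffusers or a GPU installed.
    """
    import torch
    from diffusers import HunyuanDiTPipeline  # requires a recent diffusers

    # Load the published Diffusers-format checkpoint in half precision
    # and move it to the GPU (the model needs a CUDA device to run).
    pipe = HunyuanDiTPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")

    # The pipeline returns a batch of PIL images; take the first one.
    return pipe(prompt).images[0]


if __name__ == "__main__":
    # Hypothetical usage: generate and save one image.
    generate("一只可爱的熊猫在喝茶").save("panda.png")
```

The core of the call is indeed roughly the "three lines" the article mentions: import the pipeline class, load the checkpoint with `from_pretrained`, and call the pipeline with a prompt.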


Before this, Tencent had announced that the Hunyuan text-to-image large model had been fully upgraded and open-sourced, free for commercial use by enterprises and individual developers. Tencent claims it is the industry's first open-source, Chinese-native text-to-image model built on a DiT architecture, supporting bilingual Chinese and English input and understanding. It adopts the same DiT architecture as Sora and can serve not only for text-to-image generation but also as a basis for multimodal visual generation such as video.

Running the model requires an NVIDIA GPU with CUDA support. Hunyuan-DiT alone needs a minimum of 11 GB of GPU memory, while running it together with DialogGen (Tencent's multimodal interactive dialogue system for text-to-image) requires at least 32 GB. Tencent says it has tested NVIDIA V100 and A100 GPUs on Linux.
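A quick way to check whether a machine meets the 11 GB (or 32 GB) GPU-memory requirement above is to query `nvidia-smi`. The sketch below uses only the standard library; the helper names are illustrative, not part of any Tencent tooling.

```python
import shutil
import subprocess


def gpu_memory_mib() -> list[int]:
    """Return the total memory (in MiB) of each NVIDIA GPU,
    or an empty list when nvidia-smi is not available."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.splitlines() if line.strip()]


def meets_requirement(required_mib: int = 11 * 1024) -> bool:
    """True if at least one GPU has the required memory
    (defaults to the 11 GB minimum for Hunyuan-DiT alone)."""
    mems = gpu_memory_mib()
    return bool(mems) and max(mems) >= required_mib
```

For the combined DialogGen + Hunyuan-DiT setup, the same check with `required_mib=32 * 1024` applies.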

TapTechNews attachment links: code (GitHub); model (HuggingFace).
