SenseTime Releases DayDayNew SenseNova 5.5 and Pioneering Models

TapTechNews, July 5th: SenseTime released the DayDayNew SenseNova 5.5 large model system, along with DayDayNew 5o, the first domestic "what you see is what you get" model, whose interactive performance is benchmarked against GPT-4o.


By integrating cross-modal information across sound, text, image, and video, DayDayNew 5o brings a brand-new AI interaction mode: real-time streaming multi-modal interaction.

According to the introduction, DayDayNew 5o can listen, can see, and is good at finding topics, making interaction feel like talking with a real person. This interaction mode suits applications such as real-time conversation and speech recognition: the same model can naturally handle multiple tasks and adaptively adjust its behavior and output according to context.

DayDayNew 5.5 is the first officially released streaming native multi-modal interaction model in China. Its training is based on more than 10 TB of tokens of high-quality data, including a large amount of high-quality synthetic data, to build high-order chains of thought. The model adopts a hybrid edge-cloud collaborative architecture with 600 billion parameters, maximizing cloud-edge-terminal collaboration and achieving an inference speed of 109.5 words per second.

According to TapTechNews' previous report, SenseTime also released Vimi, the first controllable human video generation large model, at the World Artificial Intelligence Conference. From just one photo in any style, it can generate a human video performing a target action, and it supports multiple driving methods, including existing human videos, animations, sound, and text.
