Google DeepMind Announces 'Video-to-Audio' Technology

TapTechNews reported on June 18 that, according to a Google DeepMind press release, DeepMind has announced a "video-to-audio" technology that uses AI to generate soundtracks for silent videos.

TapTechNews learned that the model currently has limitations: developers must use text prompts to describe in advance the sounds that might appear in the video, and the model cannot yet add specific sound effects based on the video footage alone.

According to the announcement, the model first breaks down the video provided by the user, then combines it with the user's text prompt and runs iterative calculations with a diffusion model to produce background audio that matches the footage. For example, given a silent video of someone walking in the dark together with text prompts such as "cinematic, horror film, music, tension, footsteps on concrete," the model can generate horror-style background sound effects.
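DeepMind has not released code or API details for this model, so the following is only an illustrative sketch of the pipeline the article describes: video features and a text prompt jointly condition an iterative diffusion process that refines noise into an audio representation. Every function, shape, and constant below is a made-up placeholder, not DeepMind's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_video(frames: np.ndarray) -> np.ndarray:
    """Stand-in video encoder: flattens each frame and keeps a fixed-size feature vector."""
    pooled = frames.reshape(frames.shape[0], -1)  # (num_frames, H*W*C)
    return pooled[:, :64]                         # (num_frames, 64) placeholder features

def encode_prompt(prompt: str) -> np.ndarray:
    """Stand-in text encoder: hashes comma-separated prompt terms into a fixed-size embedding."""
    emb = np.zeros(64)
    for token in prompt.lower().split(","):
        emb[hash(token.strip()) % 64] += 1.0
    return emb

def denoise_step(latent: np.ndarray, video_feats: np.ndarray,
                 text_emb: np.ndarray, t: int, steps: int) -> np.ndarray:
    """Stand-in for one reverse-diffusion step conditioned on video and text."""
    cond = video_feats.mean(axis=0) + text_emb        # fuse the two conditioning signals
    predicted_noise = 0.1 * (latent - cond)           # toy 'noise prediction'
    return latent - (t / steps) * predicted_noise     # move the latent toward the condition

def video_to_audio(frames: np.ndarray, prompt: str, steps: int = 50) -> np.ndarray:
    """Toy pipeline: video + prompt -> iterative denoising -> audio latent."""
    video_feats = encode_video(frames)
    text_emb = encode_prompt(prompt)
    latent = rng.normal(size=64)                      # start from pure noise
    for t in range(steps, 0, -1):                     # repeated diffusion calculations
        latent = denoise_step(latent, video_feats, text_emb, t, steps)
    return latent                                     # a real system would decode this to a waveform

frames = rng.random((8, 16, 16, 3))                   # placeholder "silent video"
audio_latent = video_to_audio(
    frames, "cinematic, horror film, music, tension, footsteps on concrete")
print(audio_latent.shape)
```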


DeepMind also said that the "video-to-audio" model can generate an unlimited number of soundtracks for any video, and that the prompt content can steer the generated audio in a "positive" or "negative" direction, bringing the output closer to specific scenes.
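The announcement does not explain how this steering works internally. In publicly documented diffusion systems, positive and negative prompts are typically combined through classifier-free guidance, which pushes each denoising step toward the positive prompt's prediction and away from the negative one. A minimal, purely illustrative sketch of that general idea, using placeholder arrays rather than any real model output:

```python
import numpy as np

def guided_noise_prediction(noise_pos: np.ndarray,
                            noise_neg: np.ndarray,
                            guidance_scale: float = 7.5) -> np.ndarray:
    """Steer a diffusion step toward the positive prompt and away from the negative one."""
    return noise_neg + guidance_scale * (noise_pos - noise_neg)

# Toy stand-ins for the model's noise predictions under each prompt condition.
rng = np.random.default_rng(1)
noise_pos = rng.normal(size=64)   # prediction conditioned on the desired ("positive") prompt
noise_neg = rng.normal(size=64)   # prediction conditioned on the undesired ("negative") prompt
steered = guided_noise_prediction(noise_pos, noise_neg)
print(steered.shape)
```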

Looking ahead, DeepMind said researchers are continuing to optimize the "video-to-audio" model and plan to let it generate background audio directly from the video content, without text prompts, as well as to improve lip-sync for character dialogue in videos.
