Stability AI Launches Stable Audio Open for High-Quality Audio Generation

TapTechNews June 6th, Stability AI, the company behind the text-to-image model Stable Diffusion, is expanding further into audio with the launch of Stable Audio Open, a model that generates high-quality audio samples from user-entered text prompts.


Stable Audio Open can create audio clips up to 47 seconds long and is well suited to drum beats, instrument melodies, ambient sounds, and Foley-style sound effects. The open-source model is based on a Diffusion Transformer (DiT) operating in the latent space of an autoencoder, which improves the quality and diversity of the generated audio.

Stable Audio Open is currently open source, and TapTechNews provides the relevant link; interested users can try it out on Hugging Face. The model is said to have been trained on more than 486,000 samples from audio libraries such as FreeSound and the Free Music Archive.

Stability AI notes that while the model can generate short music clips, it is not suited to full songs, melodies, or vocals.

The difference between Stable Audio Open and Stable Audio 2.0 is that the former is an open-source model focused on short audio clips and sound effects, while the latter can generate complete tracks up to 3 minutes long.
