Convert Text to Audio Clips with a New AI-Based System
A new AI-based technology called AudioLDM, developed at the University of Surrey, allows users to input a text prompt that is then used to generate an analogous audio clip.
The system can analyze prompts and generate clips using less computational power than current AI systems, without sacrificing sound quality or users' ability to manipulate the clips.
The general public can test out AudioLDM by going to its Hugging Face space. The code is open-sourced and has earned over 1,000 stars on GitHub.
Sound designers might use such a system in a variety of contexts, including video production, digital creation, the metaverse, game design, virtual reality, and assistive technology for the blind.
The new AI-based system helps convert text to audio clips
“Generative AI can potentially transform every industry, including music and sound creation,” says Haohe Liu, study project lead at the University of Surrey. “With AudioLDM, we demonstrate that anyone can produce high-quality, original samples in a matter of seconds using very little computing power.”
He added, “Although there are some valid worries about the technology, AI will undoubtedly open doors for many people working in these creative industries and spark an explosion of new ideas.”
Surrey’s open-sourced model is developed in a semi-supervised manner using the Contrastive Language-Audio Pretraining (CLAP) method. The CLAP approach allows AudioLDM to be trained on vast volumes of audio data without text labels, greatly improving model capacity.
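The core idea behind contrastive language-audio pretraining can be sketched in a few lines. The snippet below is an illustrative toy example, not the authors' code: it uses random vectors as stand-ins for audio and text encoder outputs and computes a symmetric contrastive (InfoNCE-style) loss, in which matched audio-text embedding pairs are pulled together and mismatched pairs pushed apart. The function and variable names are hypothetical.

```python
# Illustrative sketch of a contrastive language-audio objective
# (the principle behind CLAP). Not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    # Project embeddings onto the unit sphere so dot products are cosines.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over a batch of paired embeddings:
    the i-th audio clip should match the i-th text caption."""
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_emb)
    logits = a @ t.T / temperature          # pairwise similarity matrix
    labels = np.arange(len(a))              # correct pair is the diagonal

    def cross_entropy(lg):
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Toy batch: 4 pairs of 8-dim embeddings (stand-ins for encoder outputs).
audio = rng.normal(size=(4, 8))
text = audio + 0.05 * rng.normal(size=(4, 8))   # nearly matching pairs
loss_matched = contrastive_loss(audio, text)
loss_random = contrastive_loss(audio, rng.normal(size=(4, 8)))
# Matched pairs should yield a much lower loss than random pairings.
print(loss_matched, loss_random)
```

Because the loss depends only on which audio and text embeddings land near each other, a model trained this way learns a shared audio-text space from paired data, which is what lets AudioLDM condition audio generation on a text prompt.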