A few days ago, Google shared with the world a new generative AI model the tech titan has been working on for some time, joining OpenAI in the generative AI race. The new generative AI software is called MusicLM and can create musical audio from text prompts, rendering it in different styles and applying an array of nuances to the output.
The questions that arise every time we are confronted with a new generative AI tool (which is roughly once every 15 minutes these days) are pretty much always the same: how is this even possible? Will musicians, too, be put out of business by this technology? What now?
Let’s get to the meat. Google's Music Language Model, MusicLM for short, is a cutting-edge artificial intelligence tool that generates music in a variety of styles, based on machine learning algorithms. This tool is part of Google's ongoing effort to develop AI technologies able to perform tasks that were previously a human prerogative, such as creating music.
MusicLM uses deep learning techniques to analyze a large dataset of existing musical compositions and extract patterns and relationships between musical elements such as harmony, melody, and rhythm. It then uses this information to generate new pieces of music that take the text prompt into account. This means it can generate music reproducing a particular style or era, or mimicking certain emotional qualities, such as sadness or happiness.
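To make the idea of prompt-conditioned generation concrete, here is a deliberately tiny toy sketch, with no relation to MusicLM's actual architecture: it scans a prompt for an emotional keyword and renders a short sine-wave melody in a major or minor scale accordingly. Every name and mapping in it is an assumption made purely for illustration.

```python
# Toy illustration, not MusicLM: condition a tiny "composer" on an emotional
# keyword found in the prompt and render the result as a WAV file.
import random
import re
import wave

import numpy as np

SAMPLE_RATE = 22050

# Hypothetical keyword-to-scale mapping (semitone offsets from the root).
SCALES = {
    "happy": [0, 2, 4, 5, 7, 9, 11],  # major scale
    "sad": [0, 2, 3, 5, 7, 8, 10],    # natural minor scale
}


def prompt_to_scale(prompt: str) -> list:
    """Pick a scale based on the first emotional keyword found in the prompt."""
    for keyword, scale in SCALES.items():
        if re.search(rf"\b{keyword}\b", prompt.lower()):
            return scale
    return SCALES["happy"]  # arbitrary default


def render_melody(prompt: str, n_notes: int = 8, note_len: float = 0.4) -> np.ndarray:
    """Generate a short sine-wave melody whose scale depends on the prompt."""
    scale = prompt_to_scale(prompt)
    root = 261.63  # middle C, in Hz
    notes = []
    for _ in range(n_notes):
        freq = root * 2 ** (random.choice(scale) / 12)
        t = np.linspace(0.0, note_len, int(SAMPLE_RATE * note_len), endpoint=False)
        notes.append(0.3 * np.sin(2 * np.pi * freq * t))
    return np.concatenate(notes)


def save_wav(path: str, audio: np.ndarray) -> None:
    """Write mono 16-bit PCM audio to disk."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes((audio * 32767).astype(np.int16).tobytes())


save_wav("sad_melody.wav", render_melody("a slow, sad melody"))
```

The real system learns these relationships from data rather than from a hand-written lookup table, but the shape of the task is the same: text in, audio out.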
According to Google, MusicLM was trained on a massive corpus of unlabeled music, complemented by MusicCaps, a new dataset of over 5,500 music-text pairs.
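If you want a feel for what those music-text pairs look like, the MusicCaps captions have been published separately. The sketch below assumes the dataset is available on the Hugging Face Hub under the identifier google/MusicCaps and that it exposes fields such as ytid and caption; treat both the identifier and the field names as assumptions to verify.

```python
# Sketch: browse the MusicCaps music-text pairs. The dataset identifier
# ("google/MusicCaps") and the field names below are assumptions to verify.
from datasets import load_dataset

musiccaps = load_dataset("google/MusicCaps", split="train")

print(len(musiccaps))  # roughly 5,500 clips, per Google's announcement

example = musiccaps[0]
print(example["ytid"])     # assumed field: YouTube ID of the source clip
print(example["caption"])  # assumed field: free-text description of the music
```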
But that’s not all, and by this point we can all agree that Shazam walked so that MusicLM could run: MusicLM is also capable of turning a hummed or whistled audio input into music, adapting it to match the style requested in the prompt (!).
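Google hasn't released the model, so we can't call it from code, but the first step such a feature implies, turning a hum into a usable melody representation, can be sketched with off-the-shelf tools. Below is a minimal, hedged example using librosa's pYIN pitch tracker to extract a melody contour from a hypothetical recording named humming.wav; this is not MusicLM's pipeline, just an illustration of the kind of signal the model would have to condition on.

```python
# Not MusicLM's pipeline: extract a melody contour from a hummed recording
# using librosa's pYIN pitch tracker. "humming.wav" is a hypothetical file.
import librosa

y, sr = librosa.load("humming.wav", sr=None, mono=True)

f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C6"),
    sr=sr,
)

# Keep only the voiced frames and convert the contour to note names.
melody_notes = librosa.hz_to_note(f0[voiced_flag])
print(melody_notes[:16])  # e.g. ['A3', 'A3', 'B3', ...], depending on the recording
```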
Different musical instruments, different genres, different styles and periods, and different experience levels: MusicLM can master any of the above and more. Google also released a paper explaining the technology and methodology at the core of MusicLM.
According to the demo page, which also features impressive 30-second audio snippets, MusicLM performs far better than previous AI music software, both in audio quality and in coherence with the text input.
On the MusicLM demo page, the tool is described as “a model generating high-fidelity music from text descriptions”: MusicLM turns rich captions describing the feel of the music, and even the vocals, into a full piece. What sets MusicLM apart from other AI music generation tools is its ability to generate music that is not only structurally similar to existing pieces but also musically coherent and expressive. This means that the music generated by MusicLM can be used in real-world musical contexts and has the potential to inspire new musical ideas and compositions.
MusicLM is definitely pushing the boundaries of what's possible in music generation and opening up exciting new possibilities for music makers and listeners alike. The one thing the AI tool cannot reproduce is the distinctly human ability to bring feelings and personal experiences into the creative process. Music is not just a collection of sounds and notes, and the compelling urge and imagination that go into making music are what set humans apart from AI algorithms.
While these algorithms can generate impressive pieces of music that are similar in style and structure to existing pieces, they are inherently unequipped to capture the emotional depth that comes from human creativity. In the end, this reminds us that musical expression is not just melody and style but carries so much more: personal experiences, historical context, mood, state of mind, and glimpses of the moment in life in which the music was created. MusicLM and other AI algorithms cannot bring this into their music, no matter how exceptional the “rich caption” is.