After Two and a Half Years of Practice, Does MusicGen Beat Google's MusicLM?
From "AI Stefanie Sun" to MusicGen, from singing to composing, AI has moved into the music field. Meta recently open-sourced its AI model MusicGen on GitHub, and the release has drawn wide attention.

As the name suggests, **MusicGen is a music generation AI model that can create music from text and melody prompts.**

The model is based on the Transformer architecture introduced by Google in 2017 and uses Meta's EnCodec audio tokenizer to break audio data into small units for processing.

Late last week, Felix Kreuk, a research engineer at Meta AI, demonstrated MusicGen's capabilities on Twitter.

In the video demonstration, MusicGen adapted two classical music excerpts into 1980s-style pop and modern hip-hop respectively, adding instruments, electronic sounds, and other elements according to the prompts.

According to Meta, MusicGen was trained on 20,000 hours of music, equivalent to about 833 days (20,000 ÷ 24) of practice without eating, drinking, or sleeping. The training data included 10,000 "high-quality" licensed music tracks and 390,000 instrument-only tracks from the media libraries ShutterStock and Pond5.

So, after two and a half years of "practice," is MusicGen ready for its debut?

Meta itself compared MusicGen with existing music generation tools: MusicLM (from Google), Riffusion, and Mousai. Wallstreetcn selected three of the examples:

> Prompt 1. Create a pop dance track with a catchy melody and tropical percussion. The rhythm is upbeat and suited to a beach scene.
>
> [Audio samples: MusicGen, MusicLM, Riffusion, Mousai]

> Prompt 2. Compose a magnificent orchestral piece with thunderous percussion, poetic brass, and high strings, creating film background music suited to a heroic battle.
>
> [Audio samples: MusicGen, MusicLM, Riffusion, Mousai]

> Prompt 3. Create a classic reggae track with an electric guitar solo.
>
> [Audio samples: MusicGen, MusicLM, Riffusion, Mousai]

It is not hard to hear that MusicGen's output seems the more impressive. According to Meta, **MusicGen performed better than the other three both in how well the music matched the text and in how plausible the composition sounded.**

To check whether MusicGen is really that good, Kyle Wiggers, a reporter at the technology outlet TechCrunch, tried MusicGen and MusicLM himself and compared the work of the two AI musicians. He wrote:

> I have to say that (MusicGen) won't cost human musicians their jobs, but it makes pretty good music, at least for basic prompts like "ambient music," and to my ears, **it's comparable (if not slightly better) to Google's AI music generator MusicLM.**

Wiggers first gave a simple prompt: jazzy elevator music. MusicGen and MusicLM produced the following pieces:

[Audio samples: MusicGen, MusicLM]

Wiggers then raised the difficulty, asking the AIs for a lo-fi, slow-tempo electro chill track (a style that combines electronic music with a relaxed mood) that uses natural, organic sounds. The two models produced the following:

[Audio samples: MusicGen, MusicLM]

For the second prompt, Wiggers found that MusicGen surprisingly outperformed MusicLM in musical coherence, and that its output would easily find a home on Lofi Girl, a YouTube channel that streams music around the clock.

Finally, Wiggers tried to get MusicGen and MusicLM to compose piano ditties in the style of the well-known composer George Gershwin. He found that Google had built a filter into the public version of MusicLM that blocks prompts naming specific artists, in order to protect artists' copyright. MusicGen, by contrast, had no such filter and went on to produce a piano piece in the so-called George Gershwin style. In Wiggers's opinion, though, the piece was not very good.

[Audio sample: MusicGen]

It is worth noting that there are many text, speech, image, and even video generation models on the market, but very few high-quality music generation models.
According to a research paper posted on the preprint repository arXiv, one of the main challenges of music generation is that music uses the full frequency spectrum, which requires sampling the signal at a higher rate, not to mention reproducing the complex structure of music and the interplay of the instruments.

Whether MusicGen can become an excellent music generation model remains to be verified by more users. Users can try MusicGen through the Hugging Face API, but generating music can take some time, depending on how many users are online at once.

Currently, Meta does not provide code for training the models, but it does provide pre-trained models.
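To make the sampling-rate challenge mentioned above more concrete, here is a minimal Python sketch that compares how many raw samples a 30-second clip needs at different rates. The specific rates (16 kHz for speech, 44.1 kHz and 48 kHz for music) are common standard values assumed for illustration; they are not figures from this article, and the function names are hypothetical.

```python
# Illustrative sketch only. The sample rates below are common standard values
# assumed for this example; they are not figures quoted in the article.
SAMPLE_RATES_HZ = {
    "speech": 16_000,        # a typical rate for speech models
    "music_cd": 44_100,      # CD-quality music
    "music_studio": 48_000,  # common studio/video rate
}


def samples_needed(sample_rate_hz: int, seconds: float) -> int:
    """Number of raw samples in a mono clip of the given length."""
    return round(sample_rate_hz * seconds)


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    clip_seconds = 30
    speech_samples = samples_needed(SAMPLE_RATES_HZ["speech"], clip_seconds)
    for label, rate in SAMPLE_RATES_HZ.items():
        n = samples_needed(rate, clip_seconds)
        print(
            f"{label}: {n} samples, frequencies up to "
            f"{nyquist_limit_hz(rate):.0f} Hz, "
            f"{n / speech_samples:.2f}x the speech clip"
        )
```

Under these assumed rates, the music clip carries roughly three times as many samples as the speech clip of the same length, which is one way to read the paper's point that covering the full spectrum makes music harder to model.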