
markhachman OP t1_j4a1fyq wrote

I think what I'm talking about would be an algorithm that understands the sounds of different instruments, their tonality, rhythm, and so on, in much the same way ChatGPT understands the relationships between words or, presumably, Vall-E understands phonemes -- and then understands how to put them together in the style of various artists.
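As a rough illustration of that analogy, here is a minimal sketch (PyTorch assumed) of treating musical events as tokens and modeling them autoregressively, the way a language model treats words. The vocabulary, tokenization, and model sizes are illustrative placeholders, not taken from any real music-generation system.

```python
import torch
import torch.nn as nn

# Hypothetical token vocabulary: each token stands for an (instrument, pitch,
# duration) event, flattened into a single integer id.
VOCAB_SIZE = 512   # e.g. a few instruments x pitches x note lengths
CONTEXT_LEN = 256  # how many past events the model can "see"

class TinyMusicLM(nn.Module):
    """GPT-style decoder: predicts the next musical event from the previous ones."""
    def __init__(self, vocab_size=VOCAB_SIZE, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(CONTEXT_LEN, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=256,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                     # tokens: (batch, seq)
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask so each event only attends to earlier events,
        # exactly as in a text language model.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(tokens.device)
        x = self.blocks(x, mask=mask)
        return self.head(x)                        # logits over the next event

# Toy usage: a random "score" of 32 events; real training data would come from
# tokenized MIDI or audio-derived events.
model = TinyMusicLM()
events = torch.randint(0, VOCAB_SIZE, (1, 32))
logits = model(events)
print(logits.shape)  # (1, 32, VOCAB_SIZE): a distribution over the next event at each step
```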

I'll have to check out Riffusion, though, as I'm unfamiliar with it, thanks.


Kafke t1_j4a1yik wrote

Yes. Look at Stable Diffusion and Riffusion for an example of this. Music isn't fundamentally different from images and text in terms of how modern AI works.


Ronny_Jotten t1_j4b5fqx wrote

Images and text are already quite different from each other, though, in terms of how their AI generators work. The image generators include a language model, but work on a diffusion principle that the text generators don't use. Riffusion's approach of running a diffusion image generator on spectrograms is interesting to some extent, but I sincerely doubt it will be the future direction of high-quality music generators.
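To make the spectrogram idea concrete, here is a minimal sketch of the image-to-audio step that Riffusion-style systems rely on, using torchaudio. A synthetic sine tone stands in for the spectrogram that, in Riffusion, would instead be painted by a diffusion model fine-tuned on spectrogram images; the FFT and sample-rate settings are illustrative, not Riffusion's actual parameters.

```python
import math
import torch
import torchaudio

SAMPLE_RATE = 22050
N_FFT = 512
HOP = 128

# Stand-in "audio": two seconds of a 440 Hz tone.
t = torch.linspace(0, 2.0, int(2.0 * SAMPLE_RATE))
wave = 0.5 * torch.sin(2 * math.pi * 440.0 * t).unsqueeze(0)  # (channels, samples)

# Forward: audio -> magnitude spectrogram (the "image" a diffusion model would generate).
to_spec = torchaudio.transforms.Spectrogram(n_fft=N_FFT, hop_length=HOP, power=2.0)
spec = to_spec(wave)                                # (channels, freq_bins, frames)

# Inverse: spectrogram -> audio. The image carries no phase information, so
# Griffin-Lim iteratively estimates a phase consistent with the magnitudes.
griffin_lim = torchaudio.transforms.GriffinLim(n_fft=N_FFT, hop_length=HOP,
                                               power=2.0, n_iter=64)
recon = griffin_lim(spec)                           # (channels, samples)

torchaudio.save("reconstructed.wav", recon, SAMPLE_RATE)
```

The lossy phase reconstruction and the fixed resolution of the spectrogram "image" are part of why the audio quality of this approach is limited, which is roughly the doubt being raised here.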
