Well, this is gonna get weird.

  • solaranus
    ·
    1 year ago

    deleted by creator

  • UlyssesT [he/him]
    ·
    2 years ago

    Any moment now: "musicians are just lazy and entitled labor aristocrats that don't do real work. I like these cheaper treats. I got mine. If you're unhappy about how this screws musicians in an exploitative capitalist system, you're a Luddite and emotional." :very-intelligent:

    • solaranus
      ·
      1 year ago

      deleted by creator

  • Abraxiel
    ·
    2 years ago

    Once upon a time, music was a fully social activity about mass participation.

  • Simferopol [none/use name]
    ·
    2 years ago

    :cat-vibing:

    the hip hop one is like what English sounds like to non-English speakers

  • iridaniotter [she/her, it/its]
    ·
    2 years ago

    By relying on pretrained and frozen MuLan, we need audio-only data for training the other components of MusicLM. We train SoundStream and w2v-BERT on the Free Music Archive (FMA) dataset (Defferrard et al., 2017), whereas the tokenizers and the autoregressive models for the semantic and acoustic modeling stages are trained on a dataset containing five million audio clips, amounting to 280k hours of music at 24 kHz. Each of the stages is trained with multiple passes over the training data. We use 30 and 10-second random crops of the target audio for the semantic stage and the acoustic stage, respectively. The AudioLM fine acoustic modeling stage is trained on 3-second crops.

    Not sure what the five million audio clips part is referring to. But they probably actually did this legally?
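For a rough sense of scale, here is a back-of-the-envelope sketch using only the numbers quoted above (five million clips, 280k hours, 24 kHz, and 30/10/3-second crops). The function names are made up for illustration, and picking the crop start uniformly at random is an assumption: the quoted passage only says "random crops" and doesn't describe how they are sampled.

```python
import random

# Figures taken from the quoted passage.
SAMPLE_RATE_HZ = 24_000
NUM_CLIPS = 5_000_000
TOTAL_HOURS = 280_000


def mean_clip_seconds(total_hours, num_clips):
    """Average clip length in seconds implied by the totals."""
    return total_hours * 3600 / num_clips


def crop_samples(crop_seconds, sample_rate=SAMPLE_RATE_HZ):
    """Number of audio samples in a crop of the given duration."""
    return crop_seconds * sample_rate


def random_crop_bounds(num_samples, crop_seconds, rng=random):
    """Pick a crop window as (start, end) sample indices.

    Assumption: the start is uniform over all valid positions.
    """
    length = crop_samples(crop_seconds)
    if num_samples < length:
        raise ValueError("clip is shorter than the requested crop")
    start = rng.randrange(num_samples - length + 1)
    return start, start + length


if __name__ == "__main__":
    avg = mean_clip_seconds(TOTAL_HOURS, NUM_CLIPS)
    print(f"average clip length: {avg:.1f} s")
    for seconds in (30, 10, 3):
        print(f"{seconds:>2}-second crop = {crop_samples(seconds)} samples")
```

So the clips would average a bit over three minutes each, which is about the length of a typical song, though the passage itself doesn't say where they came from.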

    • Antoine_St_Hexubeary [none/use name]
      ·
      2 years ago

      If this thing is still in the "academic research with no way of generating revenue and no way for the public to get involved" stage, wouldn't it qualify as fair use?

  • iridaniotter [she/her, it/its]
    ·
    2 years ago

    OH NO THEY AUTOMATED HATSUNE MIKU :commiku:

    The vocals are the most obvious weak point. Synthesis of human speech and singing is still far off, and this program doesn't write lyrics and instead just smashes phonemes together in a manner far less eloquent than Simlish...