https://x.com/babadookspinoza/status/1802442068125516148

  • Tabitha ☢️[she/her]
    ·
    5 months ago

    I wonder if you could engineer an unrelated image to have the same md5 or perceptual hash and get it to auto-debunk.
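To make the difference between the two kinds of hash in this question concrete, here is a minimal sketch. It assumes a toy "average hash" over a made-up 4x4 grayscale grid; `average_hash`, `md5_hex`, and the pixel values are hypothetical illustrations, not how Twitter or any real matcher works.

```python
import hashlib

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def md5_hex(pixels):
    """Cryptographic hash of the raw pixel bytes."""
    flat = bytes(p for row in pixels for p in row)
    return hashlib.md5(flat).hexdigest()

# Hypothetical 4x4 grayscale image (0-255): dark left half, bright right half.
image = [[10, 20, 200, 210] for _ in range(4)]
# The same image with every pixel brightened by 1 (visually indistinguishable).
tweaked = [[p + 1 for p in row] for row in image]

same_perceptual = average_hash(image) == average_hash(tweaked)
same_md5 = md5_hex(image) == md5_hex(tweaked)
print("perceptual hash unchanged:", same_perceptual)
print("md5 unchanged:", same_md5)
```

The perceptual hash is designed to survive small edits, which is also what gives an attacker room to nudge an unrelated image toward a target hash. An md5 of the file bytes changes with any edit at all.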

    • TrudeauCastroson [he/him]
      ·
      5 months ago

      If you have access to a quantum computer you could do this easily. With current computing it's hard.

      • git [he/him, comrade/them]
        ·
        5 months ago

        This was a form of attack against Apple's on-device CSAM detection (which they scrapped), so it's been possible for a while.

        • Neural hash collider: https://github.com/anishathalye/neural-hash-collider
        • Example collision: https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issues/1
        • Script to generate collisions: https://gist.github.com/unrealwill/c480371c3a4bf3abb29856c29197c0be
        • Tainting the client side CSAM database: https://blog.xot.nl/2023/10/11/tainting-the-csam-client-side-scanning-database/index.html
        • TrudeauCastroson [he/him]
          ·
          edit-2
          5 months ago

          Edit: wow, I didn't realize md5 matching a picture was that easy. It looks like you can make any image look enough like that twitter-deboonked one to generate a fake match. How has no one done this yet?

          Thanks for the links. It's pretty interesting stuff that I haven't kept up with for a while.

          I hadn't heard about that potential Apple attack. I wonder if you could generate a collision between a pic that looks close enough to the Twitter image they auto-deboonk and a pic that's completely unrelated, get Twitter to add your new similar image to the auto-deboonker, and then troll on Twitter by posting the unrelated image.

          That'd be similar to the Apple attack you linked, but it depends on how Twitter's auto-deboonking works and how easily you could get them to add a similar-but-different pic to their deboonker database.

        • TrudeauCastroson [he/him]
          ·
          5 months ago

          I thought md5 was vulnerable to generating 2 colliding files, not to generating a match for an existing file.
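The distinction drawn here (a collision between two files you choose versus a match for a file someone else chose) can be sketched with brute force on a deliberately tiny hash. This is a toy under stated assumptions: `tiny_hash` keeps only the first 16 bits of MD5, and `find_collision`, `find_second_preimage`, and the message formats are hypothetical names made up for illustration. Real MD5 collision attacks exploit structural weaknesses rather than brute force, and as far as is publicly known there is no practical second-preimage attack on full MD5.

```python
import hashlib
from itertools import count

BITS = 16  # toy size; real MD5 is 128 bits

def tiny_hash(data: bytes) -> int:
    """First BITS bits of MD5, standing in for a deliberately weak hash."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - BITS)

def find_collision():
    """Attacker picks BOTH inputs: stop at the first repeated hash value."""
    seen = {}
    for i in count():
        msg = b"free-%d" % i
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

def find_second_preimage(target: bytes):
    """Attacker must match ONE fixed input chosen by someone else."""
    wanted = tiny_hash(target)
    for i in count():
        msg = b"forged-%d" % i
        if msg != target and tiny_hash(msg) == wanted:
            return msg, i + 1

first, second, collision_tries = find_collision()
target = b"existing file contents"
forged, preimage_tries = find_second_preimage(target)

print("collision found after", collision_tries, "tries")
print("second preimage found after", preimage_tries, "tries")
```

For an ideal n-bit hash, the collision search is expected to take on the order of 2^(n/2) tries and the second-preimage search on the order of 2^n, so the second count is typically far larger. That gap is why "two colliding files" became practical for MD5 long before "match this existing file".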

          • bloubz@lemmygrad.ml
            ·
            5 months ago

            It's definitely the easiest. But that's why we stopped using it: it's been proven that collisions can be generated, so it may be possible to generate a match for a real-life file. I'm not sure where research is at on this (if there is any).

            Also, I wasn't actually trying to make a point, just pointing at md5 as a joke.

            • TrudeauCastroson [he/him]
              ·
              5 months ago

              If you're using any hash smaller than your file (not just md5), then it's always possible to have 2 different files that match. This is just the pigeonhole principle. No matter what you use, there will be collisions.

              md5 is just bad because it's small, so it's easier to generate this match. It's also a question of how easy it is to reverse-engineer a match, which apparently is easier for md5 on pictures than I expected.
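The pigeonhole argument above can be checked directly on a shrunken hash. A minimal sketch, assuming a hypothetical `one_byte_hash` that keeps only the first byte of MD5: with 256 possible outputs, any 257 distinct inputs must contain at least one colliding pair.

```python
import hashlib

def one_byte_hash(data: bytes) -> int:
    """Toy 8-bit hash: the first byte of MD5, so only 256 possible outputs."""
    return hashlib.md5(data).digest()[0]

buckets = {}       # hash value -> first message seen with that value
collision = None
for i in range(257):  # one more input than there are possible outputs
    msg = str(i).encode()
    h = one_byte_hash(msg)
    if h in buckets:
        collision = (buckets[h], msg)
        break
    buckets[h] = msg

print("colliding pair:", collision)
```

The same counting argument applies to any fixed-size hash. A larger output only makes collisions harder to find, not impossible.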