• Hazel@lemmy.blahaj.zone
    English
    arrow-up
    65
    ·
    5 months ago

    These should be illegal. It’s just a way of outsourcing AI training to the general public for free.

    • A_norny_mousse@feddit.org
      English
      arrow-up
      3
      ·
      edit-2
      5 months ago

      They really should. If you applied all the logic of, say, food labeling laws in the EU to the internet, we’d have very different laws around it today.

      But somebody shit into clueless politicians’ brains and told them it’s different because it’s the internet.

      Hmm, actually it is different - as in more difficult legally - because it’s global, but that’s no excuse to do nothing about it. The software would’ve been up to it even in the early days.

  • konalt@lemmy.world
    English
    arrow-up
    30
    ·
    5 months ago

    I did one of these a few weeks ago. You had to do 20 in a row and I got about halfway through before realizing the symbols at the back actually meant something. I literally didn’t notice the symbol on the left side.

  • Raltoid@lemmy.world
    English
    arrow-up
    22
    arrow-down
    1
    ·
    5 months ago

    That’s literally not a captcha, that’s “AI” training. You could probably input anything.

    • ChaoticNeutralCzech@feddit.org
      English
      arrow-up
      6
      ·
      edit-2
      4 months ago

      They can produce an unlimited number of CGI challenges and know what is correct. Collecting the AI training data from users only makes sense for classifying images from the real world. Even then, Google’s reCaptcha checks if you’re consistent with other users so you’re unlikely to pass with a random answer.
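To illustrate the "consistent with other users" idea, here is a minimal sketch of a majority-agreement check. This is not reCaptcha's actual algorithm, which isn't public; the function name `passes_consensus` and the `min_share` threshold are made up for the example.

```python
from collections import Counter

def passes_consensus(answer, previous_answers, min_share=0.5):
    """Accept an answer only if enough earlier users gave the same one.

    Hypothetical rule: `min_share` is an assumed threshold, not a known value.
    """
    if not previous_answers:
        return False  # nothing to compare against yet
    share = Counter(previous_answers)[answer] / len(previous_answers)
    return share >= min_share

earlier = ["car", "car", "bus"]
print(passes_consensus("car", earlier))   # agrees with 2 of 3 earlier users
print(passes_consensus("bike", earlier))  # agrees with nobody
```

Under a rule like this, a random answer only passes if it happens to match what most other people picked.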

      • Raltoid@lemmy.world
        English
        arrow-up
        1
        ·
        edit-2
        4 months ago

        They can produce an unlimited number of CGI challenges and know what is correct so collecting AI training data only makes sense for classifying images from the real world.

        In some cases they’re testing/training for the most common solutions humans use for a problem with multiple paths and choices.

        It’s part of trying to make them seem more human-like, as if they have general intelligence, rather than just giving the optimal, computer-calculated solution or the solution one or a few programmers think is the common one. It needs data.

        And if it’s one of those that actually checks but has multiple paths, do the convoluted one (or just refresh).

    • CanadaPlus@lemmy.sdf.org
      English
      arrow-up
      7
      ·
      5 months ago

      It was always a kind of unfair test, when you consider that words are rendered down to tokens before the thing ever sees them.

        • Vigge93@lemmy.world
          English
          arrow-up
          7
          arrow-down
          1
          ·
          5 months ago

          Each word gets converted to a number before it is processed, so asking “how many r are there in strawberry” could be converted to “how many 7 are there in 13”, for example.

          (Very simplified)
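A toy sketch of that simplification, assuming a made-up word-level vocabulary (the IDs are arbitrary and chosen only to match the 7 and 13 in the example above):

```python
# Toy word-level vocabulary; the IDs are arbitrary and purely illustrative.
vocab = {"how": 1, "many": 2, "r": 7, "are": 4, "there": 5, "in": 6, "strawberry": 13}

def encode(text):
    """Map each whitespace-separated word to its integer ID."""
    return [vocab[word] for word in text.lower().split()]

ids = encode("how many r are there in strawberry")
print(ids)  # [1, 2, 7, 4, 5, 6, 13]
```

Once the question is just a list of IDs, nothing about the number 13 says which letters "strawberry" is spelled with.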

          • A_norny_mousse@feddit.org
            English
            arrow-up
            1
            ·
            5 months ago

            But then the AI just looks up the definition of 13, and the definition of 7, and should be able to answer anyhow. I mean, this is how computers work. Are you sure that’s what the other commenter was referring to?

            • CanadaPlus@lemmy.sdf.org
              English
              arrow-up
              3
              ·
              edit-2
              5 months ago

              It’s not how AIs specifically work. They’re pretty brain-like, and learn through their experiences during the training process. (Which is also why they’re so hard to consistently control)

              It’s possible they still might be able to learn this spelling fact from some bit of their training data, somehow, but they’re at an immense disadvantage.

            • Vigge93@lemmy.world
              English
              arrow-up
              3
              arrow-down
              1
              ·
              5 months ago

              That’s when you get into more of the nuance with tokenization. It’s not a simple lookup table, and the AI does not have access to the original definitions of the tokens. Also, tokens do not map 1:1 onto words, and a word might be broken into several tokens. For example “There’s” might be broken into “There” + “'s”, and “strawberry” might be broken into “straw” + “berry”.

              The reason we often simplify it as token = word is that this holds for most of the common words.
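A rough sketch of the subword splitting described above. Real tokenizers usually learn their pieces from data (e.g. byte-pair encoding); this uses a plain greedy longest-match over a hand-written toy vocabulary instead, and the names `TOY_VOCAB`, `tokenize` and `encode` plus all the IDs are invented for illustration.

```python
# Hand-written toy vocabulary; pieces and IDs are assumptions, not a real model's.
TOY_VOCAB = {"straw": 101, "berry": 102, "there": 103, "'s": 104}

def tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of a word into known pieces."""
    word = word.lower()
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: keep the single character as-is.
            pieces.append(word[i])
            i += 1
    return pieces

def encode(word, vocab=TOY_VOCAB):
    """Turn a word into token IDs; unknown pieces get the placeholder ID 0."""
    return [vocab.get(piece, 0) for piece in tokenize(word, vocab)]

print(tokenize("strawberry"))  # ['straw', 'berry']
print(tokenize("There's"))     # ['there', "'s"]
print(encode("strawberry"))    # [101, 102]
```

The model only ever receives something like `[101, 102]`, so counting the letter r means it must have learned the spelling of each piece some other way.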

  • Match!!@pawb.social
    English
    arrow-up
    17
    ·
    5 months ago

    use captchas to train AI

    have to make increasingly sophisticated captchas

    surprised pikachu species

  • webghost0101@sopuli.xyz
    English
    arrow-up
    13
    ·
    5 months ago

    What even is this? Whatever is beyond that cannot be worth it.

    It’s like the riddles of ancient mythology, but the reward is yet another website.

  • slippyferret@lemmy.blahaj.zone
    English
    arrow-up
    10
    ·
    edit-2
    5 months ago

    I’m so used to seeing difficult-to-read text in captchas that my brain didn’t even register the giant “79” at first and started with the squished characters. Guys, I think my model is overfitted…

  • Mwa@thelemmy.club
    English
    arrow-up
    6
    ·
    5 months ago

    I wonder what captcha software they use that produces these hard captchas.