• krigo666@lemmy.world
    English
    arrow-up
    123
    arrow-down
    4
    ·
    1 day ago

    Laws should be passed in all countries requiring AI crawlers to request permission before crawling any target site. I have no pity for AI “thieves” whose models get poisoned. F…ing plague, as if the adware and spyware weren’t enough…

    • chrash0@lemmy.world
      English
      arrow-up
      19
      ·
      1 day ago

      i doubt the recent uptick in traffic is from “stealing data” for training; more likely it comes from agents scraping sites for context, e.g. Edge Copilot, Google’s AI search, SearchGPT, etc.

      poisoning the data will likely not help in this situation, since there’s a human on the other side who will just run the same search again if the results are unsatisfactory. much like retries and timeouts can cause huge outages for web-scale companies, poisoning search results will likely increase this type of traffic, raising both bandwidth usage and the chances of DoS.
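
A back-of-the-envelope sketch of the retry effect described above, under loudly stated assumptions: each attempt is judged unsatisfactory independently with the same probability, and the user (or agent) gives up after a fixed number of retries. The function name `expected_requests` and every number below are hypothetical and not measured from any real site.

```python
def expected_requests(p_unsatisfactory, max_retries):
    """Expected requests per original query.

    Assumption: attempt k+1 only happens if attempts 1..k were all
    unsatisfactory, each independently with probability p_unsatisfactory.
    """
    # The first request always happens (k = 0); retry k happens with
    # probability p_unsatisfactory ** k.
    return sum(p_unsatisfactory ** k for k in range(max_retries + 1))


# Hypothetical scenario: clean results vs. heavily poisoned results.
baseline = expected_requests(0.0, 3)   # nobody needs to retry
poisoned = expected_requests(0.5, 3)   # half of all attempts look wrong
print(f"traffic multiplier: {poisoned / baseline:.3f}x")
```

Under these assumptions, the more often results look wrong, the closer the load gets to `max_retries + 1` requests per query, which is the direction of the effect the comment argues for.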

    • catloaf@lemm.ee
      English
      arrow-up
      23
      arrow-down
      2
      ·
      1 day ago

      An HTTP request is a request. Servers are free to rate limit or deny access.

      • taladar@sh.itjust.works
        English
        arrow-up
        14
        ·
        1 day ago

        Rate limiting in itself requires resources that are not always available. For one thing, you can only rate limit clients you can identify, so you need to keep data about past requests in memory and attach counters to them. Even then, that won’t help if the requests come from IPs that are easily changed.
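
To make the cost described above concrete, here is a minimal sketch of a fixed-window rate limiter keyed by client IP. It is an illustration only, not how any particular server does it: the class name `FixedWindowLimiter`, its parameters, and the documentation-range IP addresses are all hypothetical, and real deployments usually also evict stale entries.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client key in each time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # One (window, count) entry per client key seen so far.
        # This dictionary is the memory cost: it grows with distinct clients.
        self.counters = {}

    def allow(self, client_key):
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self.counters.get(client_key, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so reset the counter
        allowed = count < self.limit
        if allowed:
            count += 1
        self.counters[client_key] = (window, count)
        return allowed


# Frozen clock so the example is deterministic (hypothetical setup).
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: 0.0)

# One client hammering from a single IP is throttled after two requests.
same_ip = [limiter.allow("203.0.113.5") for _ in range(4)]

# A client rotating through IPs is never throttled, and each new IP
# adds another entry to the table.
rotating = [limiter.allow(f"198.51.100.{i}") for i in range(4)]

print(same_ip, rotating, len(limiter.counters))
```

The single-IP client is cut off at the limit, while the rotating client passes every check and inflates the counter table, which is the weakness the comment points out.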

      • FaceDeer@fedia.io
        arrow-up
        19
        ·
        1 day ago

        And Wikimedia, in particular, is all about publishing data under open licenses. They want the data to be downloaded and used by others. That’s what it’s for.

        • LostXOR@fedia.io
          arrow-up
          6
          ·
          23 hours ago

          Even so, I think it would be totally reasonable for them to block web scrapers, as they provide better ways to download all their data.

          • FaceDeer@fedia.io
            arrow-up
            9
            ·
            23 hours ago

            At the root of this comment chain is a proposal to have laws passed about this.

            People can set up their web servers however they like. That’s on them; they’re their web servers. I don’t think there should be legislation about whether you’re allowed to issue perfectly ordinary HTTP requests to a public server. Let the server decide how to respond to them.