• ricecake@sh.itjust.works
    7 months ago

    In the eyes of the law, intent matters, as does how the discovery is responded to.
    For CSAM, you have to knowingly possess it or have sought to possess it.

    The AI companies use a project that indexes everything on the Internet, like Google, but with publicly available free output.

    https://commoncrawl.org/

    They use this data via another project, https://laion.ai/, which uses the data to find images with descriptions attached, does some tricks to validate that the descriptions make sense, and then publishes a list of “location of the image, description of the image” pairs.

    The AI companies use that list to grab the images and train an AI on them in conjunction with the descriptions.

    So, people at Stanford were doing research on the LAION dataset when they found the instances of CSAM. The LAION project took its datasets offline while things were checked and new safeguards were put in place.
    The AI companies also pulled their models (if public) while the images were removed from the dataset and new safeguards were implemented.
    Most of the CSAM images in the dataset were already gone by the time the AI companies would have attempted to access them, but some were not.

    A very obvious lack of intent to acquire the material (in fact, a lack of awareness that the material was possessed at all), transparency in response, taking steps to prevent further distribution, and taking action to prevent it from happening again both provide a defense against accusations and make anyone interested less likely to want to make those accusations.

    On the other hand, the people who generated the images were knowingly doing so, which is a no-no.

    • DarkCloud@lemmy.world
      7 months ago

      They wouldn’t be able to generate it had there been none in the training data, so I assume the labelling and verification systems you talk about aren’t very good.

      • ricecake@sh.itjust.works
        7 months ago

        That’s not accurate. The systems are designed to generate previously unseen concepts or images by combining known concepts.

        It’s why it can give you an image of a pony using a hang glider, despite never having seen that. It knows what ponies look like, and it knows what hang gliding looks like, so it can find a way to put both into the image. Where it doesn’t know, it will make stuff up from what it does know, often requiring a very detailed explanation from the user of how a horse would fit in a hang glider, or that it shouldn’t have a little person sticking out of its back.

        • DarkCloud@lemmy.world
          7 months ago

          I think it would just create naked adults with children’s faces unless it actually had CSAM… Which it probably does have.

          • ricecake@sh.itjust.works
            7 months ago

            Again, that’s not how it works.

            Could you hypothetically describe CSAM without describing an adult with a child’s head, or specifying that it’s a naked child?
            That’s what a person trying to generate CSAM would need to do, because the model doesn’t have those concepts.
            If you just asked it directly, like I did with the pony using a hang glider before, you would get what you describe because it’s using the only “naked” it knows.
            You would need to specifically ask it to de-emphasize adult characteristics and emphasize child characteristics.

            That doesn’t mean that it was trained on that content.

            For context from the article:

            The DOJ alleged that evidence from his laptop showed that Anderegg “used extremely specific and explicit prompts to create these images,” including “specific ‘negative’ prompts—that is, prompts that direct the GenAI model on what not to include in generated content—to avoid creating images that depict adults.”

              • ricecake@sh.itjust.works
                7 months ago

                ??? Knowing how stuff works is creepy now? Knowing what the law actually is is creepy?

                I think you’re just militantly attached to your own ignorant conception of how the technology works.

                  • ricecake@sh.itjust.works
                    7 months ago

                    You made an incorrect statement about how the technology works and I corrected you. You doubled down and I gave a more detailed explanation.
                    You called me a “creep” for this, and just now called me a “little unpaid footman”.

                    If anything’s bullshit, it’s you making this aggressive when it doesn’t need to be.

                    I never said their system was perfect, or that they made no mistakes. I said the system does not need CSAM to generate CSAM. I explained why their actions weren’t illegal.

                    You need to work on your reading comprehension if you can’t see how those are different from being a bootlicker.