It’s getting a bit ridiculous out here. I’m using DuckDuckGo, but since it aggregates its results from other sources, it has also gotten bad recently. Is there a search engine out there that blocks domains that spam AI-generated content? Extra points if there’s something like uBlock Origin that filters results based on a community-made list.

Edit: I’m aware of Kagi, but it’s pretty expensive and I’m not a fan of the fact that they, too, host their own AI tools.
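For anyone wondering what the "community-made list" part of the question would look like in practice, here is a minimal sketch. It assumes a hosts-style blocklist (one domain per line, `#` for comments) and a plain list of result URLs; the function names and the `.example` domains are made up for illustration and do not correspond to any real search engine or extension API.

```python
from urllib.parse import urlsplit


def parse_blocklist(text):
    """Parse a hosts-style list: one domain per line, '#' starts a comment."""
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip().lower()
        if line:
            domains.add(line)
    return domains


def is_blocked(url, blocked):
    """Return True if the URL's host, or any parent domain of it, is listed."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Check "www.site.example", then "site.example", then "example".
    return any(".".join(parts[i:]) in blocked for i in range(len(parts)))


def filter_results(urls, blocked):
    """Keep only the result URLs whose domain is not on the blocklist."""
    return [u for u in urls if not is_blocked(u, blocked)]


# Hypothetical community list and search results, for illustration only.
community_list = """
# domains reported for AI-generated spam (made-up entries)
ai-spam.example
slopfarm.example   # inline comments are allowed
"""

results = [
    "https://www.ai-spam.example/post/1",
    "https://en.wikipedia.org/wiki/Search_engine",
    "https://notai-spam.example/",
]

blocked = parse_blocklist(community_list)
clean = filter_results(results, blocked)
print(clean)
```

Matching on parent domains means a listed domain also covers its subdomains, while a lookalike such as `notai-spam.example` is left alone. The hard part in practice is maintaining the list, not applying it.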

  • FourPacketsOfPeanuts@lemmy.world
    17 days ago

    It can survive well where there’s editorial control. I’d talk to an AI if it had only read encyclopedias for example…

    • rumba@lemmy.zip
      17 days ago

      I tried doing some of this. I trained on a corpus of data I wanted it to read, but with such a small amount of training data I found it was too lossy overall. If I asked it about something and it responded, there was a really good chance the answer was actually in the corpus. But it often didn’t know things that were definitely in there. It wasn’t completely useless, but I wouldn’t say it was at the level of being truly helpful.

      I worry that there’s not enough verified data out there to set up for proper training.

      • FourPacketsOfPeanuts@lemmy.world
        17 days ago

        I suspect such a model would have to be designed far more around its data being small but trustworthy. Something like ChatGPT, for example, requires a huge volume of data because it’s only weakly affected by any particular datum going in. It’s designed to adapt to general conversation norms rather than specific facts. If you could take a generalist like ChatGPT and combine it with an expert model whose training treats everything it’s told as heavily weighted, that would probably be a big step forward.
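One very rough way to picture the "generalist plus heavily weighted expert" idea above is a weighted mixture of the two models' answer distributions. The sketch below is a toy: the models are stand-in dictionaries of answer probabilities, the `combine` function and the 0.9 weight are assumptions chosen for illustration, and none of this describes how ChatGPT or any real system works.

```python
def combine(generalist, expert, expert_weight=0.9):
    """Mix two answer distributions, trusting the expert more.

    Each argument maps a candidate answer to a probability. If the expert
    has no opinion at all, fall back to the generalist unchanged.
    """
    if not 0.0 <= expert_weight <= 1.0:
        raise ValueError("expert_weight must be between 0 and 1")
    if not expert:
        return dict(generalist)

    answers = set(generalist) | set(expert)
    mixed = {
        a: (1.0 - expert_weight) * generalist.get(a, 0.0)
        + expert_weight * expert.get(a, 0.0)
        for a in answers
    }
    total = sum(mixed.values())
    if total == 0:
        raise ValueError("both distributions are empty or all zero")
    # Renormalise so the result is a proper distribution again.
    return {a: p / total for a, p in mixed.items()}


# Made-up numbers: the generalist is unsure, the expert is confident.
generalist = {"Paris": 0.5, "Lyon": 0.5}
expert = {"Paris": 1.0}

mixed = combine(generalist, expert)
best = max(mixed, key=mixed.get)
print(best, mixed)
```

With a high `expert_weight`, a single trusted fact can override the generalist's guess, which is the opposite of a large model being "weakly affected by any particular datum". When the expert has no opinion, the generalist's answer passes through untouched.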