cross-posted from: https://sh.itjust.works/post/49077840

Artificial intelligence (AI) chatbots are worse at retrieving accurate information and reasoning when trained on large amounts of low-quality content, particularly if the content is popular on social media¹, finds a preprint posted on arXiv on 15 October.

In data science, good-quality data need to meet certain criteria, such as being grammatically correct and understandable, says co-author Zhangyang Wang, who studies generative AI at the University of Texas at Austin. But these criteria fail to capture differences in content quality, he says.
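To make Wang's point concrete, here is a minimal sketch of a surface-level filter. The checks (a minimum word count, a mostly alphabetic token ratio, sentence-final punctuation) and the function name `passes_surface_filter` are hypothetical illustrations, not the criteria used in the preprint. The example shows that fluent clickbait and an informative sentence both pass, while only obvious garbage is rejected.

```python
import re

def passes_surface_filter(text: str) -> bool:
    """Hypothetical surface-level checks for 'readable, sentence-like' text."""
    words = re.findall(r"[A-Za-z']+", text)
    # Require a handful of real words
    if len(words) < 5:
        return False
    # Most whitespace-separated tokens should be words, not symbols or URLs
    tokens = text.split()
    if len(words) / max(len(tokens), 1) < 0.8:
        return False
    # The text should end the way a sentence does
    return text.rstrip().endswith((".", "!", "?"))

clickbait = "You will not believe what happened next. This changes everything!"
informative = "Water boils at a lower temperature at high altitude because air pressure is lower."
garbled = "$$$ http://x.co ### !!! 123"

for sample in (clickbait, informative, garbled):
    print(passes_surface_filter(sample), "|", sample)
```

Both the clickbait line and the informative line pass, which is the gap Wang describes: checks on form say nothing about whether the content is worth learning from.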

  • Telex@sopuli.xyz · 1 day ago

    Garbage in, garbage out is one of the older laws for a reason.

    Kind of funny if the model/training isn’t better at filtering out repetitive or nonsensical noise from coherent data. Doesn’t seem like it would be too hard, considering how well e.g. spam filters work.
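For the spam-filter analogy, one classic approach is a naive Bayes text classifier. The sketch below is illustrative only: the `NaiveBayes` class, the labels `"noise"` and `"coherent"`, and the tiny training set are all hypothetical, and nothing here is the preprint's method. It scores each label by log prior plus summed log word likelihoods with add-one (Laplace) smoothing.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase words (apostrophes allowed); punctuation and digits are dropped
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for w in tokenize(doc):
                counts[w] += 1
                self.vocab.add(w)
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus log likelihood of each word, with add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for w in tokenize(doc):
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy training data
docs = [
    "you will not believe this omg",
    "omg omg click now",
    "wow wow crazy omg",
    "the study measured reasoning accuracy on benchmarks",
    "models trained on filtered data scored higher",
    "the researchers compared accuracy across datasets",
]
labels = ["noise"] * 3 + ["coherent"] * 3
clf = NaiveBayes().fit(docs, labels)

print(clf.predict("omg you will not believe it"))
print(clf.predict("the researchers measured accuracy on new data"))
```

A filter like this only learns what its labels encode: it catches vocabulary typical of spammy noise, but fluent text that is low in substance would need labels that capture content quality, not just form.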