Companies that build large LLMs have already said that this is becoming a problem. They’re running out of high-quality human-written content to train their models.
Google paid Reddit for access to its data to train their models, which is probably why their AI can be a bit dumb at times (and of course, the users who actually contributed the content don't get any of that money).
https://en.wikipedia.org/wiki/Model_collapse
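The idea behind that link can be shown with a toy sketch (nothing like real LLM training, just the recursive feedback loop it describes): fit a simple distribution to some data, generate new "data" from the fit, train on that output, and repeat. Each generation's estimate drifts, and the spread of the fitted distribution tends to collapse.

```python
import random
import statistics

def next_generation(samples, n):
    """Fit a Gaussian to the samples, then 'train' the next
    generation purely on data sampled from that fit."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return [random.gauss(mu, sigma) for _ in range(n)], sigma

random.seed(0)
# Tiny per-generation sample size exaggerates the effect.
data = [random.gauss(0.0, 1.0) for _ in range(5)]

sigmas = []
for generation in range(200):
    data, sigma = next_generation(data, 5)
    sigmas.append(sigma)

# The fitted spread shrinks over generations as each model
# learns only from the previous model's output.
print(sigmas[0], sigmas[-1])
```

With small samples the collapse is fast; with realistic amounts of data it is slower but the direction is the same, which is the concern when scraped training sets quietly fill up with model output.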
that's true, but I think it's in the phrasing: they describe it as a shortage of human-made content. The bigger issue is the lack of any ability to identify human-made content. E.g. you give it Reddit and our e-mails, and there's plenty of human-made content on there… but nobody knows what percentage of it is actually bots or AIs.
@TheFogan @dan Ironically for them, Google have been pushing Gmail's predictive text feature for ages already.