While we can, and if it isn't too late already. The web is overrun with AI-generated drivel. I've been searching for information on some widely varying subjects, and I keep landing in recently auto-generated junk. Unfortunately, most search engines associate 'recency' with 'quality' or 'relevance', and that is very much no longer true.
While there is still a chance, I think we should snapshot a version of the web and make it publicly available. That could serve as a baseline to calibrate various information sources against, to get an idea of whether or not they should be used. I'm pretty sure Google, OpenAI and Facebook all have such snapshots stashed away that they train their AIs on, and such data will rapidly become as precious as 'low-background steel'.
https://en.wikipedia.org/wiki/Low-background_steel
Comments URL: https://news.ycombinator.com/item?id=40058399
Points: 46
# Comments: 29