Tell HN: We should snapshot a mostly AI output free version of the web

While we can, and if it isn't too late already. The web is overrun with AI-generated drivel. I've been searching for information on some widely varying subjects, and I keep landing on recently auto-generated junk. Unfortunately, most search engines associate 'recency' with 'quality' or 'relevance', and that is very much no longer true.

While there is still a chance, I think we should snapshot a version of the web and make it publicly available. That could serve as a baseline to calibrate various information sources against, to get an idea of whether or not they should be used. I'm pretty sure Google, OpenAI and Facebook all have such snapshots stashed away that they train their AIs on, and such data will rapidly become as precious as 'low-background steel'.

https://en.wikipedia.org/wiki/Low-background_steel


Comments URL: https://news.ycombinator.com/item?id=40058399

Points: 46

# Comments: 29

Created: Apr 17, 2024, 12:50:14 AM

