AI bots everywhere. Does anyone have a good whitelist for robots.txt?

My niche little site, http://golfcourse.wiki, seems to be very popular with AI bots. They have basically become most of my traffic. Most of them follow robots.txt, which is nice and all, but they are costing me non-trivial amounts of money.

I don't want to block most search engines. I don't want to block legitimate institutions like archive.org. Is there a whitelist that I could crib instead of pretty much having to update my robots file every damn day?
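For what it's worth, the usual whitelist pattern is to invert the default: end robots.txt with a catch-all disallow, and list an empty Disallow for each crawler you want to keep. A minimal sketch (the user-agent tokens below are the commonly documented ones for Google, Bing, and the Internet Archive, but verify them against each crawler's own docs before relying on this):

```
# Allow named crawlers: an empty Disallow means "crawl everything".
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: archive.org_bot
Disallow:

# Everyone not listed above is blocked.
User-agent: *
Disallow: /
```

Caveat: this only helps against bots that actually honor robots.txt; anything that ignores the file needs server-side blocking instead.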


6mo | Hacker News
Show HN: Meelo, self-hosted music server for collectors and music maniacs

I've been working on this alternative to Plex for almost 3 years now. Its main selling point is that it correctly handles multiple versions of albums and songs. As of today, it only has a web client.

It tries to be as flexible as possible, but it still requires a bit of configuration (including regexes, though these can be skipped if metadata is embedded in the files).
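To illustrate the idea (this is a hypothetical sketch, not Meelo's actual configuration format): a path regex with named capture groups can recover artist, album, version, and track metadata from the file layout when tags are not embedded in the files.

```python
import re

# Hypothetical pattern, for illustration only. It expects a layout like
#   Artist/Album (Version)/NN - Title.flac
# where the "(Version)" part is optional.
PATTERN = re.compile(
    r"(?P<artist>[^/]+)/"
    r"(?P<album>[^/(]+?)(?: \((?P<version>[^)]+)\))?/"
    r"(?P<track>\d+) - (?P<title>[^.]+)\.flac$"
)

def parse_path(path):
    """Extract metadata fields from a library path, or None if it doesn't match."""
    m = PATTERN.search(path)
    return m.groupdict() if m else None
```

The optional `(Version)` group is what lets one scheme distinguish multiple editions of the same album, e.g. an original pressing and a remaster, which is the kind of multi-version handling described above.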

I just released v3.0, which makes videos first-class data and speeds up scanning and metadata matching.


6mo | Hacker News