
Article URL: https://www.ft.com/content/e456ea34-c6ad-43fe-abe9-d4ce781c07b4
Comments URL: https://news.ycombinator.com/item?id=44009999
Points: 162
# Comments: 90


Article URL: https://cacm.acm.org/news/the-collapse-of-gpt/
Comments URL: https://news.ycombinator.com/item?id=44010705
Points: 18
# Comments: 7

I discovered that in LLM inference, keys and values in the KV cache have very different quantization sensitivities. Keys need higher precision than values to maintain quality.
I patched llama.cpp to enable different bit-widths for keys vs. values on Apple Silicon. The results are surprising:
- K8V4 (8-bit keys, 4-bit values): 59% memory reduction with only 0.86% perplexity loss
- K4V8 (4-bit keys, 8-bit values): 59% memory reduction but 6.06% perplexity loss
- The configurations use the
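
The identical 59% figure for both configurations can be checked with a little arithmetic. The sketch below is not taken from the patch. It assumes an FP16 baseline cache and block-quantized formats that store one 16-bit scale per 32 elements (the layout of ggml's q8_0 and q4_0), neither of which the post states. The function names are hypothetical.

```python
# Rough KV-cache size estimate for mixed-precision keys and values.
# Assumptions (not stated in the post): the baseline cache is FP16, and each
# quantized block holds 32 elements plus one 16-bit scale.

BLOCK_SIZE = 32   # elements per quantization block (assumed)
SCALE_BITS = 16   # one fp16 scale per block (assumed)


def bits_per_element(payload_bits: int) -> float:
    """Effective bits per element, including the per-block scale overhead."""
    return payload_bits + SCALE_BITS / BLOCK_SIZE


def kv_reduction(key_bits: int, value_bits: int, baseline_bits: float = 16.0) -> float:
    """Fraction of KV-cache memory saved relative to the baseline precision."""
    quantized = bits_per_element(key_bits) + bits_per_element(value_bits)
    baseline = 2 * baseline_bits  # one key and one value per cached element
    return 1.0 - quantized / baseline


if __name__ == "__main__":
    for name, k, v in [("K8V4", 8, 4), ("K4V8", 4, 8)]:
        print(f"{name}: {kv_reduction(k, v):.1%} smaller than FP16")
```

Under these assumptions both configurations come to about 59.4%, consistent with the 59% quoted above. The memory cost is symmetric in keys and values, so only the perplexity loss separates K8V4 from K4V8.
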
Article URL: https://clojurescript.org/news/2025-05-16-release
Comments URL: https://news.ycombinator.com/item?id=44009464
Points: 32
# Comments: 2

Article URL: https://aeon.co/essays/who-needs-ai-text-generation-when-theres-erasmus-of-rotterdam
Comments URL: https://news.ycombinator.com/item?id=44009621
Points: 14
# Comments: 1

Article URL: https://cloud.google.com/blog/products/databases/techniques-for-improving-text-to-sql
Comments URL: https://news.ycombinator.com/item?id=44009848
Points: 10
# Comments: 1


Article URL: https://bobacollection.staxmuseum.org/
Comments URL: https://news.ycombinator.com/item?id=44007854
Points: 15
# Comments: 4

Article URL: https://www.lrb.co.uk/the-paper/v47/n09/david-thomson/cool-tricking
Comments URL: https://news.ycombinator.com/item?id=44008412
Points: 16
# Comments: 8