Article URL: https://press.asimov.com/articles/artificial-wombs
Comments URL: https://news.ycombinator.com/item?id=41758516
Points: 26
# Comments: 29
Created: Oct 6, 2024, 9:40:26 PM
Other posts in this group

I discovered that in LLM inference, keys and values in the KV cache have very different quantization sensitivities. Keys need higher precision than values to maintain quality.
I patched llama.cp
Article URL: https://clojurescript.org/news/2025-05-16-release

Article URL: https://bobacollection.staxmuseum.org/
Comments URL: https://news.y