Article URL: https://rust-gpu.github.io/blog/2025/04/10/shadertoys/
Comments URL: https://news.ycombinator.com/item?id=43667693
Points: 25
# Comments: 5
Created 1mo ago | 13 Apr 2025, 00:20:11
Other posts in this group

I discovered that in LLM inference, keys and values in the KV cache have very different quantization sensitivities. Keys need higher precision than values to maintain quality.
I patched llama.cp
Article URL: https://clojurescript.org/news/2025-05-16-release




Article URL: https://bobacollection.staxmuseum.org/
Comments URL: https://news.y