Article URL: https://labs.quansight.org/blog/free-threaded-one-year-recap
Comments URL: https://news.ycombinator.com/item?id=44003445
Points: 58
# Comments: 35
Created 10h ago | 16 May 2025 12:40:19
Other posts in this group

I discovered that in LLM inference, keys and values in the KV cache have very different quantization sensitivities. Keys need higher precision than values to maintain quality.
I patched llama.cp
Article URL: https://clojurescript.org/news/2025-05-16-release




Article URL: https://bobacollection.staxmuseum.org/
Comments URL: https://news.y