DeepSeek's multi-head latent attention and other KV cache tricks

Article URL: https://www.pyspur.dev/blog/multi-head-latent-attention-kv-cache-paper-list

Comments URL: https://news.ycombinator.com/item?id=42858741

Points: 109

# Comments: 10

Created Jan 29, 2025, 12:20:21 AM

Other posts in this group

The First Widespread Cure for HIV Could Be in Children

Article URL: https://www.wired.com/story/the-first-widespread-cure-for-hiv-could-be-in-children/

Aug 2, 2025, 11:30:08 AM | Hacker news

Tesla Found Partly Liable in 2019 Autopilot Death

Article URL: https://www.wired.com/story/tesla-liable-2019-autopilot-crash-death/

Aug 2, 2025, 11:30:07 AM | Hacker news

Terence Tao weighs in on the suspension of UCLA grants

Article URL: https://mathstodon.xyz/@tao/114956840959338146

Aug 2, 2025, 9:10:25 AM | Hacker news

Ladybird Browser July Update

Article URL: https://ladybird.org/newsletter/2025-07-31/

Comments URL: http

Aug 2, 2025, 9:10:24 AM | Hacker news

Microsoft is open sourcing Windows 11's UI framework

Article URL: https://www.neowin.net/news/microsoft-is-taking-steps-to-open-sou

Aug 2, 2025, 9:10:22 AM | Hacker news

At $250M, top AI salaries dwarf the Manhattan Project and the Space Race

Article URL: https://arstechnica.com/ai/2025/08/at-250-million-

Aug 2, 2025, 6:50:05 AM | Hacker news

Native Sparse Attention

Was submitted as "DeepSeek won the best paper award at ACL 2025"

Here is the awards page:

Aug 2, 2025, 4:30:13 AM | Hacker news
