Supervised Fine Tuning on Curated Data is Reinforcement Learning

Article URL: https://arxiv.org/abs/2507.12856

Comments URL: https://news.ycombinator.com/item?id=44727788

Points: 13

# Comments: 4

Created 4d ago | Jul 29, 2025, 21:40:10

Other posts in this group

The First Widespread Cure for HIV Could Be in Children

Article URL: https://www.wired.com/story/the-first-widespread-cure-for-hiv-could-be-in-children/

Aug 2, 2025, 11:30:08 | Hacker News

Tesla Found Partly Liable in 2019 Autopilot Death

Article URL: https://www.wired.com/story/tesla-liable-2019-autopilot-crash-death/

Comments URL:

Aug 2, 2025, 11:30:07 | Hacker News

Terence Tao weighs in on the suspension of UCLA grants

Article URL: https://mathstodon.xyz/@tao/114956840959338146

Comments URL:

Aug 2, 2025, 09:10:25 | Hacker News

Ladybird Browser July Update

Article URL: https://ladybird.org/newsletter/2025-07-31/

Comments URL: http

Aug 2, 2025, 09:10:24 | Hacker News

Microsoft is open sourcing Windows 11's UI framework

Article URL: https://www.neowin.net/news/microsoft-is-taking-steps-to-open-sou

Aug 2, 2025, 09:10:22 | Hacker News

At $250M, top AI salaries dwarf the Manhattan Project and the Space Race

Article URL: https://arstechnica.com/ai/2025/08/at-250-million-

Aug 2, 2025, 06:50:05 | Hacker News

Native Sparse Attention

Was submitted as "DeepSeek won the best paper award at ACL 2025"

Here is the awards page:

Aug 2, 2025, 04:30:13 | Hacker News
