Supervised Fine Tuning on Curated Data is Reinforcement Learning

Article URL: https://arxiv.org/abs/2507.12856

Comments URL: https://news.ycombinator.com/item?id=44727788

Points: 13

# Comments: 4

Posted 3d ago | Jul 29, 2025, 21:40:10

Other posts in this group

Native Sparse Attention

Was submitted as "DeepSeek won the best paper award at ACL 2025"

Here is the awards page:

Aug 2, 2025, 04:30:13 | Hacker News

Hardening mode for the compiler

Article URL: https://discourse.llvm.org/t/rfc-hardening-mode-for-the-compiler/87660

Comments URL:

Aug 2, 2025, 04:30:12 | Hacker News

Peak Energy just shipped the US's first grid-scale sodium-ion battery

Article URL: https://electrek.co/2025/07/30/peak-energy-us-first-grid-scale-sodium-ion-battery/

Aug 2, 2025, 04:30:10 | Hacker News

Robert Wilson has died

https://www.nytimes.com/2025/07/31/theater/robert-wilson-dea...

Aug 2, 2025, 04:30:07 | Hacker News

Meta violated privacy law, jury says in menstrual data fight

Article URL: https://www.courthousenews.com/meta-violated-privacy-law-jury-says-in-menstrual-d

Aug 2, 2025, 02:20:06 | Hacker News

Contrarian climate assessment from U.S. government draws pushback

Article URL: https://www.science.org/content/article/contrarian-climate-assessme

Aug 2, 2025, 02:20:04 | Hacker News

The Rickover Corpus: A digital archive of Admiral Rickover's speeches and memos

Article URL: https://rickovercorpus.org/

Comments URL: https://news.ycombinator.com/item?id

Aug 2, 2025, 02:20:03 | Hacker News

Techie