Offline Reinforcement Learning for LLM Multi-Step Reasoning

Article URL: https://arxiv.org/abs/2412.16145

Comments URL: https://news.ycombinator.com/item?id=42493312

Points: 11

# Comments: 5

Created: Dec 23, 2024, 11:40:07 AM
