Offline Reinforcement Learning for LLM Multi-Step Reasoning

Article URL: https://arxiv.org/abs/2412.16145

Comments URL: https://news.ycombinator.com/item?id=42493312

Points: 11

# Comments: 5

Created 6mo ago | 23.12.2024, 11:40:07
