Offline Reinforcement Learning for LLM Multi-Step Reasoning

Article URL: https://arxiv.org/abs/2412.16145

Comments URL: https://news.ycombinator.com/item?id=42493312

Points: 11

# Comments: 5

Created 7 months ago | 23 Dec 2024, 11:40:07

Other posts in this group

Digitising CDs (a.k.a. using your phone as an image scanner)

Article URL: https://www.hadess.net/2025/07/digitising-cds-aka-using-your-phone-as.html

28 Jul 2025, 08:30:14 | Hacker News

SIMD Within a Register: How I Doubled Hash Table Lookup Performance

Article URL: https://maltsev.space/blog/012-simd-within-a-register-how-i-doubled-hash-ta

28 Jul 2025, 08:30:14 | Hacker News

LLM Embeddings Explained: A Visual and Intuitive Guide

Article URL: https://huggingface.co/spaces/hesamation/primer-llm-embedding

28 Jul 2025, 08:30:13 | Hacker News

The ultimate meeting culture

Article URL: https://abitmighty.com/posts/the-ultimate-meeting-culture

28 Jul 2025, 08:30:13 | Hacker News

How to Make Websites That Will Require Lots of Your Time and Energy

Article URL: https://blog.jim-nielsen.com/2025/how-to-make-websites-that-require-lots-of-time

28 Jul 2025, 08:30:12 | Hacker News

Hello Sprout

Article URL: https://daniel.haxx.se/blog/2025/07/28/hello-sprout/

28 Jul 2025, 08:30:12 | Hacker News

Self-host is just waiting for its iPhone moment

Article URL: https://www.robertmao.com/blog/en/self-hosting-isnt-dead-its-just-waiting-for

28 Jul 2025, 06:20:07 | Hacker News

Techie