Sam Altman said yesterday that ChatGPT handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without an insane amount of VRAM or painfully slow speeds.
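For a back-of-the-envelope sense of why local inference is so brutal (every number below is my own guess for illustration; GPT-4's real size and architecture aren't public):

    # Rough memory math for a large dense transformer.
    # All sizes are assumptions, not disclosed figures.
    params = 1.0e12               # assume ~1T parameters
    bytes_per_param = 2           # fp16/bf16 weights
    weights_gb = params * bytes_per_param / 1e9
    print(f"weights: {weights_gb:.0f} GB")  # ~2000 GB, i.e. ~25 80GB GPUs for weights alone

    # KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes
    n_layers, n_kv_heads, head_dim = 96, 8, 128   # assumed shapes
    kv_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_param
    batch, ctx = 64, 8192
    kv_gb = kv_per_token * batch * ctx / 1e9
    print(f"KV cache: {kv_gb:.0f} GB at batch={batch}, ctx={ctx}")

So even before you serve a single user, the weights alone don't fit on any consumer card, and the KV cache grows with every concurrent request.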
Sure, they have huge GPU clusters, but there must be more going on: model optimizations, sharding, custom hardware, clever load balancing, etc.
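As an example of the kind of trick I mean, continuous batching: the server interleaves decode steps from many requests, so a new request joins the running batch immediately instead of waiting for the whole batch to drain. A toy sketch of the scheduling idea (MAX_BATCH, model_step, and the Request shape are all hypothetical, not anyone's real serving code):

    from collections import deque
    from dataclasses import dataclass, field

    MAX_BATCH = 8  # illustrative slot limit

    @dataclass
    class Request:
        prompt: str
        max_tokens: int
        output: list = field(default_factory=list)

    def serve(incoming: deque, model_step):
        # model_step(active) stands in for one batched forward pass
        # that returns the next token for every in-flight request.
        active = []
        while incoming or active:
            # Admit waiting requests into any free batch slots.
            while incoming and len(active) < MAX_BATCH:
                active.append(incoming.popleft())
            # One decode step for every in-flight request at once,
            # so the GPU always sees a full batched forward pass.
            for req, tok in zip(active, model_step(active)):
                req.output.append(tok)
            # Retire finished requests immediately, freeing their slots.
            active = [r for r in active if len(r.output) < r.max_tokens]

My understanding is that real systems layer paged KV caches, tensor/expert parallelism, and speculative decoding on top of a loop like this, but keeping the batch full seems to be the core of the throughput story.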
What engineering tricks make this possible at such massive scale while keeping latency low?
Curious to hear insights from people who've built large-scale ML systems.
Comments URL: https://news.ycombinator.com/item?id=44840728
Points: 51
# Comments: 36