Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?

Sam Altman said yesterday that ChatGPT handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without insane amounts of VRAM or painfully slow speeds.

Sure, they have huge GPU clusters, but there must be more going on: model optimizations, sharding, custom hardware, clever load balancing, etc.
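For a sense of what one of those tricks looks like, here's a rough sketch of dynamic request batching, a common technique in LLM serving (this is illustrative Python, not OpenAI's actual stack; the batch size, wait window, and `fake_model` stand-in are all made up):

```python
import queue
import time

MAX_BATCH = 8        # assumed per-forward-pass batch limit
MAX_WAIT_S = 0.005   # assumed max time to wait for more requests

def collect_batch(q):
    """Pull up to MAX_BATCH requests, waiting at most MAX_WAIT_S
    after the first one arrives, so the GPU runs one big forward
    pass instead of many small ones."""
    batch = [q.get()]  # block until at least one request exists
    deadline = time.monotonic() + MAX_WAIT_S
    while len(batch) < MAX_BATCH:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch

def fake_model(prompts):
    """Stand-in for a batched model forward pass."""
    return [p.upper() for p in prompts]

if __name__ == "__main__":
    q = queue.Queue()
    for p in ["hello", "world", "scale"]:
        q.put(p)
    # One "GPU call" now serves all queued requests at once.
    print(fake_model(collect_batch(q)))
```

The throughput win comes from amortizing the per-call overhead: GPUs are far more efficient on one batch of N prompts than on N separate calls, which is why production systems (e.g. vLLM's continuous batching) lean on this so heavily.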

What engineering tricks make this possible at such massive scale while keeping latency low?

Curious to hear insights from people who've built large-scale ML systems.


Comments URL: https://news.ycombinator.com/item?id=44840728

Points: 51

# Comments: 36


Created 12d ago | Aug 8, 2025, 20:30:11


