Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?

Sam said yesterday that ChatGPT handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without insane VRAM or painfully slow speeds.

Sure, they have huge GPU clusters, but there must be more going on - model optimizations, sharding, custom hardware, clever load balancing, etc.

What engineering tricks make this possible at such massive scale while keeping latency low?

Curious to hear insights from people who've built large-scale ML systems.

Comments URL: https://news.ycombinator.com/item?id=44840728

Points: 51

Comments: 36


Posted 4d ago | Aug 8, 2025, 20:30:11

