Sam Altman said yesterday that ChatGPT handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without an insane amount of VRAM or painfully slow speeds.
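For a back-of-the-envelope sense of why local inference is so brutal (every number below is my own guess for illustration; GPT-4's real size and architecture aren't public):

    # Rough memory math for a large dense transformer.
    # All sizes are assumptions, not disclosed figures.
    params = 1.0e12               # assume ~1T parameters
    bytes_per_param = 2           # fp16/bf16 weights
    weights_gb = params * bytes_per_param / 1e9
    print(f"weights: {weights_gb:.0f} GB")  # ~2000 GB, i.e. ~25 80GB GPUs for weights alone

    # KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes
    n_layers, n_kv_heads, head_dim = 96, 8, 128   # assumed shapes
    kv_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_param
    batch, ctx = 64, 8192
    kv_gb = kv_per_token * batch * ctx / 1e9
    print(f"KV cache: {kv_gb:.0f} GB at batch={batch}, ctx={ctx}")

So even before you serve a single user, the weights alone don't fit on any consumer card, and the KV cache grows with every concurrent request.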
Sure, they have huge GPU clusters, but there must be more going on: model optimizations, sharding, custom hardware, clever load balancing, etc.
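As an example of the kind of trick I mean, continuous batching: the server interleaves decode steps from many requests, so a new request joins the running batch immediately instead of waiting for the whole batch to drain. A toy sketch of the scheduling idea (MAX_BATCH, model_step, and the Request shape are all hypothetical, not anyone's real serving code):

    from collections import deque
    from dataclasses import dataclass, field

    MAX_BATCH = 8  # illustrative slot limit

    @dataclass
    class Request:
        prompt: str
        max_tokens: int
        output: list = field(default_factory=list)

    def serve(incoming: deque, model_step):
        # model_step(active) stands in for one batched forward pass
        # that returns the next token for every in-flight request.
        active = []
        while incoming or active:
            # Admit waiting requests into any free batch slots.
            while incoming and len(active) < MAX_BATCH:
                active.append(incoming.popleft())
            # One decode step for every in-flight request at once,
            # so the GPU always sees a full batched forward pass.
            for req, tok in zip(active, model_step(active)):
                req.output.append(tok)
            # Retire finished requests immediately, freeing their slots.
            active = [r for r in active if len(r.output) < r.max_tokens]

My understanding is that real systems layer paged KV caches, tensor/expert parallelism, and speculative decoding on top of a loop like this, but keeping the batch full seems to be the core of the throughput story.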
What engineering tricks make this possible at such massive scale while keeping latency low?
Curious to hear insights from people who've built large-scale ML systems.
Comments URL: https://news.ycombinator.com/item?id=44840728
Points: 51
# Comments: 36