Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?

Sam Altman said yesterday that ChatGPT handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without insane VRAM or painfully slow speeds.

Sure, they have huge GPU clusters, but there must be more going on - model optimizations, sharding, custom hardware, clever load balancing, etc.
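
For example, I'd guess request batching is a big part of it - something like this toy sketch of continuous batching, where queued prompts get folded into one forward pass so the GPU never serves a single user at a time (pure speculation on my part; FakeModel and the batching window are made up for illustration):

    import queue
    import threading
    import time

    class FakeModel:
        def generate(self, prompts):
            # stand-in for one batched forward pass over many prompts at once
            return ["response to: " + p for p in prompts]

    def batching_worker(requests, model, max_batch=32, window_s=0.01):
        while True:
            first = requests.get()                 # block until a request arrives
            batch = [first]
            deadline = time.monotonic() + window_s
            while len(batch) < max_batch:          # gather more within the window
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(requests.get(timeout=remaining))
                except queue.Empty:
                    break
            outputs = model.generate([p for p, _ in batch])  # one pass, many users
            for (_, result_box), out in zip(batch, outputs):
                result_box.append(out)             # hand each result back

    reqs = queue.Queue()
    threading.Thread(target=batching_worker, args=(reqs, FakeModel()), daemon=True).start()
    box = []
    reqs.put(("hello", box))
    time.sleep(0.1)
    print(box[0])  # -> "response to: hello"

Obviously the real systems layer a lot more on top of that - tensor/pipeline parallelism, KV-cache management, speculative decoding - which is exactly the part I'd love to hear about.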

What engineering tricks make this possible at such massive scale while keeping latency low?

Curious to hear insights from people who've built large-scale ML systems.



Show HN: Trayce – “Burp Suite for developers”

About a year ago I introduced Trayce to HN as the "network tab for Docker containers". I've now released a new version that adds an HTTP client. The idea is to combine network monitoring with an HTTP client so developers can interact with and debug web application servers.

Think "Burp Suite for developers".

Trayce stores requests as local files in the .bru file format. The UI is built with Flutter, which means it offers a super-fast and modern desktop GUI with a total download size o
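
If you haven't seen the .bru format before, a saved request looks roughly like this (a simplified example; the name and URL are just placeholders):

    meta {
      name: List Users
      type: http
      seq: 1
    }

    get {
      url: https://api.example.com/users
      body: none
      auth: none
    }

Using plain text files means your request collections can live in the same git repo as the project they belong to.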

