One Primary, Dozens of Replicas: OpenAI’s PostgreSQL Scaling for 800M Users
OpenAI reportedly serves hundreds of millions of users from PostgreSQL with a single primary that handles all writes and many read replicas that absorb read traffic, while moving some write-heavy workloads to Azure Cosmos DB to relieve load on the primary. The approach favors the simplicity of a single-primary architecture but carries trade-offs: the primary remains a potential single point of failure, MVCC causes write amplification, and schema changes and cross-shard operations remain challenging. Commenters debate replication lag, WAL shipping, the cost of large cloud instances, and when to shard or migrate workloads to non-Postgres systems. Overall, the piece argues that with the right hardware and careful design, PostgreSQL can scale far beyond what many assume, though the approach isn’t a universal recipe and comes with notable complexity.
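A single-primary, many-replica layout like the one described is often paired with application-side read/write routing. The sketch below is a minimal illustration under stated assumptions, not OpenAI's actual implementation: `ReadWriteRouter`, `Replica`, `max_lag_seconds`, and `needs_fresh_read` are hypothetical names, and each replica's lag is assumed to be reported by some external monitoring. Writes always go to the primary; reads rotate round-robin across replicas whose lag is within a tolerance, falling back to the primary when none qualifies, which is one common way to handle the replication-lag concern commenters raise.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    lag_seconds: float  # assumption: supplied by external monitoring


class ReadWriteRouter:
    """Hypothetical router: writes go to the single primary, reads to a fresh-enough replica."""

    def __init__(self, primary: str, replicas: List[Replica], max_lag_seconds: float = 1.0):
        self.primary = primary
        self.replicas = replicas
        self.max_lag_seconds = max_lag_seconds
        self._next = 0  # round-robin cursor over eligible replicas

    def route(self, is_write: bool, needs_fresh_read: bool = False) -> str:
        # Writes must hit the primary, the only node that accepts them;
        # reads that cannot tolerate any lag are sent there too.
        if is_write or needs_fresh_read:
            return self.primary
        # Skip replicas whose lag exceeds the tolerance for ordinary reads.
        eligible = [r for r in self.replicas if r.lag_seconds <= self.max_lag_seconds]
        if not eligible:
            # No replica is fresh enough, so fall back to the primary.
            return self.primary
        choice = eligible[self._next % len(eligible)]
        self._next += 1
        return choice.name


if __name__ == "__main__":
    router = ReadWriteRouter(
        primary="pg-primary",
        replicas=[Replica("replica-a", 0.2), Replica("replica-b", 5.0), Replica("replica-c", 0.4)],
    )
    print(router.route(is_write=True))   # pg-primary
    print(router.route(is_write=False))  # replica-a
    print(router.route(is_write=False))  # replica-c (replica-b is skipped for lag)
    print(router.route(is_write=False, needs_fresh_read=True))  # pg-primary
```

Falling back to the primary keeps reads correct when replicas lag, but it pushes more load onto the one node the design is trying to protect, so the lag threshold is a tuning trade-off rather than a fixed rule.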

