Scalability is often discussed but rarely understood until it breaks. Designing a system that can handle 10 million concurrent users is fundamentally different from designing one for 10 thousand. It is not just about "adding more servers"; it is about removing bottlenecks.
The Database Bottleneck
In almost every high-scale system, the database is the first point of failure. Vertical scaling (buying a bigger server) has a hard limit. Horizontal scaling (sharding) introduces significant complexity in application logic.
When you split your user base across 100 database shards, you lose the ability to perform joins across shards. Your application logic must now handle mapping users to shards and aggregating data.
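The routing logic can be sketched in a few lines. This is a minimal, hypothetical example (the `NUM_SHARDS` constant and the `get_many` client method are assumptions, not a real driver API): a stable hash maps each user ID to a shard, and any multi-user read must fan out per shard and merge in the application.

```python
import hashlib

NUM_SHARDS = 100  # hypothetical fixed shard count


def shard_for(user_id: str) -> int:
    """Map a user ID to a shard via a stable hash of the key."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def fetch_profiles(user_ids, connections):
    """Cross-shard read: group keys by shard, query each shard once,
    and merge the results in application code -- the work a single-node
    join used to do for free."""
    by_shard = {}
    for uid in user_ids:
        by_shard.setdefault(shard_for(uid), []).append(uid)
    results = {}
    for shard, uids in by_shard.items():
        # `connections[shard].get_many(...)` stands in for your DB client.
        results.update(connections[shard].get_many(uids))
    return results
```

Note that a plain modulo scheme makes resharding painful: changing `NUM_SHARDS` remaps almost every key, which is why consistent hashing or a directory service is often used instead.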
Caching Strategies
The fastest query is the one you never make. Caching is the primary defense against read-heavy traffic. But as Phil Karlton famously observed, "There are only two hard things in Computer Science: cache invalidation and naming things."
Write-Through vs. Write-Back
- Write-Through: Safe, consistent, but slower writes.
- Write-Back: Fast writes, but risk of data loss on crash.
- Cache-Aside: The application manages the cache. Most common, but complex to implement correctly without race conditions.
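The cache-aside pattern can be sketched as follows. This is a simplified, single-process illustration (the `db` object with `get`/`put` methods is an assumption standing in for your data store); on update it invalidates the cache entry rather than overwriting it, which narrows, but does not eliminate, the stale-read race the list above warns about.

```python
import time


class CacheAside:
    """Cache-aside: the application checks the cache first, falls back
    to the database on a miss, and populates the cache itself."""

    def __init__(self, db, ttl_seconds=60):
        self.db = db                 # any object exposing get(key) / put(key, value)
        self.ttl = ttl_seconds
        self._cache = {}             # key -> (value, expires_at)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                              # cache hit
        value = self.db.get(key)                         # miss: read from the DB
        self._cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def update(self, key, value):
        self.db.put(key, value)                          # write to the DB first...
        self._cache.pop(key, None)                       # ...then invalidate the cache
```

In a distributed deployment the same logic runs against a shared cache such as Redis, and the race window (a concurrent reader repopulating the cache with a value read just before the write) typically has to be bounded with short TTLs or versioned keys.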
Accepting Eventual Consistency
To achieve global scale, we often have to trade strict consistency for availability (CAP Theorem). If a user in Tokyo updates their profile, it is acceptable if a user in New York sees the old profile for 2 seconds.
Understanding where "eventual consistency" is acceptable (e.g., social feed, like counts) and where it is not (e.g., payment balance, inventory count) is the hallmark of a senior architect. We often use CRDTs (Conflict-free Replicated Data Types) to handle collaborative state at the edge.
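A grow-only counter (G-Counter) is the simplest CRDT and fits the "like counts" case above. The sketch below is illustrative, not a production library: each replica increments only its own slot, and merge takes the per-replica maximum, so merges are commutative, associative, and idempotent and every replica converges to the same total without coordination.

```python
class GCounter:
    """Grow-only counter CRDT. Each replica increments its own slot;
    merging takes the per-replica max, so replicas converge regardless
    of the order or number of times states are exchanged."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}             # replica_id -> that replica's local count

    def increment(self, n: int = 1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter"):
        """Pointwise max over all known replicas; safe to apply repeatedly."""
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)
```

A counter that must also decrement needs a PN-Counter (two G-Counters), and genuinely contended values like a payment balance need real coordination, which is exactly the distinction drawn above.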
Conclusion
Scale exposes sloppiness. Code that runs fine at 100 QPS will bring a cluster to its knees at 100k QPS. Performance engineering is not an optimization phase at the end; it is a design constraint at the beginning.
