System Design Layers
NFR answers tell you what the system needs. Layers are what you draw. Start with the core 4, then add layers based on what the interviewer tells you.
The Core 4 — Always Draw These
Every system design diagram starts here. Add more only when NFRs demand it.
| Layer | What It Does | Example Tech |
|---|---|---|
| Client | Browser, mobile app, or external service making requests | Web, iOS, Android |
| Load Balancer | Distributes traffic across app servers. Single entry point. Handles health checks. | AWS ALB, Nginx |
| App Server | Business logic. Stateless so you can add more horizontally. | Node, Java, Go |
| Database | Persistent storage. Default to relational unless there's a clear reason not to. | PostgreSQL, MySQL |
[Client]
↓
[Load Balancer]
↓
[App Server(s)]
↓
[Database]
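The core 4 path can be sketched in a few lines of Python (all names are illustrative, and a dict stands in for the database): a round-robin load balancer in front of stateless app servers.

```python
# Minimal sketch of the core 4 request path.
# The load balancer round-robins across stateless app servers;
# any server can handle any request because no state lives on them.
import itertools

DATABASE = {"user:1": {"name": "Ada"}}  # stands in for PostgreSQL/MySQL

def app_server(server_id: int, key: str):
    # Stateless business logic: look up the key in persistent storage.
    return {"served_by": server_id, "data": DATABASE.get(key)}

class LoadBalancer:
    def __init__(self, n_servers: int):
        self._next = itertools.cycle(range(n_servers))  # round-robin rotation

    def handle(self, key: str):
        return app_server(next(self._next), key)

lb = LoadBalancer(n_servers=3)
print(lb.handle("user:1"))  # served_by: 0
print(lb.handle("user:1"))  # served_by: 1
```

Because the app servers are stateless, scaling horizontally is just `n_servers + 1`; the balancer needs no other change.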
Optional Layers — Added by NFRs
Add only when an NFR demands it — each one solves a specific problem but adds latency, cost, and operational complexity.
| Layer | What It Does | Add When... |
|---|---|---|
| CDN | Caches static assets at edge nodes close to users. | Scale > 1M DAU, global users, low latency required |
| API Gateway | Handles auth, rate limiting, and routing before requests hit app servers. | Security requirements, multiple client types, public API |
| Cache (Redis) | Stores hot data in memory. Sub-millisecond reads. Reduces DB load. | Read-heavy, latency < 100ms, same data read repeatedly |
| Message Queue | Decouples producers and consumers. Absorbs write bursts. | Write-heavy, async work, fault tolerance needed |
| Worker / Consumer | Processes jobs from the queue asynchronously. | Always paired with a message queue |
| DB Read Replicas | Copies of the primary DB that serve reads. Primary handles writes only. | Read-heavy, high availability |
| Object Storage | Stores files, images, video. Cheap, durable, effectively unlimited capacity. | File uploads, media, large blobs |
NFR → Layers to Add
Translate the interviewer's requirements directly into layers. Use this to build your diagram from the answers you've gathered.
| NFR Answer | Layers to Add |
|---|---|
| High scale, read-heavy | CDN, Cache (Redis), DB Read Replicas |
| High scale, write-heavy | Message Queue, Workers |
| Low latency (< 100ms) | CDN, Cache on hot path — remove DB from read path |
| High availability | DB Read Replicas + standby, multi-AZ Load Balancer |
| Strong consistency | No cache on write path. Single DB primary. Synchronous replication. |
| Security / rate limiting | API Gateway in front of App Servers |
| File or media storage | Object Storage (S3) |
| Async / background work | Message Queue + Workers |
| Fault tolerance | Message Queue (queue buffers if downstream is down), circuit breaker at App Server |
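The translation table above is mechanical enough to capture as a lookup. This sketch (the keys and layer strings are made up for illustration) unions the layers triggered by each gathered NFR:

```python
# The NFR → layer translation as data. Keys are illustrative labels
# for the interviewer's answers, not a standard taxonomy.
NFR_TO_LAYERS = {
    "read_heavy": ["CDN", "Cache (Redis)", "DB Read Replicas"],
    "write_heavy": ["Message Queue", "Workers"],
    "low_latency": ["CDN", "Cache (Redis)"],
    "high_availability": ["DB Read Replicas", "Multi-AZ Load Balancer"],
    "security": ["API Gateway"],
    "file_storage": ["Object Storage (S3)"],
    "async_work": ["Message Queue", "Workers"],
}

def layers_for(nfrs):
    """Core 4 plus the union of optional layers the NFRs trigger."""
    core = ["Client", "Load Balancer", "App Server", "Database"]
    extra = []
    for nfr in nfrs:
        for layer in NFR_TO_LAYERS.get(nfr, []):
            if layer not in extra:      # dedupe, keep trigger order
                extra.append(layer)
    return core + extra

print(layers_for(["read_heavy", "security"]))
```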
Latency Reference — Cost Per Layer
Single source of truth for per-layer latency costs. Use the Typical column for estimates — Best Case shows the theoretical floor with everything going right.
Network is included: Each component figure (Redis 1ms, DB 5ms, etc.) covers the full round-trip including the internal network hop — App→component→App. The only network cost listed separately is the user-facing leg (Browser↔LB) because that varies by geography and is outside your control.
Scenario breakdowns below use the Typical column. Update this table to recalibrate all scenarios.
The request path, left to right (typical values):
┌─(1ms)──► Redis
Browser ─(20ms)─► CDN ─(1ms)─► LB ─(10ms)─► API GW ─(5ms)─► App ─┼─(5ms)──► DB
├─(3ms)──► Kafka
└─(50ms)─► S3
| Layer | Best Case | Typical | Notes |
|---|---|---|---|
| Network: user → CDN edge | 5ms | 20ms | Depends on user geography |
| Network: user → origin (no CDN) | 20ms | 60ms | Cross-region can be 150ms+ |
| CDN cache hit | < 1ms | 1ms | After network cost above |
| Load Balancer | < 1ms | 1ms | Pure routing overhead |
| API Gateway (auth + routing) | 3ms | 10ms | Token validation adds most of the cost |
| App Server (simple logic) | 1ms | 5ms | Complex logic or external calls add more |
| Cache / Redis hit | 0.5ms | 1ms | In-memory, same datacenter |
| DB read (indexed query) | 1ms | 5ms | Unindexed or joins: 10–100ms |
| DB write + sync replication | 5ms | 15ms | Waiting for standby to confirm |
| Message Queue publish (Kafka) | 1ms | 3ms | Async — user does NOT wait for consumer |
| Object Storage / S3 read | 10ms | 50ms | First byte. Much slower than DB or cache. |
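Since the scenario breakdowns below just sum rows from this table, a small helper makes recalibration easy. The hop names in this sketch are made up; only the millisecond values come from the Typical column.

```python
# The Typical column as data, plus a helper that totals a request path.
# Change a value here and every scenario estimate can be recomputed.
TYPICAL_MS = {
    "net_edge": 20,        # user → CDN edge
    "net_origin": 60,      # user → origin (no CDN)
    "cdn_hit": 1,
    "lb": 1,
    "api_gw": 10,
    "app": 5,
    "redis": 1,
    "db_read": 5,
    "db_write_sync": 15,
    "queue_publish": 3,
    "s3_read": 50,
}

def path_ms(*hops):
    """Total latency for a sequence of hops, in milliseconds."""
    return sum(TYPICAL_MS[h] for h in hops)

# Scenario 1, Redis hit: network + LB + app + Redis
print(path_ms("net_origin", "lb", "app", "redis"))  # 67
```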
Scenario 1: Read-Heavy
Design Twitter feed, Reddit homepage, YouTube video page.
NFRs that triggered this: high scale, read-heavy, low latency
[Client]
↓
[CDN] ──────────────────→ (cache hit: return static assets / cached response)
↓ (miss)
[Load Balancer]
↓
[App Servers]
↓
[Cache (Redis)] ─────────→ (cache hit: return feed)
↓ (miss)
[DB Primary] + [DB Read Replicas] ← writes go to primary, reads go to replicas
Latency breakdown:
| Path | Hops | Total |
|---|---|---|
| CDN hit | Network to edge (20) + CDN hit (1) | ~21ms |
| Redis hit (typical) | Network to origin (60) + LB (1) + App (5) + Redis (1) | ~67ms |
| DB read (cache miss) | Network to origin (60) + LB (1) + App (5) + Redis miss (1) + DB read (5) | ~72ms |
Key decisions:
- Feed data is precomputed and stored in Redis. App servers read from cache, not DB.
- CDN handles profile images, thumbnails, static JS/CSS.
- Read replicas absorb the bulk of DB traffic.
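The Redis-first read in the diagram is the classic cache-aside pattern; a minimal sketch, with dicts standing in for Redis and the read replicas:

```python
# Cache-aside read for the feed path (sketch only).
REDIS = {}                                  # hot precomputed feeds
READ_REPLICA = {"feed:42": ["post1", "post2"]}

def get_feed(user_id: int):
    key = f"feed:{user_id}"
    feed = REDIS.get(key)                   # 1) try the cache (~1ms)
    if feed is None:                        # 2) miss: hit a read replica (~5ms)
        feed = READ_REPLICA.get(key, [])
        REDIS[key] = feed                   # 3) populate for the next reader
    return feed

print(get_feed(42))  # ['post1', 'post2'] - the next call is a cache hit
```

In the real design the cache is populated by a feed-precompute job rather than on miss, but the read path is the same: Redis first, DB only as a fallback.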
Scenario 2: Write-Heavy
Design an analytics pipeline, logging system, IoT sensor ingestion.
NFRs that triggered this: high scale, write-heavy, fault tolerance
[Client / Producers]
↓
[Load Balancer]
↓
[App Servers] ← accept writes, validate, publish to queue
↓
[Message Queue (Kafka / SQS)] ← absorbs bursts, durable buffer
↓
[Workers / Consumers] ← process at own pace, retry on failure
↓
[Database / Data Warehouse]
Latency breakdown:
| Path | Hops | Total |
|---|---|---|
| User-visible (async write) | Network (60) + LB (1) + App (5) + Queue publish (3) | ~69ms |
| Worker processing | Happens after 200 returned — not on user's clock | — |
Key decisions:
- App servers never write directly to the DB — they publish to the queue and return 200 immediately.
- Queue decouples ingestion speed from processing speed. If workers are slow, queue grows but nothing drops.
- Workers can be scaled independently. Failed jobs stay in queue for retry.
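The publish-and-return flow can be sketched with a stdlib queue standing in for Kafka/SQS (all names here are illustrative):

```python
# Producer/worker decoupling. The app path validates, enqueues,
# and returns 200; the DB write happens later on the worker's clock.
import queue

EVENTS = queue.Queue()   # durable buffer in the real system
DB = []                  # data warehouse stand-in

def handle_write(event: dict) -> int:
    if "sensor_id" not in event:
        return 400                  # validate before accepting
    EVENTS.put(event)               # publish (~3ms); user's clock stops here
    return 200

def worker_drain():
    # Consumer: processes at its own pace. In a real system a failed
    # job would be re-queued for retry instead of being lost.
    while not EVENTS.empty():
        DB.append(EVENTS.get())

print(handle_write({"sensor_id": 7, "temp": 21.5}))  # 200
worker_drain()
print(len(DB))  # 1
```

If `worker_drain` never runs, `EVENTS` simply grows: the queue absorbing the burst is exactly the fault-tolerance property the design wants.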
Scenario 3: Strong Consistency Required
Design a payment system, hotel booking, airline seat reservation.
NFRs that triggered this: strong consistency, durability, security
[Client]
↓
[Load Balancer]
↓
[API Gateway] ← auth, rate limiting, fraud checks
↓
[App Servers]
↓
[DB Primary] ← all reads AND writes go here (no replicas on write path)
↓ (synchronous)
[DB Standby] ← hot standby, promoted on failure
↓
[Audit Log] ← append-only record of every transaction
Latency breakdown:
| Path | Hops | Total |
|---|---|---|
| Write (sync replication) | Network (60) + LB (1) + API GW (10) + App (5) + DB write + sync replication (15) | ~91ms |
This is the cost of strong consistency. You're paying ~22ms extra vs the async write in Scenario 2 (~69ms) to guarantee no data loss.
Key decisions:
- No cache on the write path — stale reads could cause double-booking or double-charging.
- DB standby is synchronous (primary waits for standby to confirm before acking write).
- API Gateway handles rate limiting to prevent abuse at the payment endpoint.
- Audit log is append-only and separate — required for compliance and debugging.
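A toy sketch of the synchronous write path, with dicts standing in for the primary, standby, and audit log (the seat-booking names are made up):

```python
# Synchronous replication sketch: a write is only acknowledged after
# the standby confirms, and every transaction lands in an append-only
# audit log.
PRIMARY, STANDBY, AUDIT_LOG = {}, {}, []

def replicate_to_standby(key: str, value: str) -> bool:
    STANDBY[key] = value          # the ~15ms row covers write + this wait
    return True                   # standby confirms

def book_seat(seat: str, user: str) -> bool:
    if seat in PRIMARY:           # read the primary, never a cache/replica
        return False              # already booked: no double-booking
    PRIMARY[seat] = user
    if not replicate_to_standby(seat, user):
        del PRIMARY[seat]         # roll back if the standby can't confirm
        return False
    AUDIT_LOG.append(("book", seat, user))  # append-only compliance record
    return True

print(book_seat("12A", "alice"))  # True
print(book_seat("12A", "bob"))    # False - strong consistency holds
```

A real implementation would use a DB transaction (or `synchronous_commit` in PostgreSQL terms) rather than hand-rolled rollback, but the ordering is the point: confirm replication before acking the user.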
Scenario 4: Low Latency
Design autocomplete/typeahead, live leaderboard, stock price feed.
NFRs that triggered this: latency < 100ms, read-heavy, high scale
[Client]
↓
[CDN / Edge Cache] ──────→ (return if result cached at edge)
↓ (miss)
[Load Balancer]
↓
[App Servers]
↓
[Cache (Redis)] ──────────→ (precomputed results — return immediately)
↓ (cold start / miss only)
[Database]
Latency breakdown:
| Path | Hops | Total |
|---|---|---|
| CDN edge hit | Network to edge (20) + CDN hit (1) | ~21ms |
| Redis hit | Network to origin (60) + LB (1) + App (5) + Redis (1) | ~67ms |
| DB (cold start only) | Network (60) + LB (1) + App (5) + Redis miss (1) + DB read (5) | ~72ms |
At a < 100ms target, even a DB miss is within budget — but only if the query is indexed. The goal is a 99%+ Redis hit rate so the DB is never on the hot path.
Key decisions:
- DB is off the hot path entirely. Cache must have near-100% hit rate for common queries.
- Results are precomputed and pushed into Redis (e.g., top 10 autocomplete results per prefix).
- CDN caches at the edge for global users — reduces round-trip time before even hitting your servers.
- Any update to results is pushed into Redis asynchronously, not on the request path.
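The precompute-and-push pattern, sketched with a dict as Redis (the scoring and prefix scheme are illustrative, not a real typeahead index):

```python
# Precomputed typeahead: the top results for every prefix are pushed
# into the cache offline, so serving a request is a single cache GET.
REDIS = {}

def rebuild_prefixes(terms, top_n: int = 10):
    # Offline job, never on the request path: index every prefix,
    # keeping at most top_n terms ordered by score.
    by_prefix = {}
    for term, _score in sorted(terms, key=lambda t: -t[1]):
        for i in range(1, len(term) + 1):
            bucket = by_prefix.setdefault(term[:i], [])
            if len(bucket) < top_n:
                bucket.append(term)
    REDIS.update(by_prefix)

def autocomplete(prefix: str):
    return REDIS.get(prefix, [])   # one cache hit; the DB is never touched

rebuild_prefixes([("cat", 10), ("car", 8), ("dog", 5)])
print(autocomplete("ca"))  # ['cat', 'car']
```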
Scenario 5: Balanced / General Purpose
Design Uber, Airbnb, a general marketplace.
NFRs that triggered this: moderate scale, availability, latency, some consistency
[Client (Web + Mobile)]
↓
[CDN] ← static assets only
↓
[Load Balancer]
↓
[API Gateway] ← auth, rate limiting
↓
[App Servers]
├──→ [Cache (Redis)] ← hot reads (search results, listings)
├──→ [Message Queue] ← async tasks (email, notifications, billing)
│ ↓
│ [Workers]
↓
[DB Primary] + [DB Read Replicas]
↓
[Object Storage (S3)] ← user photos, listing images
Latency breakdown:
| Path | Hops | Total |
|---|---|---|
| Cached read (listing page) | Network (60) + LB (1) + API GW (10) + App (5) + Redis (1) | ~77ms |
| DB read (cache miss) | Network (60) + LB (1) + API GW (10) + App (5) + DB read (5) | ~81ms |
| Write (booking) | Network (60) + LB (1) + API GW (10) + App (5) + DB write, async replication (5) | ~81ms |
| Async (notification) | Network (60) + LB (1) + API GW (10) + App (5) + Queue publish (3), worker runs offline | ~79ms |
Key decisions:
- Most features read from cache, write to DB. A few critical paths (booking, payment) skip cache and write directly to primary.
- Message queue handles notifications, confirmation emails, and analytics events — nothing that should block the user's request.
- Object storage for all media. App servers only store the URL reference in the DB.
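The store-the-URL-only pattern in miniature (dicts stand in for S3 and the DB, and the media domain is hypothetical):

```python
# Upload path sketch: the blob goes to object storage, the DB row
# keeps only the URL reference.
S3 = {}          # object storage stand-in
DB = {}          # listing rows stand-in

def upload_listing_photo(listing_id: int, image_bytes: bytes) -> str:
    key = f"listings/{listing_id}/photo.jpg"
    S3[key] = image_bytes                       # cheap, durable blob store
    url = f"https://media.example.com/{key}"    # hypothetical CDN domain
    DB[listing_id] = {"photo_url": url}         # DB stores the reference only
    return url

print(upload_listing_photo(7, b"\xff\xd8..."))
```

Keeping blobs out of the relational DB keeps its rows small and its backups fast; the CDN can then serve the media URL directly without touching the app servers at all.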
Quick Cheat Sheet
One-line trigger for every optional layer — use this at the end of requirements gathering to sanity-check your diagram.
Always: Client → Load Balancer → App Server → Database
Add CDN when: global users, latency matters, lots of static assets
Add Cache when: read-heavy, same data read repeatedly, latency < 100ms
Add Queue when: write-heavy, async work, need to absorb bursts
Add Workers when: you added a queue
Add API GW when: auth + rate limiting needed at entry point
Add Replicas when: read-heavy or high availability required
Add Object Storage when: files, images, video, blobs