
System Design Layers

NFR answers tell you what the system needs. Layers are what you draw. Start with the core 4, then add layers based on what the interviewer tells you.


The Core 4 — Always Draw These

Every system design diagram starts here. Add more only when NFRs demand it.

| Layer         | What It Does                                                                       | Example Tech      |
|---------------|------------------------------------------------------------------------------------|-------------------|
| Client        | Browser, mobile app, or external service making requests                           | Web, iOS, Android |
| Load Balancer | Distributes traffic across app servers. Single entry point. Handles health checks. | AWS ALB, Nginx    |
| App Server    | Business logic. Stateless, so you can scale horizontally by adding more.           | Node, Java, Go    |
| Database      | Persistent storage. Default to relational unless there's a clear reason not to.    | PostgreSQL, MySQL |
[Client]
   ↓
[Load Balancer]
   ↓
[App Server(s)]
   ↓
[Database]

Optional Layers — Added by NFRs

Add only when an NFR demands it — each one solves a specific problem but adds latency, cost, and operational complexity.

| Layer             | What It Does                                                              | Add When...                                              |
|-------------------|---------------------------------------------------------------------------|----------------------------------------------------------|
| CDN               | Caches static assets at edge nodes close to users.                        | Scale > 1M DAU, global users, low latency required       |
| API Gateway       | Handles auth, rate limiting, and routing before requests hit app servers. | Security requirements, multiple client types, public API |
| Cache (Redis)     | Stores hot data in memory. Microsecond reads. Reduces DB load.            | Read-heavy, latency < 100ms, same data read repeatedly   |
| Message Queue     | Decouples producers and consumers. Absorbs write bursts.                  | Write-heavy, async work, fault tolerance needed          |
| Worker / Consumer | Processes jobs from the queue asynchronously.                             | Always paired with a message queue                       |
| DB Read Replicas  | Copies of the primary DB that serve reads. Primary handles writes only.   | Read-heavy, high availability                            |
| Object Storage    | Stores files, images, video. Cheap, durable, infinitely scalable.         | File uploads, media, large blobs                         |

NFR → Layers to Add

Translate the interviewer's requirements directly into layers. Use this to build your diagram from the answers you've gathered.

| NFR Answer               | Layers to Add                                                                       |
|--------------------------|--------------------------------------------------------------------------------------|
| High scale, read-heavy   | CDN, Cache (Redis), DB Read Replicas                                                |
| High scale, write-heavy  | Message Queue, Workers                                                              |
| Low latency (< 100ms)    | CDN, Cache on hot path — remove DB from read path                                   |
| High availability        | DB Read Replicas + standby, multi-AZ Load Balancer                                  |
| Strong consistency       | No cache on write path. Single DB primary. Synchronous replication.                 |
| Security / rate limiting | API Gateway in front of App Servers                                                 |
| File or media storage    | Object Storage (S3)                                                                 |
| Async / background work  | Message Queue + Workers                                                             |
| Fault tolerance          | Message Queue (queue buffers if downstream is down), circuit breaker at App Server  |
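The table above is mechanical enough to express as a lookup. Here is a minimal sketch; the key spellings and layer names are illustrative for this example, not a standard taxonomy:

```python
# Hypothetical lookup mirroring the table above: NFR answers in, optional
# layers out. Key spellings are illustrative, not a standard taxonomy.
NFR_TO_LAYERS = {
    "read-heavy":        {"CDN", "Cache (Redis)", "DB Read Replicas"},
    "write-heavy":       {"Message Queue", "Workers"},
    "low-latency":       {"CDN", "Cache (Redis)"},
    "high-availability": {"DB Read Replicas", "Multi-AZ Load Balancer"},
    "security":          {"API Gateway"},
    "media-storage":     {"Object Storage (S3)"},
    "async-work":        {"Message Queue", "Workers"},
    "fault-tolerance":   {"Message Queue", "Circuit Breaker"},
}

CORE = ["Client", "Load Balancer", "App Server", "Database"]

def layers_for(nfr_answers):
    """Core 4 plus every optional layer the gathered NFRs trigger."""
    extra = set()
    for answer in nfr_answers:
        extra |= NFR_TO_LAYERS.get(answer, set())
    return CORE + sorted(extra)
```

Strong consistency is the one answer that doesn't fit this shape: it subtracts (no cache on the write path) rather than adds, so it stays a judgment call.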

Latency Reference — Cost Per Layer

Single source of truth for per-layer latency costs. Use the Typical column for estimates — Best Case shows the theoretical floor with everything going right.

Network is included: Each component figure (Redis 1ms, DB 5ms, etc.) covers the full round-trip including the internal network hop — App→component→App. The only network cost listed separately is the user-facing leg (Browser↔LB) because that varies by geography and is outside your control.

Scenario breakdowns below use the Typical column. Update this table to recalibrate all scenarios.

Same order, left to right (typical values):

Browser ─(20ms)─► CDN ─(1ms)─► LB ─(10ms)─► API GW ─(5ms)─► App ─┬─(1ms)──► Redis
                                                                 ├─(5ms)──► DB
                                                                 ├─(3ms)──► Kafka
                                                                 └─(50ms)─► S3
| Layer                           | Best Case | Typical | Notes                                             |
|---------------------------------|-----------|---------|---------------------------------------------------|
| Network: user → CDN edge        | 5ms       | 20ms    | Depends on user geography                         |
| Network: user → origin (no CDN) | 20ms      | 60ms    | Cross-region can be 150ms+                        |
| CDN cache hit                   | < 1ms     | 1ms     | After network cost above                          |
| Load Balancer                   | < 1ms     | 1ms     | Pure routing overhead                             |
| API Gateway (auth + routing)    | 3ms       | 10ms    | Token validation adds most of the cost            |
| App Server (simple logic)       | 1ms       | 5ms     | Complex logic or external calls add more          |
| Cache / Redis hit               | 0.5ms     | 1ms     | In-memory, same datacenter                        |
| DB read (indexed query)         | 1ms       | 5ms     | Unindexed or joins: 10–100ms                      |
| DB write + sync replication     | 5ms       | 15ms    | Waiting for standby to confirm                    |
| Message Queue publish (Kafka)   | 1ms       | 3ms     | Async — user does NOT wait for consumer           |
| Object Storage / S3 read        | 10ms      | 50ms    | First byte. Much slower than DB or cache.         |
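The Typical column can be encoded as a small budget calculator, and the scenario totals are then just sums over it. A sketch under that assumption (the key names are made up for this example):

```python
# The Typical column, encoded in milliseconds. Each figure is the full
# round trip App -> component -> App, except the two user-facing network
# legs, which are one-off costs at the front.
TYPICAL_MS = {
    "net_user_to_edge": 20,
    "net_user_to_origin": 60,
    "cdn_hit": 1,
    "lb": 1,
    "api_gw": 10,
    "app": 5,
    "redis_hit": 1,
    "db_read": 5,
    "db_write_sync": 15,
    "queue_publish": 3,
    "s3_read": 50,
}

def budget(*hops):
    """Sum the typical cost of one request path, in milliseconds."""
    return sum(TYPICAL_MS[hop] for hop in hops)

# Scenario 1's Redis-hit path: 60 + 1 + 5 + 1
redis_hit_total = budget("net_user_to_origin", "lb", "app", "redis_hit")
```

Recalibrating one figure in the dict recalculates every path, which is exactly the role the table plays here.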

Scenario 1: Read-Heavy

Design Twitter feed, Reddit homepage, YouTube video page.

NFRs that triggered this: high scale, read-heavy, low latency

[Client]
   ↓
[CDN] ──────────────────→ (cache hit: return static assets / cached response)
   ↓ (miss)
[Load Balancer]
   ↓
[App Servers]
   ↓
[Cache (Redis)] ─────────→ (cache hit: return feed)
   ↓ (miss)
[DB Primary] + [DB Read Replicas]  ← writes go to primary, reads go to replicas

Latency breakdown:

All rows use the Typical column:

| Path                 | Hops                                                                     | Total |
|----------------------|--------------------------------------------------------------------------|-------|
| CDN hit              | Network to edge (20) + CDN hit (1)                                       | ~21ms |
| Redis hit            | Network to origin (60) + LB (1) + App (5) + Redis (1)                    | ~67ms |
| DB read (cache miss) | Network to origin (60) + LB (1) + App (5) + Redis miss (1) + DB read (5) | ~72ms |

Key decisions:

  • Feed data is precomputed and stored in Redis. App servers read from cache, not DB.
  • CDN handles profile images, thumbnails, static JS/CSS.
  • Read replicas absorb the bulk of DB traffic.
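The Redis step in this read path is the classic cache-aside pattern. A minimal sketch, with an in-memory dict standing in for Redis and a hypothetical `fetch_feed_from_db` standing in for a read-replica query:

```python
import time

# Cache-aside sketch: check the cache first, fall back to the DB on a miss,
# then populate the cache so the next read is a hit.
CACHE = {}
TTL_SECONDS = 60

def fetch_feed_from_db(user_id):
    # Placeholder for a query against a read replica.
    return [f"post-{n}" for n in range(3)]

def get_feed(user_id):
    entry = CACHE.get(user_id)
    if entry and entry["expires"] > time.time():
        return entry["feed"]                 # hit: ~1ms in real Redis
    feed = fetch_feed_from_db(user_id)       # miss: pay the ~5ms DB read once
    CACHE[user_id] = {"feed": feed, "expires": time.time() + TTL_SECONDS}
    return feed
```

The TTL bounds staleness; in the scenario above, a precompute job pushing fresh feeds into the cache would replace the miss path almost entirely.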

Scenario 2: Write-Heavy

Design an analytics pipeline, logging system, IoT sensor ingestion.

NFRs that triggered this: high scale, write-heavy, fault tolerance

[Client / Producers]
   ↓
[Load Balancer]
   ↓
[App Servers]  ← accept writes, validate, publish to queue
   ↓
[Message Queue (Kafka / SQS)]  ← absorbs bursts, durable buffer
   ↓
[Workers / Consumers]  ← process at own pace, retry on failure
   ↓
[Database / Data Warehouse]

Latency breakdown:

| Path                       | Hops                                                        | Total |
|----------------------------|-------------------------------------------------------------|-------|
| User-visible (async write) | Network (60) + LB (1) + App (5) + Queue publish (3)         | ~69ms |
| Worker processing          | Happens after the 200 is returned — not on the user's clock | —     |

Key decisions:

  • App servers never write directly to the DB — they publish to the queue and return 200 immediately.
  • Queue decouples ingestion speed from processing speed. If workers are slow, queue grows but nothing drops.
  • Workers can be scaled independently. Failed jobs stay in queue for retry.
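The first two decisions can be sketched in a few lines, with a `collections.deque` standing in for Kafka/SQS (the names and the retry policy are deliberately naive illustrations):

```python
from collections import deque

# Async-write sketch: the app validates and enqueues, then returns; a worker
# drains at its own pace and re-queues failures instead of dropping them.
QUEUE = deque()

def ingest(event):
    """App-server side: validate, publish, return immediately."""
    if "id" not in event:
        return 400                 # reject before it ever reaches the queue
    QUEUE.append(event)            # durable buffer in the real system
    return 200                     # user waits for the publish, not the work

def work_one(store):
    """Worker side: process one job; push it back on failure."""
    event = QUEUE.popleft()
    try:
        store[event["id"]] = event
    except Exception:
        QUEUE.append(event)        # retry later instead of dropping
```

If workers stall, `QUEUE` simply grows; the ingest path and its ~69ms latency are unaffected.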

Scenario 3: Strong Consistency Required

Design a payment system, hotel booking, airline seat reservation.

NFRs that triggered this: strong consistency, durability, security

[Client]
   ↓
[Load Balancer]
   ↓
[API Gateway]  ← auth, rate limiting, fraud checks
   ↓
[App Servers]
   ↓
[DB Primary]  ← all reads AND writes go here (no replicas on write path)
   ↓ (synchronous)
[DB Standby]  ← hot standby, promoted on failure
   ↓
[Audit Log]  ← append-only record of every transaction

Latency breakdown:

| Path                     | Hops                                                                                               | Total |
|--------------------------|----------------------------------------------------------------------------------------------------|-------|
| Write (sync replication) | Network (60) + LB (1) + API GW (10) + App (10: payment logic, not simple) + DB write + sync (15)   | ~96ms |

This is the cost of strong consistency: roughly 30ms more than the ~69ms async write in Scenario 2, in exchange for a guarantee of no data loss.

Key decisions:

  • No cache on the write path — stale reads could cause double-booking or double-charging.
  • DB standby is synchronous (primary waits for standby to confirm before acking write).
  • API Gateway handles rate limiting to prevent abuse at the payment endpoint.
  • Audit log is append-only and separate — required for compliance and debugging.
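The no-cache, synchronous-replication, and append-only-audit rules can be sketched together. Dicts stand in for the primary and standby databases, and `book` is a hypothetical seat-reservation handler:

```python
# Strong-consistency sketch: the write is acknowledged only after both the
# primary and the standby have it, and every attempt lands in an
# append-only audit log.
PRIMARY = {}
STANDBY = {}
AUDIT_LOG = []

def book(seat_id, user_id):
    AUDIT_LOG.append(("attempt", seat_id, user_id))
    if seat_id in PRIMARY:                # read the primary, never a cache
        AUDIT_LOG.append(("rejected", seat_id, user_id))
        return False                      # taken: no double-booking possible
    PRIMARY[seat_id] = user_id
    STANDBY[seat_id] = user_id            # synchronous replication: wait here
    AUDIT_LOG.append(("confirmed", seat_id, user_id))
    return True                           # ack only after the standby has it
```

A real system would wrap the check-and-write in a transaction or row lock; the point of the sketch is the ordering — check primary, replicate, then ack.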

Scenario 4: Low Latency

Design autocomplete/typeahead, live leaderboard, stock price feed.

NFRs that triggered this: latency < 100ms, read-heavy, high scale

[Client]
   ↓
[CDN / Edge Cache] ──────→ (return if result cached at edge)
   ↓ (miss)
[Load Balancer]
   ↓
[App Servers]
   ↓
[Cache (Redis)] ──────────→ (precomputed results — return immediately)
   ↓ (cold start / miss only)
[Database]

Latency breakdown:

| Path                 | Hops                                                                     | Total |
|----------------------|--------------------------------------------------------------------------|-------|
| CDN edge hit         | Network to edge (20) + CDN hit (1)                                       | ~21ms |
| Redis hit            | Network to origin (60) + LB (1) + App (5) + Redis (1)                    | ~67ms |
| DB (cold start only) | Network to origin (60) + LB (1) + App (5) + Redis miss (1) + DB read (5) | ~72ms |

With a < 100ms target, even a DB miss is within budget — but only if the query is indexed. The goal is a 99%+ Redis hit rate so the DB is never on the hot path.

Key decisions:

  • DB is off the hot path entirely. Cache must have near-100% hit rate for common queries.
  • Results are precomputed and pushed into Redis (e.g., top 10 autocomplete results per prefix).
  • CDN caches at the edge for global users — reduces round-trip time before even hitting your servers.
  • Any update to results is pushed into Redis asynchronously, not on the request path.
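The precompute-and-push pattern can be sketched as an offline rebuild plus a one-lookup hot path. An in-memory dict stands in for Redis; `rebuild` and `suggest` are illustrative names:

```python
# Precompute-and-push sketch: an offline job indexes every prefix of every
# term, most popular first; the hot path is a single dict lookup.
PREFIX_CACHE = {}

def rebuild(term_scores, top_n=10):
    """Offline: walk terms by popularity, fill each prefix's top-N bucket."""
    PREFIX_CACHE.clear()
    ranked = sorted(term_scores, key=term_scores.get, reverse=True)
    for term in ranked:
        for i in range(1, len(term) + 1):
            bucket = PREFIX_CACHE.setdefault(term[:i], [])
            if len(bucket) < top_n:
                bucket.append(term)

def suggest(prefix):
    """Hot path: one cache read; the DB is never touched."""
    return PREFIX_CACHE.get(prefix, [])
```

Because buckets are capped at `top_n` and built in popularity order, `suggest` returns ranked results with no sorting on the request path.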

Scenario 5: Balanced / General Purpose

Design Uber, Airbnb, a general marketplace.

NFRs that triggered this: moderate scale, availability, latency, some consistency

[Client (Web + Mobile)]
   ↓
[CDN]  ← static assets only
   ↓
[Load Balancer]
   ↓
[API Gateway]  ← auth, rate limiting
   ↓
[App Servers]
   ├──→ [Cache (Redis)]   ← hot reads (search results, listings)
   ├──→ [Message Queue]   ← async tasks (email, notifications, billing)
   │         ↓
   │     [Workers]
   ↓
[DB Primary] + [DB Read Replicas]
   ↓
[Object Storage (S3)]  ← user photos, listing images

Latency breakdown:

| Path                       | Hops                                                                                               | Total |
|----------------------------|----------------------------------------------------------------------------------------------------|-------|
| Cached read (listing page) | Network (60) + LB (1) + API GW (10) + App (5) + Redis (1)                                          | ~77ms |
| DB read (cache miss)       | Network (60) + LB (1) + API GW (10) + App (5) + DB read (5)                                        | ~81ms |
| Write (booking)            | Network (60) + LB (1) + API GW (10) + App (5) + DB write (5)                                       | ~81ms |
| Async (notification)       | Network (60) + LB (1) + API GW (10) + App (5) + Queue publish (3) — worker runs off the user's clock | ~79ms |

Key decisions:

  • Most features read from cache, write to DB. A few critical paths (booking, payment) skip cache and write directly to primary.
  • Message queue handles notifications, confirmation emails, and analytics events — nothing that should block the user's request.
  • Object storage for all media. App servers only store the URL reference in the DB.
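The last decision (blob in object storage, URL reference in the DB) is easy to sketch. `put_object` is a stand-in for an S3 upload, and the URL format is made up for this example:

```python
import hashlib

# Media pattern sketch: the blob goes to object storage, and the DB row
# keeps only the URL, never the bytes.
OBJECT_STORE = {}
DB_LISTINGS = {}

def put_object(data):
    key = hashlib.sha256(data).hexdigest()[:16]   # content-addressed key
    OBJECT_STORE[key] = data
    return f"https://cdn.example.com/{key}"       # illustrative URL scheme

def create_listing(listing_id, title, photo_bytes):
    url = put_object(photo_bytes)                 # blob to object storage
    DB_LISTINGS[listing_id] = {"title": title, "photo_url": url}  # URL only
```

Keeping the DB row small means listing reads stay on the ~5ms indexed-query path; the ~50ms object read is paid only when the image itself is fetched, usually through the CDN.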

Quick Cheat Sheet

One-line trigger for every optional layer — use this at the end of requirements gathering to sanity-check your diagram.

Always:        Client → Load Balancer → App Server → Database

Add CDN        when: global users, latency matters, lots of static assets
Add Cache      when: read-heavy, same data read repeatedly, latency < 100ms
Add Queue      when: write-heavy, async work, need to absorb bursts
Add Workers    when: you added a queue
Add API GW     when: auth + rate limiting needed at entry point
Add Replicas   when: read-heavy or high availability required
Add Obj. Store when: files, images, video, blobs