Non-Functional Requirements
Don't draw boxes until you know what the system demands. For each NFR this doc covers what it means, how the answer changes your architecture layer by layer, key terms, and which real systems make it their top priority. Pick the ones most relevant to the system and let them drive your design.
Scale
How big is the system, and where does the load actually hit? Scale affects every layer — not just the database.
Ask:
- How many daily active users?
- What's the read/write ratio?
- Any bursty traffic patterns (holidays, events)?
DAU → QPS conversion:
A day has 86,400 seconds. In interviews, round that to 100,000 — the 16% error is irrelevant at estimation scale and the mental math becomes trivial.
QPS = DAU × requests_per_user_per_day ÷ 100,000
Worked example with 1M DAU and 10 requests/user/day:
QPS = 1,000,000 × 10 ÷ 100,000
= 10,000,000 ÷ 100,000
= 100 QPS
For peak QPS, multiply by 2–3× (traffic isn't uniform — mornings and evenings are heavier):
Peak QPS = 100 × 3 = 300 QPS
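The conversion above is easy to script as a sanity check — a minimal sketch; the function and parameter names are illustrative:

```python
def estimate_qps(dau: int, requests_per_user_per_day: int,
                 peak_multiplier: float = 3.0) -> tuple[float, float]:
    """Back-of-envelope average and peak QPS, using the rounded 100,000-second day."""
    seconds_per_day = 100_000  # interview-friendly rounding of 86,400
    avg_qps = dau * requests_per_user_per_day / seconds_per_day
    return avg_qps, avg_qps * peak_multiplier

avg, peak = estimate_qps(1_000_000, 10)
# avg = 100.0, peak = 300.0 — matches the worked example above
```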
| DAU | QPS (est.) | Layer-by-Layer Impact |
|---|---|---|
| 10K | ~1 | Single server handles everything. No LB, no replicas, no cache needed. (AWS t3.micro, GCP e2-micro — handles thousands of QPS, overkill at 1 QPS) |
| 100K | ~10 | Add LB for redundancy (not load — 10 QPS is trivial). Redis cache if queries are expensive. CDN for static assets. (AWS: ALB + 2× t3.small. GCP: Cloud Load Balancing + 2× e2-small) |
| 1M | ~100 | Multiple app servers behind LB. DB read replicas (1–2). Connection pooler to avoid exhausting DB connections. (AWS: RDS PostgreSQL — up to 5 read replicas, RDS Proxy for connection pooling. GCP: Cloud SQL PostgreSQL — up to 10 read replicas, Cloud SQL Proxy for connection pooling) |
| 10M | ~1,000 | Kafka for async writes. Redis Cluster for cache. DB read replicas still sufficient for reads — don't shard yet. (AWS: Aurora — up to 15 read replicas, handles 100K+ reads/sec, Aurora Serverless v2 auto-scales. GCP: AlloyDB for PostgreSQL — up to 16 read pool nodes, auto-scales. Sharding threshold is 10K–50K QPS or when data volume outgrows a single host) |
| 100M+ | ~10,000+ | Multi-region everything. Global LB. DB sharding or distributed SQL. Connection pooling critical at this tier. CDN serves 80%+ of traffic. (AWS: Aurora Global Database, RDS Proxy, CloudFront + Route53 latency routing. GCP: Cloud Spanner for distributed SQL, Cloud SQL Proxy, Cloud CDN + global anycast Cloud Load Balancing. Cloud-agnostic: CockroachDB, Cloudflare CDN) |
How scale changes tech choices per layer:
- App servers: Single instance → horizontal auto-scaling group → stateless containers (AWS ECS/EKS, GCP Cloud Run/GKE, or self-managed K8s)
- Database: PostgreSQL single → read replicas → sharding → NoSQL or distributed SQL
- Cache: No cache → Redis single → Redis Cluster (partitioned across nodes)
- LB: Not needed → single regional LB (AWS ALB, GCP Cloud Load Balancing) → multi-AZ → global load balancing (AWS Route53 latency routing, GCP global anycast Cloud Load Balancing)
- CDN: Not needed → CDN for static (AWS CloudFront, GCP Cloud CDN, or Cloudflare) → full edge caching with dynamic content
Read/Write ratio shapes your architecture:
- Read-heavy (100:1) → cache aggressively (Redis, CDN), DB read replicas. Twitter feed, Reddit homepage.
- Write-heavy (1:10) → message queues (Kafka) to absorb bursts, append-only logs, async consumers. Consider CQRS (separate write model from read model) to prevent reads from competing with writes. Logging pipeline, analytics ingestion.
- Balanced → general-purpose horizontal scaling.
Burst traffic: If traffic spikes at predictable times (Black Friday, live events), design for auto-scaling and queue-based buffering, not steady-state peak capacity.
Storage estimate: DAU × avg_event_size × events_per_day × retention_days
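The storage formula scripted the same way (illustrative names; the result is raw application data, before replication and indexes):

```python
def estimate_storage_gb(dau: int, avg_event_size_bytes: int,
                        events_per_day: int, retention_days: int) -> float:
    """Raw storage from the formula above; excludes replication factor and indexes."""
    total_bytes = dau * avg_event_size_bytes * events_per_day * retention_days
    return total_bytes / 1e9  # decimal GB is fine at estimation scale

# 1M DAU, 1 KB events, 10 events/user/day, 1-year retention
gb = estimate_storage_gb(1_000_000, 1_000, 10, 365)
# 3,650 GB ≈ 3.65 TB raw — multiply by 3 for replication factor 3
```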
Most critical for: Twitter/X (read-heavy feed), YouTube (video storage + CDN), Uber (surge traffic), ticketing systems (flash sales).
Latency
How fast must the system respond? This determines where you place compute, what stays synchronous, and what you offload.
Ask:
- What's the acceptable p99 response time?
- Are there specific operations that must be fast?
What is p99? Sort all requests in a window by response time; p99 is the value at the 99th percentile — 99% of requests completed faster than it. P50 is the median. P99.9 catches the slowest 1 in 1,000. At 1,000 QPS, 10 requests every second are slower than your p99.
| Target | What It Means | Design Impact |
|---|---|---|
| < 10ms | Ultra-low. Real-time systems. | Data must live in-process memory or local Redis. No network hops. Compute co-located with data. |
| < 100ms | Feels instant to users. | Read from Redis cache (same DC, sub-millisecond). Serve static assets from CDN. No synchronous cross-region calls. |
| < 500ms | Interactive. Standard web UX. | Cache reads from Redis. Async writes (publish to queue, return 200). DB reads must hit indexes. |
| 1–5s | Tolerable for complex queries. | Background jobs for heavy computation. DB aggregations OK if indexed. Show loading states. |
| > 5s | Batch is fine. | Async processing, queues, offline jobs. No sync response needed. |
Why p99 and not average — average hides pain: 99 requests complete in 50ms. One hangs for 10,000ms.
Average = (99×50 + 10,000) / 100 = 149.5ms ← looks healthy
P99 = 10,000ms ← system is on fire
At Walmart's scale — 260 million customers a week — 1% is 2.6 million people experiencing that hang. Average would never surface it.
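Reproducing those numbers directly, using the simple sort-and-index definition of p99 (a throwaway sketch — real monitoring uses histograms, not raw sample lists):

```python
# 99 fast requests and one 10-second hang — the scenario above
latencies_ms = [50] * 99 + [10_000]

average = sum(latencies_ms) / len(latencies_ms)

ranked = sorted(latencies_ms)
p99 = ranked[int(0.99 * len(ranked))]  # index 99 → the hanging request

# average = 149.5 (looks healthy), p99 = 10000 (system is on fire)
```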
Why p99 specifically:
| Percentile | What it catches | Problem |
|---|---|---|
| P50 | Half your users | Too lenient — misses most of the tail |
| P95 | Most issues | Can miss slow-building degradation |
| P99 | Tail latency — industry standard | Sensitive without being noisy |
| P100 | Single worst request | Always an outlier — useless for decisions |
P99 is the SLA standard because it catches real problems early without false-alarming on one-off outliers.
P99 spike = early overload warning: When a system gets busy, it fails at the tail first — not uniformly.
| State | P50 | P99 |
|---|---|---|
| Healthy | 40ms | 80ms |
| Getting busy | 45ms | 400ms |
| Overloaded | 80ms | 2,500ms |
| Crashed | timeout | timeout |
By the time P50 degrades, you're already in serious trouble. P99 gives you the window to act — shed load, scale out, open a circuit — before the majority of users feel it.
Setting the threshold — from your SLA, not arbitrary: Set the shedding trigger at a fraction of your SLA, not after you've already breached it.
Inventory reservation SLA = 500ms → shed when p99 > 400ms (80% of SLA)
Homepage SLA = 2,000ms → shed when p99 > 1,600ms (80% of SLA)
The gap between threshold and SLA is your response window. Trigger too late and you're already breaking the promise.
How it's measured: Rolling window over the last N seconds (typically 10s), recomputed every 5 seconds. In practice, use Micrometer or Prometheus histograms rather than sorting raw request lists — same concept, far more efficient.
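The shedding trigger can be sketched as follows — a simplified, illustrative version that keeps raw samples in memory; a production service would read p99 from Prometheus or Micrometer histograms instead:

```python
import random
from collections import deque

class LoadShedder:
    """Shed load when rolling-window p99 exceeds a fraction of the SLA.
    A fixed-size deque stands in for the 10s rolling window."""

    def __init__(self, sla_ms: float, trigger_fraction: float = 0.8,
                 window_size: int = 1000):
        self.threshold_ms = sla_ms * trigger_fraction  # e.g. 500ms SLA → 400ms trigger
        self.samples = deque(maxlen=window_size)

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def p99(self) -> float:
        ranked = sorted(self.samples)
        return ranked[int(0.99 * len(ranked))] if ranked else 0.0

    def should_shed(self) -> bool:
        return self.p99() > self.threshold_ms

shedder = LoadShedder(sla_ms=500)               # inventory reservation example
for _ in range(990):
    shedder.record(random.uniform(20, 80))      # healthy traffic, tail well under 400ms
healthy = shedder.should_shed()                 # False
for _ in range(20):
    shedder.record(2_500)                       # tail latency blows past the trigger
overloaded = shedder.should_shed()              # True — start shedding before the SLA breaks
```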
Latency vs throughput: Latency is how fast one request completes. Throughput is how many requests per second the system handles. Optimizing for one can hurt the other — batching increases throughput but adds per-request latency. Know which the interviewer cares about.
p99 matters more than average. If the interviewer says "users complain it feels slow", think tail latency, not mean. One unindexed query or missing cache can blow your p99 while your average looks fine.
Per-component latency costs: For exact numbers per hop (LB, Redis, DB, Kafka, S3, etc.) and worked end-to-end request path examples, see the Latency Reference Table in System Design Layers.
Most critical for: Search/autocomplete (Yelp, Google — < 100ms), stock trading (HFT — microseconds), multiplayer gaming, ride-matching (Uber — driver must get request fast).
Availability
How much downtime is acceptable? This drives redundancy, replication topology, and failover strategy across all layers.
Ask:
- What's the uptime requirement?
- What happens to users if this goes down?
| SLA | Downtime/Year | Downtime/Month | What It Looks Like Across Layers |
|---|---|---|---|
| 99% | ~3.6 days | ~7.2 hours | Single region. Single DB. Basic health check restarts crashed app server. |
| 99.9% | ~8.7 hours | ~43 min | Multi-AZ LB. 2+ app server instances. DB with automatic failover replica (AWS RDS Multi-AZ ~60s switchover, GCP Cloud SQL HA ~60s switchover). Redis Sentinel for cache failover. |
| 99.99% | ~52 min | ~4.3 min | Active-active across 2 regions. DB replication across regions with < 1min failover. CDN absorbs traffic if origin partially down. App servers auto-scale and auto-replace. |
| 99.999% | ~5 min | ~26 sec | No single point of failure anywhere. LB: multiple active nodes. App: blue/green deploys with instant rollback. DB: synchronous multi-region replication (AWS Aurora Global, GCP Spanner, or CockroachDB). Cache: Redis Cluster across AZs. Queue: Kafka with replication factor 3. |
How each layer achieves redundancy:
- Load Balancer: Health checks drop unhealthy app servers from rotation in seconds. Multi-AZ deployment so one AZ outage doesn't take the LB down.
- App Servers: Stateless (no local state) so any instance can handle any request. Auto-scaling group replaces failed instances automatically.
- Cache (Redis): Redis Sentinel (monitors, auto-promotes replica on primary failure). Redis Cluster (shards + replicas, handles node loss).
- Database: Primary + read replica → automatic failover on primary crash. Multi-region → async or sync replication depending on consistency needs.
- Message Queue: Kafka replication factor ≥ 3 means 2 broker deaths don't lose messages.
- CDN: CDN providers are globally redundant by design (AWS CloudFront, GCP Cloud CDN, Cloudflare).
CAP Theorem tradeoff: During a network partition, choose availability (keep serving, possibly stale) or consistency (stop serving until consistent). Most consumer apps choose availability. Payment systems choose consistency.
Graceful degradation: Netflix shows cached thumbnails when recommendations service is down. Don't fail completely — fail partially.
SLI / SLO / SLA — know the difference:
- SLI (Service Level Indicator) — the actual measured metric. E.g., "97.8% of requests completed in < 200ms this week."
- SLO (Service Level Objective) — your internal target. E.g., "99.9% of requests must complete in < 200ms." This is what your team is held to.
- SLA (Service Level Agreement) — the contractual commitment to customers, with financial penalties for breach. Always looser than your SLO (you'd go out of business otherwise).
Error budget: 1 − SLO. A 99.9% SLO gives you ~43 min/month to spend on incidents and deploys. When the budget is gone, freeze non-critical changes until the window resets.
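The budget arithmetic as a one-line helper (illustrative):

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Allowed downtime per window for a given availability SLO."""
    return (1 - slo) * days * 24 * 60

budget = error_budget_minutes(0.999)  # 99.9% over a 30-day month → ~43.2 minutes
```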
Each extra 9 is roughly 10× harder and more expensive. Push back if the requirement seems over-engineered.
Most critical for: Payment processors (Stripe, Visa — 99.999%), AWS infrastructure, healthcare systems, any system where downtime = revenue loss or safety risk.
Consistency
When a write happens, when do all nodes and users see it? This is the core CAP tradeoff in practice.
Ask:
- Can users see slightly stale data?
- If two users write at the same time, does it matter which one wins?
| Model | What It Means | When to Use | Real Example |
|---|---|---|---|
| Strong (Linearizable) | All reads see the latest write immediately across all nodes. Each operation appears to take effect at a single instant in time. | Payments, inventory, bank balances | PostgreSQL, Zookeeper, Spanner |
| Read-your-writes | You always see your own latest write. Other users may lag briefly. | Profile updates, settings | Most social apps for own data |
| Eventual | Writes propagate across nodes eventually. Briefly stale OK. | Social feeds, like counts, view counts | Cassandra, DynamoDB default, DNS |
| Causal | Causally related writes are seen in order by all nodes. Unrelated writes can appear in any order. | Comments/replies, collaborative editing, chat | MongoDB causally consistent sessions, Azure Cosmos DB session consistency |
ACID vs BASE: Relational databases give you ACID (Atomic, Consistent, Isolated, Durable) — all or nothing, always correct. Most NoSQL databases give you BASE (Basically Available, Soft state, Eventually consistent) — always up, eventually right. Choosing a DB is often choosing between these two philosophies.
Idempotency: A write operation is idempotent if calling it multiple times produces the same result as calling it once. Critical when clients retry on network failure — without it, a retried payment charges the user twice.
Causal consistency explained: If Alice posts "I'm going to the store" and Bob replies "I'll come with you", causal consistency guarantees Carol always sees Alice's post before Bob's reply — because Bob's reply causally depends on Alice's post. But Carol might see Alice's post before or after Dave's unrelated status update — that's fine, they're not causally linked.
This is stronger than eventual (which could show Bob's reply before Alice's post) but weaker than strong (which globally orders every single write). It's the right choice when order matters within a thread or conversation, but not globally.
Interview signal: "It's fine if the like count is off by a few seconds" → eventual consistency, scale horizontally. "Double-charging a user is unacceptable" → strong consistency, accept the latency cost.
Most critical for: Banking and payments (double-spend prevention), inventory systems (Amazon — can't oversell), booking systems (airline seats, hotel rooms).
Idempotency
If a client retries a request, will it cause duplicate side effects? This shapes how you design APIs and payment flows.
Ask:
- Can clients retry failed requests safely?
- Are there operations where duplicates are catastrophic (charges, transfers, order submissions)?
The problem: A client sends a payment request. The server processes it, but the response is lost in transit. The client retries. Without idempotency, the user gets charged twice.
The solution — idempotency keys: Client generates a unique key per logical operation (e.g., UUID) and sends it with the request. Server stores (idempotency_key → result) in Redis or DB. On duplicate request: return the stored result, skip re-execution.
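A minimal sketch of the key-store pattern, with an in-memory dict standing in for Redis or the DB (all names here are illustrative):

```python
import uuid

class PaymentService:
    """Idempotency-key handling: execute once, replay the stored result on retries."""

    def __init__(self):
        self._results: dict[str, dict] = {}  # idempotency_key → stored result
        self.charges_executed = 0

    def charge(self, idempotency_key: str, user_id: str, amount_cents: int) -> dict:
        # Duplicate request (client retry): return stored result, skip re-execution
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_executed += 1  # the real side effect happens exactly once
        result = {"status": "charged", "user": user_id, "amount": amount_cents}
        self._results[idempotency_key] = result
        return result

svc = PaymentService()
key = str(uuid.uuid4())                 # client generates one key per logical operation
first = svc.charge(key, "alice", 500)
retry = svc.charge(key, "alice", 500)   # network retry with the same key: no double charge
```

A production version would reserve the key atomically (e.g. Redis SET with NX) so two concurrent retries can't both execute, and expire stored results with a TTL.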
| Operation Type | Naturally Idempotent? | Fix |
|---|---|---|
| GET, DELETE | Yes (GET reads, DELETE on missing is no-op) | Nothing needed |
| PUT (replace entire resource) | Yes | Nothing needed |
| POST (create, charge, transfer) | No | Add idempotency key |
| Message consumer processing | No | Track processed message IDs in DB |
Idempotency in queues: A Kafka consumer that crashes mid-processing will re-receive the same message. Design consumers to be idempotent — check if the event was already processed (by storing the event ID) before acting on it.
Most critical for: Payment APIs (Stripe uses idempotency keys on every charge endpoint), order submission, booking systems, any POST that creates or transfers.
Durability
How much data loss is acceptable if the system crashes or a node goes down?
Ask:
- If we lose a server right now, what's the worst acceptable outcome?
- Can we replay events from a log?
RPO = Recovery Point Objective — how much data can we lose? Measured in time: an RPO of 0, 1s, or 1hr means you may lose up to that much recent data. RTO = Recovery Time Objective — how long can the system be down during recovery?
| Term | Definition | Design Impact | Latency Cost |
|---|---|---|---|
| RPO = 0 | Zero data loss. | Synchronous replication: primary waits for all replicas to confirm before acking the write. | +5–20ms per write at DB layer. +10–50ms if two-phase commit across services. |
| RPO = seconds | Tiny loss OK. | Async replication. WAL (write-ahead log) shipped to replica continuously. | No extra latency — write acks immediately, replication happens in background. |
| RPO = hours | Some loss tolerable. | Periodic snapshots or nightly backups. | No latency impact. |
| RTO = seconds | Must recover near-instantly. | Hot standby replica already running, promoted automatically on failure (~30–60s). | No latency impact on normal path. |
| RTO = minutes | Fast recovery needed. | Warm standby: replica exists but not serving traffic. Promoted manually or semi-auto. | — |
| RTO = hours | Slower recovery OK. | Restore from backup. Spin up new instance. | — |
RPO=0 adds latency at every layer that writes:
- DB: Synchronous replication adds 5–20ms (waiting for replica in another AZ to confirm).
- App layer: If using distributed transactions (two-phase commit), add 10–50ms.
- Queue: Kafka with acks=all (wait for all in-sync replicas) adds 2–5ms vs acks=1.
This is the direct tradeoff: stronger durability = higher write latency.
Real examples:
- Banking: RPO = 0. Every transaction written synchronously to multiple replicas before confirmation.
- Social media posts: RPO = seconds is fine. Async replication acceptable.
- Object storage: 11 nines of durability via cross-AZ redundant storage (AWS S3, GCP Cloud Storage).
Most critical for: Banking and financial systems, medical records (Epic, FHIR), legal document storage, payment transaction logs — any system where lost data = legal or financial liability.
Fault Tolerance
How well does the system handle partial failures without going fully down?
Ask:
- What happens when one server crashes?
- What happens when a whole datacenter goes down?
- What if a dependency is slow or unavailable?
| Failure Type | Strategy | Example |
|---|---|---|
| Single node crash | Redundant replicas, auto-failover | DB primary/replica, load balancer health checks |
| Slow dependency | Timeouts + circuit breaker | Stop calling a failing service; return fallback |
| Datacenter outage | Multi-AZ or multi-region active-active | Route traffic to surviving region |
| Data corruption | Checksums, write-ahead logs, point-in-time restore | Detect and roll back bad writes |
| Cascading failures | Bulkheads (isolate failure domains), rate limiting | Don't let one slow service take down everything |
Circuit breaker pattern: If a downstream service fails N times in a row, stop calling it for a period. Return a cached/default response. Let the dependency recover before retrying.
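A minimal illustrative implementation of the pattern — real systems typically use a library (resilience4j, pybreaker) that also adds half-open probing:

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures, serve a fallback while open,
    try the dependency again after a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                return fallback          # open: don't touch the failing service
            self.opened_at = None        # cooldown elapsed: allow a retry
            self.failures = 0
        try:
            result = fn()
            self.failures = 0            # success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback

def flaky():
    raise ConnectionError("downstream is down")

breaker = CircuitBreaker(failure_threshold=3)
results = [breaker.call(flaky, fallback="cached-default") for _ in range(5)]
# After the 3rd failure the breaker opens; calls 4–5 never invoke flaky() at all
```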
Retry with exponential backoff + jitter: When a request fails, wait before retrying — and double the wait each attempt (backoff). Add random jitter so all retrying clients don't slam the service at the same moment (thundering herd). A common sequence: retry after 1s, 2s, 4s, 8s with ±30% jitter, then give up and dead-letter.
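The retry schedule above can be sketched as (illustrative helper; the waits are random by design):

```python
import random

def backoff_schedule(max_attempts: int = 4, base_s: float = 1.0,
                     jitter: float = 0.3) -> list[float]:
    """Exponential backoff with ±30% jitter — the 1s/2s/4s/8s sequence above."""
    waits = []
    for attempt in range(max_attempts):
        base = base_s * (2 ** attempt)                    # 1, 2, 4, 8
        waits.append(base * random.uniform(1 - jitter, 1 + jitter))
    return waits

schedule = backoff_schedule()
# e.g. [0.84, 2.31, 3.92, 9.07] — doubling waits with randomness so retrying
# clients don't all slam the service at the same instant
```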
Dead Letter Queue (DLQ): When a message fails processing repeatedly (after N retries), route it to a DLQ instead of blocking the queue. The DLQ holds poisoned messages for inspection and manual replay. Without a DLQ, one bad message can stall an entire consumer group indefinitely. (AWS SQS dead-letter queues, GCP Pub/Sub dead-letter topics, or a separate Kafka topic).
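A toy sketch of the routing logic, with plain lists standing in for the queue and the DLQ (names are illustrative; real systems use SQS/Pub-Sub DLQs or a separate Kafka topic):

```python
def process_with_dlq(messages, handler, max_retries: int = 3):
    """Retry each message up to max_retries; park persistent failures in a
    dead-letter list instead of blocking the rest of the batch."""
    processed, dlq = [], []
    for msg in messages:
        for attempt in range(max_retries):
            try:
                processed.append(handler(msg))
                break
            except Exception as exc:
                if attempt == max_retries - 1:
                    dlq.append((msg, str(exc)))  # poisoned message: park it, move on

    return processed, dlq

def handler(msg):
    if msg == "bad":
        raise ValueError("unparseable payload")
    return msg.upper()

processed, dlq = process_with_dlq(["a", "bad", "b"], handler)
# processed == ["A", "B"]; the poisoned message lands in the DLQ for inspection
```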
Most critical for: Microservices architectures (each service can fail independently), distributed databases, any system with SLA > 99.9%.
Security
What data does the system handle and who should access it? Drives auth, encryption, and regulatory design.
Ask:
- Does this handle PII, payments, or health data?
- Who are the users — public, internal, B2B?
Key terms:
- PII (Personally Identifiable Information) — any data that can identify a person: name, email, phone, SSN, IP address. Triggers GDPR/HIPAA obligations.
- TLS (Transport Layer Security) — encrypts data in transit (the "S" in HTTPS). Prevents interception.
- AES-256 (Advanced Encryption Standard) — standard algorithm for encrypting data at rest. Used in S3, databases, filesystems.
- JWT (JSON Web Token) — signed token the client sends with each request to prove identity. Stateless, server doesn't store sessions.
- OAuth2 — standard for delegated auth. "Sign in with Google" is OAuth2. Separates identity from your app.
- mTLS (Mutual TLS) — both sides verify certificates. Used for service-to-service auth inside your system.
- RBAC (Role-Based Access Control) — users get roles (admin, editor, viewer), roles get permissions. Simpler than per-user rules.
- ACL (Access Control List) — per-resource list of who can do what. More granular than RBAC (e.g. S3 bucket policies).
Security by layer:
| Layer | What Goes Here |
|---|---|
| CDN | DDoS protection, WAF (Web Application Firewall) blocks malicious requests before they reach origin |
| Load Balancer | TLS termination (decrypt HTTPS here, forward HTTP internally), IP whitelisting |
| API Gateway | Authentication (verify JWT/OAuth token), rate limiting (token bucket), request validation |
| App Server | Authorization (RBAC checks — "can this user do this action?"), input validation, business logic security |
| Cache (Redis) | Don't cache raw PII if avoidable. Redis AUTH password. Encrypt sensitive values if stored. |
| Database | AES-256 encryption at rest. Row-level security for multi-tenant data. Least-privilege DB users. Audit log here — append-only table logging who accessed what and when. |
| Object Storage | Signed URLs for private files (time-limited access). Bucket policies. Server-side encryption. |
Most critical for: Healthcare (HIPAA — audit every access to patient records), financial systems (PCI-DSS — card data tokenized immediately), auth systems (OAuth provider), any multi-tenant SaaS.
Compliance
Are there legal or regulatory constraints that shape the architecture?
Ask:
- What region are users in?
- Does this handle health, financial, or personal data?
Key terms:
- GDPR (General Data Protection Regulation) — EU law. Applies to any system with EU users, regardless of where the company is located.
- HIPAA (Health Insurance Portability and Accountability Act) — US law governing health data. Applies to any app handling patient records.
- PCI-DSS (Payment Card Industry Data Security Standard) — required for any system that stores, processes, or transmits card data.
- SOC 2 — US auditing standard for SaaS companies. Type I = point-in-time assessment. Type II = 6 months of continuous evidence. Required by enterprise buyers.
| Regulation | Who It Affects | Key Architecture Constraint |
|---|---|---|
| GDPR (EU) | Any system with EU users | Data residency in EU. Right to delete (complicates append-only logs). Breach notification in 72hrs. |
| HIPAA (US healthcare) | Medical records, health apps | Audit log every data access. Encryption in transit and at rest. Business associate agreements with vendors. |
| PCI-DSS (payments) | Any system touching card data | Card data never stored raw — tokenize immediately on receipt. Annual third-party audits. Network segmentation. |
| SOC 2 | B2B SaaS | Documented security controls. Access reviews. Incident response plan. |
GDPR complicates event-sourcing: Append-only logs make "right to delete" hard — you can't erase a past event. Solve with tombstone records or keep PII in a separate deletable store and only store user IDs in the event log.
Most critical for: Healthcare apps, payment processors, social platforms with EU users, any enterprise B2B SaaS sold to regulated industries.
Monitoring & Observability
How do you know the system is healthy in production? Drives logging, metrics, and alerting design.
Ask:
- Do you need real-time alerting?
- How quickly must the team detect and diagnose production issues?
| Signal | What It Covers | Tools |
|---|---|---|
| Metrics | QPS, latency, error rate, CPU/memory/disk | Prometheus, Datadog, AWS CloudWatch, GCP Cloud Monitoring |
| Logs | What happened and in what order | ELK stack, Splunk, AWS CloudWatch Logs, GCP Cloud Logging |
| Traces | Where time was spent across services | Jaeger, Zipkin, AWS X-Ray, GCP Cloud Trace |
| Alerts | Notify when SLA is breached | PagerDuty, Opsgenie |
The four golden signals (Google SRE): Latency, Traffic, Errors, Saturation. Build monitoring around these first.
- Latency — how long requests take (track p99, not average)
- Traffic — how much load the system is under (QPS, requests/sec)
- Errors — rate of failed requests (5xx errors, timeouts, exceptions)
- Saturation — how "full" a resource is. A CPU at 95% is saturated. A disk at 98% capacity is saturated. Saturation predicts future failure — a resource approaching 100% will soon become a bottleneck and cause latency spikes or crashes. Monitor: CPU %, memory %, disk I/O utilization, DB connection pool usage, queue depth.
Most critical for: Any system with a strict SLA, microservices (failures are hard to trace), Netflix-style chaos engineering, financial systems where bugs cost real money.
Environment Constraints
Are there non-standard constraints on the environment the system runs in?
Ask:
- Are clients on mobile or constrained devices?
- Are there low-bandwidth or offline scenarios to handle?
| Constraint | Design Impact |
|---|---|
| Mobile clients | Minimize payload size. Compress responses. Offline-first with local cache. |
| Low bandwidth (3G/rural) | Adaptive bitrate streaming (YouTube, Netflix). Delta sync instead of full sync. |
| Limited battery | Batch network calls. Avoid polling — use push (WebSockets, FCM). |
| Edge/IoT devices | Lightweight protocols (MQTT). Local processing before cloud sync. |
| Offline-first | Local DB (SQLite), sync on reconnect, conflict resolution strategy. |
Most critical for: Uber driver app (poor network in some cities), Google Maps offline, WhatsApp (works on 2G), IoT sensor pipelines, healthcare apps in hospitals with restricted networks.
Quick Reference — Which NFR Matters Most
| System | Top NFRs to Prioritize |
|---|---|
| Banking / payments | Consistency, Idempotency, Durability, Security, Compliance |
| Social feed (Twitter, Instagram) | Scale, Availability, Latency |
| Healthcare records | Durability, Security, Compliance, Availability |
| Search / autocomplete (Yelp, Google) | Latency, Scale |
| Ride-sharing (Uber) | Availability, Latency, Fault Tolerance, Environment |
| Video streaming (Netflix, YouTube) | Scale, Availability, Latency, Environment |
| Chat / messaging (WhatsApp) | Availability, Durability, Environment |
| Ticketing / booking (Airbnb, airlines) | Consistency, Availability, Scale |
| Enterprise SaaS | Security, Compliance, Availability |
| IoT / sensor pipeline | Scale, Fault Tolerance, Environment, Durability |