Load Balancing for System Design Interviews
Beginner-level explanation of load balancing for PM system design interviews, with product scenarios (flash sales, global apps, healthcare), UPBT framework, practice questions, and interview quick reference.
Input
What you provide to the skill
Explain load balancing for a system design interview. I’m a beginner.
Output
What the skill generates for you
Load Balancing
Simple Definition
Load balancing is like having multiple checkout lanes at a grocery store with a greeter directing customers to the shortest line. It distributes incoming user requests across multiple servers so no single server gets overwhelmed.
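For readers who want to see the mechanics, here is a minimal round-robin sketch in Python. The server names are illustrative placeholders, not a real load balancer:

```python
from itertools import cycle

# Hypothetical three-server pool; a real balancer tracks many more.
servers = ["server-1", "server-2", "server-3"]
rotation = cycle(servers)  # endlessly loops through the pool

for request_id in range(6):
    # Each incoming request goes to the next server in line,
    # so load spreads evenly across the pool.
    print(f"request {request_id} -> {next(rotation)}")
```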
When PMs Discuss This
- System design interviews: “How would you handle millions of concurrent users?”
- Scaling conversations: “Our app is slow during peak hours—what should we do?”
- Architecture reviews: “Why do we need more than one server?”
- Reliability planning: “What happens if a server crashes?”
Why It Matters for Products
- User experience: Users get responses in 200ms instead of timing out
- Business reliability: 99.99% uptime instead of crashes during traffic spikes
- Cost efficiency: Use 10 medium servers instead of 1 massive (expensive) server
- Geographic performance: Route users to nearest server for faster response
PM Depth vs SWE Depth
✅ PM should know: Why load balancing prevents outages, basic distribution strategies (round-robin, geographic), health checks, failover concepts
❌ PM should avoid: Specific load balancer products (NGINX vs HAProxy), TCP/UDP protocols, session affinity algorithms, SSL termination details
Product Scenarios
Scenario 1: E-Commerce Flash Sale
Product Problem:
Your e-commerce site runs Black Friday sales. Last year, the site crashed at 9 AM when 500K users arrived simultaneously. You have one server that can handle 10K concurrent users.
PM Decision:
Deploy 50 servers behind a load balancer. Distribute traffic evenly so each server handles ~10K users. Include health checks to automatically remove failed servers.
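A health check is just a periodic "are you alive?" probe; servers that fail it get pulled from rotation. A toy sketch of the idea, with the probe simulated rather than a real HTTP call:

```python
import random

# Illustrative pool; server names are placeholders.
servers = [f"server-{i}" for i in range(1, 6)]

def is_healthy(server: str) -> bool:
    # Stand-in for a real probe (e.g., an HTTP GET to a health endpoint);
    # here each server randomly fails the check 10% of the time.
    return random.random() > 0.1

def routable_pool() -> list[str]:
    # Failed servers drop out of rotation automatically; when they pass
    # checks again, they rejoin. That is failover in miniature.
    return [s for s in servers if is_healthy(s)]

print("servers taking traffic this cycle:", routable_pool())
```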
Trade-offs:
| Pros | Cons |
|---|---|
| Handles 500K users without crashes | 50x server cost during sale period |
| Automatic failover if servers crash | User sessions may break when requests switch servers |
| Gradual scaling (add/remove servers as needed) | Initial setup complexity and testing |
When NOT to Use This:
- Small apps with <1,000 users (single server is fine)
- Internal tools with predictable low traffic
- Hobby projects where cost outweighs benefit
Real Numbers:
- Traffic capacity: 10K → 500K concurrent users
- Uptime during sale: 60% (crashed) → 99.9% (stayed up)
- Revenue impact: $2M lost last year → $0 lost this year
- Infrastructure cost: $500/month → $5K during sale month (worth it for $20M in sales)
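The 50-server figure above is back-of-envelope arithmetic, which you can sanity-check in a couple of lines (numbers taken from the scenario):

```python
import math

peak_users = 500_000          # Black Friday concurrent users
per_server_capacity = 10_000  # what one server handles

servers_needed = math.ceil(peak_users / per_server_capacity)
print(servers_needed)  # 50 -- the figure behind the 50-server decision
```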
Scenario 2: Global Social Media App
Product Problem:
Your social media app has users worldwide. European users complain that loading feeds takes 5 seconds, while US users see 500ms load times. All servers are in US-East.
PM Decision:
Use geographic load balancing—deploy servers in Europe, Asia, and US. Route users to their nearest server to reduce network latency.
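Under the hood, geographic routing boils down to a lookup from the user's location to the nearest deployed region. A minimal sketch with hypothetical region names (real systems resolve location via DNS or anycast):

```python
# Hypothetical region map; names are placeholders.
NEAREST_REGION = {
    "europe": "eu-west",
    "asia": "ap-northeast",
    "us": "us-east",
}

def route_user(user_region: str) -> str:
    # Users with no nearby deployment fall back to the default region.
    return NEAREST_REGION.get(user_region, "us-east")

print(route_user("europe"))  # eu-west: ~600ms instead of ~5s from US-East
print(route_user("brazil"))  # us-east: the fallback
```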
Trade-offs:
| Pros | Cons |
|---|---|
| European load time: 5s → 600ms | Data sync complexity across regions |
| Better user experience = higher engagement | 3x infrastructure cost (3 regions) |
| Complies with data residency laws (GDPR) | Content moderation across time zones |
Real Numbers:
- European latency: 5000ms → 600ms (88% reduction)
- User engagement: +25% in European markets
- Churn: 15% → 8% in previously slow regions
- Cost: $10K/month → $30K/month (justified by 25% engagement boost)
Interview Response Frameworks
Framework 1: UPBT (User → Product → Business → Technical)
Example:
“Users expect instant page loads even during traffic spikes. Our product needs to handle Black Friday sales without crashing, which directly impacts our annual revenue—last year we lost $2M to downtime. Technically, we’d use load balancing to distribute traffic across multiple servers, with auto-scaling to handle unpredictable spikes. This gives us 99.99% uptime during critical sales periods.”
Framework 2: The Trade-off Sandwich
Example:
“I’d recommend deploying load balancing with geographic distribution. This gives us sub-second load times globally and meets data residency requirements for GDPR. The trade-off is 3x infrastructure cost and data sync complexity across regions. However, the 25% engagement boost in international markets justifies the investment, and we’re required to comply with GDPR anyway.”
Practice Question
Interview Prompt:
“You’re designing a ride-sharing app like Uber. How would you ensure the system stays responsive during Friday night surge (10x normal traffic)?”
How to Approach:
- Identify the user pain (riders can’t book, drivers can’t accept = lost revenue)
- Explain load balancing distributes requests across servers
- Discuss auto-scaling for unpredictable spikes
- Mention geographic distribution for global coverage
Model Answer (UPBT):
“Riders need instant app response to book rides during surge times—even 2-second delays cause them to try competitors. Our product must handle 10x traffic spikes on Friday nights and during events, which represents 40% of weekly revenue. Technically, I’d use auto-scaling load balancing—normally 50 servers, scaling to 500 during surge. Geographic distribution means a rider in Tokyo hits Tokyo servers (200ms) not US servers (2000ms). The trade-off is infrastructure cost—$50K/month normally, $200K during peak weeks. But one major outage costs more than a year of peak infrastructure, plus brand damage.”
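The "normally 50 servers, scaling to 500" rule in this answer can be expressed as a simple threshold policy. A hedged sketch, not a real cloud auto-scaling API, with per-server capacity assumed from the earlier scenario:

```python
import math

# Assumed numbers; illustrative only.
BASELINE_SERVERS = 50
MAX_SERVERS = 500
USERS_PER_SERVER = 10_000  # assumed per-server capacity

def desired_servers(current_users: int) -> int:
    # Scale to demand, but never below the baseline or above the ceiling.
    needed = math.ceil(current_users / USERS_PER_SERVER)
    return max(BASELINE_SERVERS, min(needed, MAX_SERVERS))

print(desired_servers(100_000))    # 50 -- quiet night, baseline holds
print(desired_servers(3_500_000))  # 350 -- Friday-night surge
```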
Interview Quick Reference: Load Balancing
30-Second Definition:
Load balancing distributes user requests across multiple servers—like airport security having multiple lanes instead of one. Prevents crashes during traffic spikes and provides failover if a server dies.
When It Comes Up:
- “How would you handle 10x traffic?”
- “What happens if a server crashes?”
- “Why is the app slow during peak hours?”
Key Trade-offs:
| Benefit | Cost |
|---|---|
| Handles traffic spikes | Higher infrastructure cost |
| Automatic failover (reliability) | Increased complexity |
| Geographic performance | Data sync challenges |
Red Flags (what NOT to say):
- Don’t mention: NGINX vs HAProxy, least-connections vs IP-hash algorithms
- Avoid: TCP handshake details, keepalive settings, SSL certificate management
Power Phrases:
- “From a user perspective, load balancing prevents timeouts during traffic spikes…”
- “The business trade-off is infrastructure cost vs revenue loss from outages…”
- “Users would experience this as consistently fast response times even during viral growth…”
About This Skill
Teaches PMs to discuss technical concepts in interviews with product scenarios, trade-off frameworks, and structured response tactics.