Load Balancing Explained With Zero Infrastructure Jargon

TL;DR

Load balancing distributes traffic across multiple servers so no single server gets overwhelmed. Round-robin is simple. Least-connections is smarter. Session affinity keeps users on the same server. The goal: use your servers efficiently and avoid bottlenecks.

Traffic spiked. One server couldn’t handle it. So we pointed the load balancer at four servers instead of one. Traffic distributed evenly. No server was overloaded. No code changed. That’s the power of load balancing — it’s infrastructure that makes software more resilient.

Most developers don’t think about load balancing until they need it. Then it’s mysterious. Let me show you how it works and how to actually use it.

The Problem: Single Server Bottleneck

One server can only handle so many requests. With 1,000 requests per second and a server that handles 100 per second, you get queuing and timeouts.

Solution: More servers. But how do you send traffic to all of them?

That’s what a load balancer does.

Load Balancer: The Traffic Cop

A load balancer receives all traffic and distributes it across multiple backend servers.

User sends request to: load-balancer.example.com:443

Load Balancer (receives request):
  - Decide which backend server should handle it
  - Forward request to that server
  - Return response to user

Backend servers: server-1, server-2, server-3, server-4
(all running the same code)

The user talks to one address (the load balancer). The load balancer talks to many servers behind it. Transparent to the user.

This is why cloud providers love load balancers. Amazon ELB, Google Cloud Load Balancer, Azure Load Balancer — they’re all doing this.

Load Balancing Strategies

1. Round-robin: Take turns

Request 1 -> Server 1
Request 2 -> Server 2
Request 3 -> Server 3
Request 4 -> Server 1
Request 5 -> Server 2
...

Simple, predictable, fair if all servers are equal.
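Here's a minimal round-robin picker in JavaScript (the server names are placeholders, not real hosts):

```javascript
// Minimal round-robin: cycle through the pool, one pick per request
const servers = ['server-1', 'server-2', 'server-3'];
let next = 0;

function pickRoundRobin() {
  const server = servers[next];
  next = (next + 1) % servers.length; // wrap around after the last server
  return server;
}
```

Every server gets the same share of traffic, regardless of how busy it is.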

2. Least-connections: Send to the least busy server

Server 1: 10 active connections
Server 2: 5 active connections
Server 3: 8 active connections

New request -> Server 2 (least busy)

Better when requests have different durations.
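A least-connections pick is just "find the minimum." A sketch, using the connection counts from above:

```javascript
// Pick the server with the fewest active connections.
// The counts are illustrative, mirroring the example above.
const pool = [
  { name: 'server-1', activeConnections: 10 },
  { name: 'server-2', activeConnections: 5 },
  { name: 'server-3', activeConnections: 8 },
];

function pickLeastConnections(servers) {
  return servers.reduce((least, s) =>
    s.activeConnections < least.activeConnections ? s : least
  );
}
```

A real balancer would increment the count when it forwards a request and decrement it when the response completes.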

3. Weighted round-robin: Different servers, different capacities

Server 1 (powerful): weight 3
Server 2 (weak):     weight 1

Request 1 -> Server 1
Request 2 -> Server 1
Request 3 -> Server 1
Request 4 -> Server 2
Request 5 -> Server 1
...

3:1 ratio of traffic.
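One simple way to implement this is to expand the weights into a repeating schedule. (Real balancers like Nginx use a smoother interleaving, but the ratio comes out the same.)

```javascript
// Expand weights into a repeating schedule: weight 3 = three slots per cycle
const weighted = [
  { name: 'server-1', weight: 3 },
  { name: 'server-2', weight: 1 },
];

const schedule = weighted.flatMap((s) => Array(s.weight).fill(s.name));
let cursor = 0;

function pickWeighted() {
  const server = schedule[cursor];
  cursor = (cursor + 1) % schedule.length;
  return server;
}
```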

4. IP hash: Same IP always goes to same server

User 1.2.3.4 -> hash -> Server 1 (always)
User 5.6.7.8 -> hash -> Server 3 (always)

Useful for session affinity and caching.
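A sketch of IP hashing with a djb2-style string hash (the hash function here is illustrative; real balancers use their own):

```javascript
// Hash a client IP to a stable server index: same IP, same server, every time
const backends = ['server-1', 'server-2', 'server-3'];

function pickByIp(ip) {
  let hash = 5381;
  for (const ch of ip) {
    hash = (hash * 33 + ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit int
  }
  return backends[hash % backends.length];
}
```

Note the catch: the mapping is stable only while the pool is stable. Add or remove a backend and most IPs hash to a different server.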

Session Affinity (Sticky Sessions)

If a user logs in to Server 1 and their session is stored in memory on Server 1, they need to keep going to Server 1. Otherwise they’ll be logged out on Server 2.

Two solutions:

1. IP hash: Same user always goes to the same server (via IP-based hashing)

// User logs in on Server 1, session stored in memory
// Next request comes in, IP hash sends it to Server 1
// Session is still there, user is still logged in

But: User changes network (WiFi to mobile) -> Different IP -> Different server -> Lost session

2. Distributed sessions: Store sessions in Redis or database, any server can access

// User logs in, session stored in Redis
// Next request goes to Server 2
// Server 2 reads session from Redis
// User is still logged in

Works regardless of which server gets the request.

Distributed sessions are better. IP hash is fragile.
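Here's the distributed-sessions idea as a sketch. A plain Map stands in for Redis so the example is self-contained; in production you'd swap it for a Redis client:

```javascript
// Any server can read the session from a shared store.
// A Map stands in for Redis here; swap it for a Redis client in production.
const sharedStore = new Map();

function login(sessionId, userId) {
  // Whichever server handles the login writes the session to the shared store
  sharedStore.set(`session:${sessionId}`, { userId, loggedIn: true });
}

function handleRequest(serverName, sessionId) {
  // Any server can read it back, so it doesn't matter where the request lands
  const session = sharedStore.get(`session:${sessionId}`);
  return session
    ? `${serverName}: user ${session.userId} is logged in`
    : `${serverName}: no session`;
}
```

The user logs in via one server, and the next request can land anywhere.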

Health Checks: Remove Failing Servers

If a backend server crashes, the load balancer should stop sending traffic to it.

// Load balancer periodically checks each server
GET http://server-1:8000/health
GET http://server-2:8000/health
GET http://server-3:8000/health

// Responses:
// 200 OK -> healthy
// 500 Internal Server Error -> unhealthy

// If Server 2 fails, it's removed from the pool
Load Balancer now sends traffic only to: Server 1 and Server 3

Health checks are critical. Without them, traffic goes to broken servers and users see errors.

// Express example: simple health check
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'ok' });
});

// More sophisticated: check database
app.get('/health', async (req, res) => {
  try {
    await database.ping();
    res.status(200).json({ status: 'ok' });
  } catch {
    res.status(500).json({ status: 'error' });
  }
});

Make health checks fast. They run frequently (every few seconds). Slow health checks waste resources.
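For completeness, here's a sketch of the load balancer's side of health checking. The probe function is injected so the pool logic stays self-contained; in practice it would be an HTTP GET to /health with a short timeout:

```javascript
// Probe every server and keep only the healthy ones in the pool.
// `probe` is injected; in a real balancer it's a GET /health with a timeout.
async function refreshPool(servers, probe) {
  const results = await Promise.all(
    servers.map(async (server) => {
      try {
        return { server, healthy: await probe(server) };
      } catch {
        return { server, healthy: false }; // a failed probe counts as unhealthy
      }
    })
  );
  return results.filter((r) => r.healthy).map((r) => r.server);
}

// In a real balancer this runs on a timer, e.g. setInterval(..., 5000)
```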

Scaling Example: From 1 to 4 Servers

You’re running one server that handles 100 requests/second. Traffic grows to 350 requests/second.

// Before: One server, bottleneck
User Traffic (350 rps) -> Web Server (100 rps capacity) -> Database

Response times: 3+ seconds (queuing)

// After: Four servers, load balanced
User Traffic (350 rps) -> Load Balancer -> Server 1 (100 rps)
                                        -> Server 2 (100 rps)
                                        -> Server 3 (100 rps)
                                        -> Server 4 (100 rps)

Load Balancer distributes evenly: ~88 rps to each server

Response times: <100ms (no queuing)

Same code, same database. Just four instances instead of one. That's the power of load balancing.

Advanced: Weighted Routing

Canary deployments: send 5% of traffic to a new version, 95% to the stable version.

Version 1 (stable): weight 95
Version 2 (new):    weight 5
Traffic distribution:
- 95 out of 100 requests go to Version 1
- 5 out of 100 requests go to Version 2

If Version 2 has bugs, only 5% of users are affected.
If it's good, gradually increase the weight: 10%, 25%, 50%, 100%
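A canary split can be as simple as a weighted random pick. A sketch (the version names are placeholders, and the random source is injectable so the split is deterministic in tests):

```javascript
// Probabilistic canary split: roughly canaryPercent% of requests hit the new version
function pickVersion(canaryPercent, random = Math.random) {
  return random() * 100 < canaryPercent ? 'v2-canary' : 'v1-stable';
}

// Raising the weight is just changing canaryPercent: 5 -> 10 -> 25 -> 50 -> 100
```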

Load Balancer Types

Layer 4 (TCP/UDP):
Fast, simple, works with any protocol. AWS Network Load Balancer.

// Just routing packets based on IP and port
// Doesn't know about HTTP, doesn't check health

Layer 7 (Application):
Smart, understands HTTP, can route by path or hostname. AWS Application Load Balancer.

// Understands HTTP, can route based on:
// - URL path (/api -> API servers, /static -> CDN)
// - Hostname (api.example.com -> API, www.example.com -> Web)
// - Headers (mobile app -> mobile-optimized server)
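Stripped of the networking, a Layer 7 routing decision is just a function of the request's host and path. A sketch with placeholder pool names:

```javascript
// Layer 7 routing decision: inspect host and path, pick a backend pool
function routeRequest({ host, path }) {
  if (host.startsWith('api.')) return 'api-pool';   // api.example.com -> API
  if (path.startsWith('/api')) return 'api-pool';   // /api -> API servers
  if (path.startsWith('/static')) return 'cdn-pool'; // /static -> CDN
  return 'web-pool';                                 // everything else -> Web
}
```

A Layer 4 balancer can't do this, because it never parses the HTTP request.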

When NOT to Use Load Balancing

If your application is small and handles traffic fine, you don't need a load balancer yet. Add it when you hit scaling problems, not before.

If your architecture is simpler (serverless, managed services), you might not configure a load balancer directly, but the provider is doing it for you behind the scenes.

Common Mistakes

Not implementing health checks. Traffic goes to broken servers.

Sticky sessions without distributed sessions. When you deploy, all sessions are lost.

Not removing a dead server quickly. Health checks should run every few seconds.

Assuming load balancer eliminates the need to scale the database. The load balancer scales the application tier, not the database. Database becomes the new bottleneck.

FAQ

What happens if the load balancer fails?

That's why you have redundant load balancers. Run two in an active-passive pair that heartbeat each other. If one dies, the other takes over.

How do I set up a load balancer?

Use a managed service (AWS ELB, Google Cloud LB, Azure LB) or open-source (HAProxy, Nginx). Managed is simpler and handles redundancy for you.

Can I load balance databases?

Yes, but it's tricky. Databases aren't stateless. You can read-balance (send reads to replicas) but writes go to the primary. Tools like ProxySQL handle this.
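The read/write split can be sketched as a routing function: writes go to the primary, reads rotate across replicas. (The naive SQL check here is illustrative; real proxies parse queries properly.)

```javascript
// Read/write splitting: writes to the primary, reads round-robined over replicas
const primary = 'db-primary';
const replicas = ['db-replica-1', 'db-replica-2'];
let replicaCursor = 0;

function routeQuery(sql) {
  const isRead = /^\s*select/i.test(sql); // naive check; real proxies parse SQL
  if (!isRead) return primary;
  const replica = replicas[replicaCursor];
  replicaCursor = (replicaCursor + 1) % replicas.length;
  return replica;
}
```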

Do I need a load balancer if I'm using Kubernetes?

Kubernetes has built-in load balancing. It automatically distributes traffic across pods. You don’t configure it directly, but it’s happening.

Load balancing is infrastructure magic that makes systems more resilient and scalable. A single load balancer can turn a bottlenecked application into a resilient fleet. But it’s a means to an end: handling traffic reliably. Once you understand the basics — routing, health checks, sticky sessions — you can scale confidently.


DevelopersCodex

Real-world dev tutorials. No fluff, no filler.

© 2026 DevelopersCodex. All rights reserved.