I Built a L7 Load Balancer in Go from Scratch. Here’s a Dive into the Algorithms

Load balancing is the silent hero of scalable web services, the unsung orchestrator ensuring our favorite applications remain responsive under immense pressure. Yet, for many developers, its inner workings feel like a black box. Theory is one thing, but true understanding comes from building. My curiosity and a desire to truly understand the nuances behind different balancing strategies inspired me to build my own L7 load balancer from scratch in Go.

This is the story of that journey, focusing not just on the code but on the architectural decisions and the intelligent algorithms that power a production-ready solution.

L4 vs. L7: The Crucial Distinction

Load balancers distribute traffic to prevent any single server from becoming a bottleneck. But the intelligence behind that distribution varies greatly depending on the layer they operate at:

  • L4 Load Balancers: Operating at the transport layer (TCP/UDP), these are fast and route traffic based on IP addresses and ports. They are simple, efficient, and completely blind to the content of the requests they are directing.
  • L7 Load Balancers: Operating at the application layer (HTTP/HTTPS), these are far more intelligent. Because they are application-aware, they can inspect request data—like headers, URLs, and cookies—to make sophisticated, context-aware routing decisions. My project dives deep into the L7 world, which unlocks the powerful algorithms we’ll explore below.
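
To make "application-aware" concrete, here is a tiny, hypothetical illustration (not code from the project) of the kind of decision only an L7 balancer can make: choosing a backend pool based on the request path or a header. The pool names are invented for the example.

import (
    "net/http"
    "strings"
)

// choosePool branches on parsed HTTP data: the URL path and a header.
// An L4 balancer never sees either; it routes on IPs and ports alone.
func choosePool(r *http.Request) string {
    if strings.HasPrefix(r.URL.Path, "/api/") {
        return "api-pool"
    }
    if r.Header.Get("X-Canary") == "true" {
        return "canary-pool" // send opted-in clients to a canary fleet
    }
    return "web-pool"
}

By the time a path or header even exists, an L4 balancer has already forwarded the bytes; this visibility is exactly what the algorithms below build on.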

The Foundation: A Modular Architecture

A system is only as strong as its architecture. To avoid a monolithic beast and promote clean code, I designed a shared/ package to house core, reusable components. This deliberate separation allows each balancing algorithm to focus purely on its decision-making logic, while common tasks like health checking and request proxying are handled centrally.

This design delivers three key benefits:

  • Maintainability: Common logic is in one place.
  • Testability: Each component can be tested in isolation.
  • Extensibility: Adding a new algorithm is trivial; it simply reuses these battle-tested components.

This modularity isn’t just about tidy code; it separates policy (which backend to pick) from mechanism (health checking and proxying), a split that makes distributed systems maintainable and resilient.
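
For context on the snippets that follow, here is a rough sketch of what the shared/ package provides. This is my reconstruction from the method calls used below (GetAliveBackends, GetConnections, NextIndex), not the project's exact code:

package shared

import (
    "net/url"
    "sync"
    "sync/atomic"
)

// Backend is one upstream server: its address, configured weight,
// health flag, and a live connection count.
type Backend struct {
    URL         *url.URL
    Weight      int
    alive       bool
    connections int64
    mu          sync.RWMutex
}

func (b *Backend) SetAlive(alive bool) {
    b.mu.Lock()
    b.alive = alive
    b.mu.Unlock()
}

func (b *Backend) IsAlive() bool {
    b.mu.RLock()
    defer b.mu.RUnlock()
    return b.alive
}

// GetConnections reads the in-flight count without locking.
func (b *Backend) GetConnections() int64 {
    return atomic.LoadInt64(&b.connections)
}

// Pool holds the full backend set plus a counter used by round robin.
type Pool struct {
    backends []*Backend
    counter  uint64
}

// AllBackends returns every backend, healthy or not (the health checker needs both).
func (p *Pool) AllBackends() []*Backend { return p.backends }

// GetAliveBackends returns only the backends passing health checks.
func (p *Pool) GetAliveBackends() []*Backend {
    alive := make([]*Backend, 0, len(p.backends))
    for _, b := range p.backends {
        if b.IsAlive() {
            alive = append(alive, b)
        }
    }
    return alive
}

Judging from the snippets, every balancer takes a *Pool and implements a single NextBackend(r *http.Request) method, which is what makes new algorithms cheap to add.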

Load Balancer Algorithms: The Intelligence Behind Distribution

The magic of load balancing isn’t in the proxying—it’s in the decision. How do you pick which server gets the next request? Each algorithm embodies a different philosophy about fairness, efficiency, and the nature of load itself.

Random: Embracing Chaos

  • The Philosophy: Sometimes the best strategy is no strategy at all.
  • Random selection throws conventional wisdom out the window. No state tracking, no complex calculations—just pure statistical distribution over time. It’s the engineering equivalent of “it’ll work out.”
// NextBackend picks a uniformly random backend from the currently healthy set.
func (rb *RandomBalancer) NextBackend(r *http.Request) *shared.Backend {
    backends := rb.pool.GetAliveBackends()
    if len(backends) == 0 {
        return nil // nothing healthy; the caller should respond 503
    }
    index := rb.rand.Intn(len(backends))
    return backends[index]
}
  • When it shines: High-volume, short-lived requests where individual routing decisions don’t matter, but aggregate distribution does. Think API endpoints that return cached data or stateless microservices.
  • The catch: Short-term clustering is inevitable. You might randomly hit the same server three times in a row while another sits idle. For most workloads, this evens out. For latency-sensitive applications, those clusters can hurt.
  • The hidden insight: Random is actually brilliant for auto-scaling scenarios. No state means no coordination needed when backends come and go.
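
The only state Random needs is its rand source. A plausible constructor (an assumption on my part; the project may seed differently) looks like this, with one caveat worth knowing: a *rand.Rand is not safe for concurrent use, so under parallel requests you would guard Intn with a mutex or fall back to the lock-protected global rand.

import (
    "math/rand"
    "time"
)

type RandomBalancer struct {
    pool *shared.Pool
    rand *rand.Rand
}

// NewRandomBalancer is a hypothetical constructor: each balancer owns a
// seeded source. Note that *rand.Rand is not goroutine-safe; concurrent
// handlers need a mutex around Intn (or the global rand, which locks for you).
func NewRandomBalancer(pool *shared.Pool) *RandomBalancer {
    return &RandomBalancer{
        pool: pool,
        rand: rand.New(rand.NewSource(time.Now().UnixNano())),
    }
}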

Round Robin: Democratic Distribution

  • The Philosophy: Everyone deserves an equal turn.
  • Round Robin is the embodiment of fairness—each backend gets the same number of requests, in order, forever. It’s predictable, debuggable, and feels “right” to human intuition.
// NextBackend rotates through the healthy set using a shared counter.
func (rr *RoundRobinBalancer) NextBackend(r *http.Request) *shared.Backend {
    backends := rr.pool.GetAliveBackends()
    if len(backends) == 0 {
        return nil
    }
    // NextIndex atomically increments and returns the counter (starting at 1),
    // so subtract 1 for a zero-based slot before wrapping with the modulo.
    idx := rr.pool.NextIndex()
    index := (idx - 1) % uint64(len(backends))
    return backends[index]
}
  • When it shines: Homogeneous environments where all backends have identical capacity and requests have predictable processing times. Perfect for serving static content or simple CRUD operations.
  • The brutal reality: Real servers aren’t identical, and real requests aren’t uniform. That “slow” server still gets its fair share of traffic, creating a bottleneck that drags down your entire system. One misbehaving backend can poison the well.
  • Pro tip: Round Robin works best when combined with aggressive health checking and circuit breakers.
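
NextIndex isn't shown above; the standard way to implement it, and my assumption of how the pool does, is a lock-free atomic counter so concurrent requests each draw a distinct ticket:

import "sync/atomic"

// NextIndex hands out monotonically increasing tickets without a mutex.
// The first call returns 1, which is why the caller subtracts 1 before
// taking the modulo.
func (p *Pool) NextIndex() uint64 {
    return atomic.AddUint64(&p.counter, 1)
}

Because the modulo is taken against the current alive-set size, the mapping shifts whenever a backend fails or recovers. That's the price of keeping the counter lock-free, and for round robin it's usually acceptable.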

Weighted Round Robin: Merit-Based Democracy

  • The Philosophy: Fair doesn’t mean equal—it means proportional to capability.
  • This is Round Robin with a brain. You assign weights based on server capacity, and the algorithm ensures each backend gets requests proportional to its weight. The “smooth” implementation prevents traffic bursts by spreading weighted requests over time.
// NextBackend implements "smooth" weighted round robin: each round, every
// backend's running score grows by its configured weight; the highest score
// wins and is then docked by the total weight. Over a full cycle each backend
// wins exactly Weight times, but the wins are interleaved rather than bursty.
func (wrr *WeightedRoundRobinBalancer) NextBackend(r *http.Request) *shared.Backend {
    wrr.mutex.Lock()
    defer wrr.mutex.Unlock()
    backends := wrr.pool.GetAliveBackends()
    if len(backends) == 0 {
        return nil
    }
    var selectedBackend *shared.Backend
    var totalWeight int
    var maxWeight int
    for _, backend := range backends {
        totalWeight += backend.Weight
        // Grow each backend's running score by its weight...
        wrr.currentWeights[backend.URL.String()] += backend.Weight
        // ...and track the current leader.
        if selectedBackend == nil || wrr.currentWeights[backend.URL.String()] > maxWeight {
            maxWeight = wrr.currentWeights[backend.URL.String()]
            selectedBackend = backend
        }
    }
    if selectedBackend != nil {
        // Penalize the winner so it doesn't win again immediately.
        wrr.currentWeights[selectedBackend.URL.String()] -= totalWeight
    }
    return selectedBackend
}
  • When it dominates: Heterogeneous environments, rolling deployments (start new servers at low weight), and gradual traffic shifting between data centers. It’s also perfect for A/B testing—just adjust weights to control traffic split.
  • The operational burden: Weights need maintenance. Server specs change, workloads evolve, and those carefully tuned weights become stale. It’s static optimization in a dynamic world.
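
The "smooth" part is easiest to see by running the selection loop in isolation. This standalone trace (same algorithm, stripped of the pool and mutex) uses made-up weights of 5, 1, and 1:

package main

import "fmt"

type server struct {
    name            string
    weight, current int
}

func main() {
    servers := []*server{{"a", 5, 0}, {"b", 1, 0}, {"c", 1, 0}}
    for i := 0; i < 7; i++ {
        total := 0
        var best *server
        for _, s := range servers {
            total += s.weight
            s.current += s.weight
            if best == nil || s.current > best.current {
                best = s
            }
        }
        best.current -= total
        fmt.Print(best.name, " ")
    }
    fmt.Println()
    // Prints: a a b a c a a
    // A naive scheme that sends each backend its quota back to back
    // would instead print: a a a a a b c
}

Every seven requests, a gets exactly five, but never five in a row, and all scores return to zero at the end of the cycle, so the pattern repeats indefinitely.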

Least Connections: The Workload Whisperer

  • The Philosophy: Busy servers should rest, idle servers should work.
  • This algorithm assumes that connection count correlates with server load. It’s the first truly dynamic algorithm—making routing decisions based on real-time state rather than pre-configured rules.
// NextBackend scans the healthy set for the fewest in-flight connections.
func (lc *LeastConnectionsBalancer) NextBackend(r *http.Request) *shared.Backend {
    backends := lc.pool.GetAliveBackends()
    if len(backends) == 0 {
        return nil
    }
    var selectedBackend *shared.Backend
    var minConnections int64 = -1
    for _, backend := range backends {
        connections := backend.GetConnections()
        // The nil check seeds the minimum with the first backend's count.
        if selectedBackend == nil || connections < minConnections {
            minConnections = connections
            selectedBackend = backend
        }
    }
    return selectedBackend
}
  • When it’s unstoppable: Applications with highly variable request processing times. Database queries that might take 50ms or 5 seconds, file uploads, long-polling connections, or any scenario where “number of requests” doesn’t equal “amount of work.”
  • The nuanced reality: Connections are a proxy for load, not load itself. A server handling 10 simple requests might be less loaded than one processing a single complex query. Also, connection counting requires careful lifecycle management—leaked connections will skew routing decisions.
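
That lifecycle management is worth making explicit. Here is a sketch of how the counts might be maintained around the proxy call; the AddConnection/RemoveConnection helpers, the handler's shape, and the proxyFor accessor are my assumptions, not the project's exact wiring. The key idea is pairing the increment with a deferred decrement, so even a panic in the proxy path can't leak a connection.

import (
    "net/http"
    "sync/atomic"
)

// Hypothetical counting helpers on the shared Backend type.
func (b *Backend) AddConnection()    { atomic.AddInt64(&b.connections, 1) }
func (b *Backend) RemoveConnection() { atomic.AddInt64(&b.connections, -1) }

// ServeHTTP sketches where the counting hooks in: bump the count before
// proxying, and let defer guarantee the matching decrement on every exit path.
func (lb *LoadBalancer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    backend := lb.balancer.NextBackend(r)
    if backend == nil {
        http.Error(w, "no backends available", http.StatusServiceUnavailable)
        return
    }
    backend.AddConnection()
    defer backend.RemoveConnection()
    // proxyFor is assumed to return an httputil.ReverseProxy per backend.
    lb.proxyFor(backend).ServeHTTP(w, r)
}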

IP Hash: The Consistency Keeper

  • The Philosophy: Some relationships are worth maintaining.
  • IP Hash sacrifices optimal load distribution for something more valuable—predictability. The same client always talks to the same server, enabling stateful interactions without distributed session storage.
// NextBackend hashes the client IP so the same client keeps landing on the
// same backend, as long as the alive set doesn't change.
func (ip *IPHashBalancer) NextBackend(r *http.Request) *shared.Backend {
    backends := ip.pool.GetAliveBackends()
    if len(backends) == 0 {
        return nil
    }

    // Strip the ephemeral port; fall back to the raw address if parsing fails.
    clientIP, _, err := net.SplitHostPort(r.RemoteAddr)
    if err != nil {
        clientIP = r.RemoteAddr
    }
    hash := crc32.ChecksumIEEE([]byte(clientIP))
    // Take the modulo in unsigned arithmetic: ChecksumIEEE returns a uint32,
    // and converting it to int first can go negative on 32-bit platforms,
    // which would panic when used as a slice index.
    index := int(hash % uint32(len(backends)))
    return backends[index]
}
  • The killer feature: Session affinity without session stores. Shopping carts, user preferences, temporary state—all live happily on individual servers without complex coordination.
  • The dark side: Load distribution becomes hostage to client distribution. A few corporate proxies can create massive hot spots. When a server dies, all its “sticky” clients lose their state.
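
One partial mitigation for the hot-spot problem, when the balancer sits behind a trusted proxy or CDN, is to hash the original client address from X-Forwarded-For rather than the proxy's. This isn't in the snippet above; it's a sketch, and it's only safe when you trust whoever sets the header, since clients can forge it:

import (
    "net"
    "net/http"
    "strings"
)

// clientKey picks the value to hash: the first X-Forwarded-For hop when a
// trusted proxy supplies one, otherwise the TCP peer address sans port.
func clientKey(r *http.Request) string {
    if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
        // The first comma-separated entry is the original client.
        return strings.TrimSpace(strings.Split(xff, ",")[0])
    }
    if host, _, err := net.SplitHostPort(r.RemoteAddr); err == nil {
        return host
    }
    return r.RemoteAddr
}

The deeper problem, sticky clients losing their state when a server dies, needs more than a better hash key; consistent hashing reduces how many clients get reshuffled, but it can't recover state that lived only on the dead server.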

Building for Production: The Non-Negotiables

A collection of algorithms does not make a system. To move from a proof-of-concept to a production-grade tool, you need to build for resilience and observability.

  • Active Health Checks: This is non-negotiable. My HealthChecker component continuously probes backends. If a server becomes unresponsive, it’s immediately and automatically removed from the active pool. This prevents the load balancer from sending requests into a black hole and is the cornerstone of a high-availability system. (A sketch of the checking loop follows this list.)
  • Monitoring & Observability: You can’t fix what you can’t see. The system exposes several monitoring endpoints (/lb/stats, /lb/health, /lb/backends) that provide a real-time window into its performance, backend health, and request metrics. This operational insight is what separates a toy project from a reliable service.
  • Graceful Shutdown: The system listens for interrupt signals (SIGINT, SIGTERM) to shut down gracefully, allowing in-flight requests to complete before exiting. This attention to detail prevents data corruption and poor user experiences during deployments or maintenance.
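
Here is that health-checking loop in sketch form, assuming the Pool/Backend types reconstructed earlier and a /health endpoint on each backend; the project's HealthChecker may differ in the details:

import (
    "context"
    "net/http"
    "time"
)

type HealthChecker struct {
    pool *Pool
}

// Run probes every backend on a fixed interval until the context is
// cancelled (e.g., during graceful shutdown). A backend is marked alive
// only if it answers quickly with a non-5xx status.
func (hc *HealthChecker) Run(ctx context.Context) {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()
    client := &http.Client{Timeout: 2 * time.Second}

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            for _, b := range hc.pool.AllBackends() {
                resp, err := client.Get(b.URL.String() + "/health")
                alive := err == nil && resp.StatusCode < 500
                if resp != nil {
                    resp.Body.Close()
                }
                b.SetAlive(alive)
            }
        }
    }
}

Cancelling the context from the signal handler is also what ties the checker into the graceful-shutdown path: the probing goroutine exits cleanly alongside the server.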

The Algorithm Decision Matrix

  • Choose Random when: You have high traffic volume, stateless requests, and want zero-complexity routing.
  • Choose Round Robin when: All servers are identical, requests are uniform, and predictability matters more than optimal distribution.
  • Choose Weighted Round Robin when: Servers have different capacities, you’re doing gradual rollouts, or you need precise traffic control.
  • Choose Least Connections when: Request processing times vary significantly, or you’re dealing with long-lived connections.
  • Choose IP Hash when: You need session affinity and can’t use distributed session storage.

The best load balancer isn’t the one with the cleverest algorithm—it’s the one that matches your specific constraints and failure modes.

Dive into the code on GitHub

Want to dive deeper into the code, run the balancers yourself, or even contribute a new algorithm? Explore the full project:

https://github.com/Bethel-nz/Janus

