Locality Load Balancing – Istio By Example

If you’re running a high-scale, global application, you might be running services in multiple regions. If you have multiple replicas of the same service, you may want to direct client requests to the closest server, in order to minimize latency. You might also want a way to handle failover if one region goes down, and direct traffic to the closest available service.

Istio can help you automatically handle regional traffic using a feature called locality load balancing. Let’s see how.

Here, we have two Kubernetes clusters running in two different cloud regions, us-central and us-east. The Istio control plane is running in us-east, and we have set up single control plane Istio multicluster, so that services running in both clusters can reach each other.

When we started both clusters, the cloud provider added region-specific failure-domain labels to the Kubernetes nodes:

failure-domain.beta.kubernetes.io/region: us-central1
failure-domain.beta.kubernetes.io/zone: us-central1-b

Istio will populate requests with these locality labels, allowing Istio to redirect requests to the closest available region.

Both clusters are running an Istio-injected service called echo, which prints its location when accessed on port 80. The central cluster is also running a loadgen service that calls echo.default.svc.cluster.local:80 every second.

By default, the Kubernetes Service behavior is round-robin, between the two echo servers on both clusters:

$ 🌊 Hello World! - EAST
$ ✨ Hello World! - CENTRAL
$ 🌊 Hello World! - EAST
$ ✨ Hello World! - CENTRAL

We can enable locality load balancing by adding an outlier detection Istio DestinationRule on the east cluster:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: echo-outlier-detection
spec:
  host: echo.default.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1000
      http:
        http2MaxRequests: 1000
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutiveErrors: 7
      interval: 30s
      baseEjectionTime: 30s

Now, all loadgen requests are routed to the closest instance of echo, running in us-central:

$ ✨ Hello World! - CENTRAL
$ ✨ Hello World! - CENTRAL
$ ✨ Hello World! - CENTRAL

If we delete the echo Deployment running in us-central, Istio will redirect loadgen requests to the echo Pod running in us-east:

$ 🌊 Hello World! - EAST
$ 🌊 Hello World! - EAST
$ 🌊 Hello World! - EAST

We can also add a percentage-based load balancing rule for mesh-wide traffic, in the global Istio installation settings:

    localityLbSetting:
      distribute:
      - from: us-central1/*
        to:
          us-central1/*: 20
          us-east1/*: 80

Now, all services running in both clusters will share requests 80/20, between us-east and us-central. No VirtualServices are needed.

$ 🌊 Hello World! - EAST
$ 🌊 Hello World! - EAST
$ 🌊 Hello World! - EAST
$ 🌊 Hello World! - EAST
$ ✨ Hello World! - CENTRAL

To learn more about Locality-based load balancing with Istio, see the Istio documentation.