One way to think about a service mesh is as a domain of control. Within a Kubernetes namespace where Istio sidecar injection is enabled, you can monitor all traffic between Pods, and enforce security policies.
But what about upstream services that live outside the mesh? How do you determine at runtime which services call external APIs? How do you know which database instance your service is writing to? Or how do you ensure that a service inside the mesh is only sending traffic within its own geographic region? Istio egress monitoring can help with this.
Egress means exit. In this case, egress traffic means requests that must exit the Istio mesh.
There was a time when Istio blocked all egress traffic by default. You had to manually create a ServiceEntry to whitelist every external host your services needed to access. A ServiceEntry adds an external host to Istio’s service registry. This changed in Istio 1.3, when the default egress policy switched from REGISTRY_ONLY to ALLOW_ANY. Now, in-mesh services can access external services freely, without the need for a ServiceEntry.
No matter which Istio egress option you choose for your mesh, Istio can monitor all egress traffic. And you can monitor this egress traffic through your workloads’ sidecar proxies, without needing a dedicated egress gateway proxy. Let’s see how it works.
In this example, we’ve built a website that lets users share recipes. To optimize costs, the web frontend runs outside of Kubernetes as a serverless function. When a user adds a recipe, the frontend creates an ID for that recipe by calling the ID Generator service (idgen) inside a Kubernetes cluster. idgen is exposed through the default Istio IngressGateway, and gets random IDs from an external API called httpbin.
Option 1 - Passthrough
To start, let’s use an Istio installation with the default ALLOW_ANY option for egress. This means that idgen’s requests to httpbin are allowed with no additional configuration. When ALLOW_ANY is enabled, Istio uses an Envoy cluster called PassthroughCluster, enforced by idgen’s sidecar proxy, to monitor the egress traffic.
An Envoy cluster is a backend (or “upstream”) set of endpoints that Envoy can send traffic to; here, it represents an external service. The Istio sidecar Envoy proxy applies filters to requests it intercepts from an application container. Based on these filters, Envoy matches each request to a specific route, and that route specifies the cluster to send the traffic to.
The Istio PassthroughCluster is set up so that the backend is the original request destination. So when ALLOW_ANY is enabled for egress traffic, Envoy will simply “pass through” idgen’s request to httpbin.
With this configuration, if we send recipe ID requests through the IngressGateway, idgen can successfully call httpbin. This traffic appears as PassthroughCluster traffic in the Kiali service graph. For httpbin to get its own service-level telemetry, we’ll need to add a ServiceEntry. (We’ll do this in a moment.)
But if we drill down in Prometheus, and find the istio_requests_total metric, we can see that PassthroughCluster traffic is going to a destination_service called httpbin.org.
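To make that drill-down repeatable, the query can be captured in a Prometheus rule file. The following is a minimal sketch, assuming Istio's standard request-count metric (exposed as istio_requests_total in current releases) and its destination_service_name, destination_service, and source_workload labels. Label names can differ between Istio versions, and the group and record names here are hypothetical.

```yaml
# Prometheus recording rule (sketch). Group and record names are
# illustrative; verify label names against your Istio version.
groups:
  - name: istio-egress-passthrough
    rules:
      # Per-workload request rate for traffic that left the mesh via the
      # PassthroughCluster, broken down by the external host that was called.
      - record: egress:passthrough_requests:rate5m
        expr: |
          sum by (source_workload, destination_service) (
            rate(istio_requests_total{destination_service_name="PassthroughCluster"}[5m])
          )
```

Graphing the recorded series should show one line per workload and external host pair, which is a quick way to see which services call external APIs.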
Option 2 - REGISTRY_ONLY, no ServiceEntry
Now let’s say that before we add a ServiceEntry for httpbin, we want to lock down all egress traffic. We can do this by updating the global installation option for outbound traffic to be REGISTRY_ONLY, and re-applying the Istio install manifests.
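As a rough illustration of that installation option, the sketch below assumes an istioctl-based install that uses the IstioOperator API, where the policy lives under meshConfig.outboundTrafficPolicy. Older Helm-based installs set the same policy through the global.outboundTrafficPolicy.mode value instead, so check which mechanism your Istio version uses.

```yaml
# IstioOperator overlay (sketch): block egress to hosts that are not in
# Istio's service registry. Assumes an istioctl-based installation.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY   # the permissive alternative is ALLOW_ANY
```

Saved to a file (the name is up to you), this could be applied with istioctl install -f followed by the file path. Workloads that are already running may need a restart before their sidecars pick up the new policy.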
Now, a new cluster called BlackHoleCluster comes into play. The BlackHoleCluster is a backend without any IP endpoints. Requests routed to the BlackHoleCluster are dropped by Envoy, and return a 502: Bad Gateway error. In other words, the REGISTRY_ONLY policy is enforced by each workload’s sidecar proxy dropping egress requests to hosts that are not in the registry.
Once we re-install Istio with the REGISTRY_ONLY option enabled, and redeploy our idgen pod, we can see that the BlackHoleCluster intercepts the requests. In the Kiali service graph, a red graph edge means that HTTP requests do not complete: traffic can’t get to the desired httpbin.org endpoint.
In Prometheus, we can see that the istio_requests_total metric is accounting for BlackHoleCluster traffic. In practice, you might set an alert on this metric to detect attempted data exfiltration by a service in your cluster. In this mode, Prometheus can tell you both the source and (attempted) destination workload for the blocked request.
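An alert along those lines could look like the sketch below. It assumes the Istio request-count metric is exposed as istio_requests_total with destination_service_name, source_workload, and source_workload_namespace labels, which can vary by Istio version. The alert name, threshold, duration, and severity are hypothetical choices, not recommendations from Istio.

```yaml
# Prometheus alerting rule (sketch). Names, threshold, and severity are
# illustrative; verify label names against your Istio version.
groups:
  - name: istio-egress-blocked
    rules:
      - alert: BlockedEgressTraffic
        # Fires when any workload keeps sending requests that its sidecar
        # routes to the BlackHoleCluster (that is, blocked egress attempts).
        expr: |
          sum by (source_workload, source_workload_namespace) (
            rate(istio_requests_total{destination_service_name="BlackHoleCluster"}[5m])
          ) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Workload {{ $labels.source_workload }} is attempting blocked egress requests"
```

Grouping by source workload keeps one alert per offending service, so a single misconfigured deployment does not hide a second, unrelated one.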
Option 3 - REGISTRY_ONLY with ServiceEntry
Now let’s say that we’ve gotten approval for idgen to call an external API. We’ve authorized the creation of a ServiceEntry to add httpbin to the Istio registry:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: httpbin-ext
spec:
  hosts:
  - httpbin.org
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
  location: MESH_EXTERNAL
Now, we can see that requests successfully exit the mesh, and are not dropped by the BlackHoleCluster.
Note that with a ServiceEntry, Istio treats httpbin as another distinct mesh service, even though it’s external to the Kubernetes cluster and not part of the control domain. Now, we can get telemetry specifically for httpbin, and if we add another external service, it will appear as its own distinct node in the service graph.
To learn more about monitoring egress traffic: