Microsoft Developer Community Blog
Preventing polynomial memory consumption effect with Open Service Mesh's envoy sidecar

stephaneeyskens
Apr 07, 2023

Service meshes offer great value when running distributed applications at scale on K8s. Many meshes are available nowadays. The usual suspects are Istio and Linkerd, but other meshes have surfaced, such as Open Service Mesh (OSM) from Microsoft. OSM is available as an AKS add-on. The promises of a service mesh are:

 

  • Increased agility thanks to the built-in support of various deployment/testing models
  • Increased resilience thanks to built-in retries, circuit breakers and fault injection (chaos engineering)
  • Increased observability
  • Increased security, thanks to mTLS and traffic policies
  • Enhanced load balancing algorithms that understand the application layer

All meshes implement the features listed above to a greater or lesser extent.

 

These very handy capabilities come at a cost: additional compute capacity must be planned to accommodate the needs of the mesh. This is mostly because every pod is injected with a sidecar container that implements the ambassador pattern, and each sidecar has its own memory and CPU footprint.

 

This makes sense, but you must keep this overhead under control. Before diving into OSM itself, let's first see what happens when a cluster is under memory pressure:

 

 

AKS will start killing low priority pods randomly. Even when cluster/nodepool autoscaling is turned on, under high pressure, memory will be released at the cost of low priority pods. You will likely see unpleasant K8s events such as below:

 

 

K8s gives us tools to control this possible chaos (https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/, https://kubernetes.io/docs/concepts/workloads/pods/disruptions/ and https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) but whatever you define, you'll be in trouble if there is no more memory in the cluster.

 

There are multiple reasons why memory could be at risk in a cluster: 

 

  • Not enough worker nodes
  • Memory leaks
  • Memory consumption peaks

Unlike CPU, memory is not a compressible resource. Running out of memory is NOT an option.

 

Now that we have seen the impact of excessive overall memory consumption, let's see what you must look at when working with OSM from that perspective. At the time of writing, when enabling OSM on a vanilla AKS cluster, the default mesh config spec is as follows:

 

 

 

 

 

spec:
  certificate:
    certKeyBitSize: 2048
    serviceCertValidityDuration: 24h
  featureFlags:
    enableAsyncProxyServiceMapping: false
    enableEgressPolicy: true
    enableEnvoyActiveHealthChecks: false
    enableIngressBackendPolicy: true
    enableRetryPolicy: false
    enableSnapshotCacheMode: false
    enableWASMStats: true
  observability:
    enableDebugServer: true
    osmLogLevel: info
    tracing:
      enable: false
  sidecar:
    configResyncInterval: 0s
    enablePrivilegedInitContainer: false
    localProxyMode: Localhost
    logLevel: debug
    resources: {}
    tlsMaxProtocolVersion: TLSv1_3
    tlsMinProtocolVersion: TLSv1_2
  traffic:
    enableEgress: true
    enablePermissiveTrafficPolicyMode: true
    inboundExternalAuthorization:
      enable: false
      failureModeAllow: false
      statPrefix: inboundExtAuthz
      timeout: 1s
    inboundPortExclusionList: []
    networkInterfaceExclusionList: []
    outboundIPRangeExclusionList: []
    outboundIPRangeInclusionList: []
    outboundPortExclusionList: []

 

 

 

 

 

Some default settings have a big impact on memory, in particular enableWASMStats and enablePermissiveTrafficPolicyMode. For very small-scale meshes, you might not really see the difference, but as soon as you inject more pods with these default settings, you will see excessive memory consumption. To showcase this, I wrote a simple console program that generates any number of service accounts, deployments and services:
 
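using System;
using System.IO;
using System.Text;

// Number of ServiceAccount/Deployment/Service triples to generate (first CLI argument).
int count = Convert.ToInt32(args[0]);

using (StreamWriter sw = new StreamWriter("autogeneratedosm.yaml"))
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < count; i++)
    {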
        sb.Append("apiVersion: v1\r\n");
        sb.Append("kind: ServiceAccount\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: api{0}\r\n", i);
        sb.Append("---\r\n");

        sb.Append("apiVersion: apps/v1\r\n");
        sb.Append("kind: Deployment\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: api{0}\r\n", i);
        sb.Append("spec:\r\n");
        sb.Append("  replicas: 1\r\n");
        sb.Append("  selector:\r\n");
        sb.Append("    matchLabels:\r\n");
        sb.AppendFormat("      app: api{0}\r\n",i);
        sb.Append("  template:\r\n");
        sb.Append("    metadata:\r\n");
        sb.Append("      labels:\r\n");
        sb.AppendFormat("        app: api{0}\r\n", i);
        sb.Append("    spec:\r\n");
        sb.AppendFormat("      serviceAccountName: api{0}\r\n",i);
        sb.Append("      containers:\r\n");
        sb.Append("      - name: api\r\n");
        sb.Append("        image: stephaneey/osmapi:dev\r\n");
        sb.Append("        imagePullPolicy: Always\r\n");
        sb.Append("---\r\n");

        sb.Append("apiVersion: v1\r\n");
        sb.Append("kind: Service\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: apisvc{0}\r\n",i);
        sb.Append("  labels:\r\n");
        sb.AppendFormat("    app: api{0}\r\n",i);
        sb.AppendFormat("    service: apisvc{0}\r\n", i);
        sb.Append("spec:\r\n");
        sb.Append("  ports:\r\n");
        sb.Append("  - port: 80\r\n");
        sb.Append("    name: http\r\n");
        sb.Append("  selector:\r\n");
        sb.AppendFormat("    app: api{0}\r\n",i);
        sb.Append("---\r\n");
    }
    sw.Write(sb.ToString());
}
 
The resulting YAML manifest contains any number of triples: a ServiceAccount (used by OSM for mTLS), a Deployment (deploying application containers into pods, which are injected with the sidecar by OSM) and a Service, which is simply used to let K8s write the Linux iptables rules.
 
With the default settings, and more particularly when enablePermissiveTrafficPolicyMode is set to true, all meshed services can talk to each other without any restriction. Only non-meshed services are denied because they do not present a client certificate issued by OSM. While we may think that this setting only impacts security, it also impacts memory. Indeed, when using enablePermissiveTrafficPolicyMode in conjunction with enableWASMStats, we can see a huge impact on memory. If we produce 60 ServiceAccount/Deployment/Service triples with our console program, without OSM, we can see that the memory consumption is rather low:
 
 
The .NET API consumes between 17 and 21 MB. Now, let's run these three commands:
 
kubectl scale deploy --all --replicas=0 -n osmdemo
osm namespace add osmdemo
kubectl scale deploy --all --replicas=1 -n osmdemo

 

We first stop all our APIs, then we ask OSM to monitor the osmdemo namespace, and we finally restart all of our deployments. Running the following command:

 

kubectl top pod -n osmdemo

 

quickly reveals an excessive memory consumption:

 

 

especially given the initial 17-21 MB consumption. This can quickly lead your cluster to the chaotic situation I described earlier. The memory killer setting is enableWASMStats, which enables live collection of metrics made available by the envoy sidecar. With this setting turned on, an API is available to extract metrics. Turning off this setting alone is enough to return to a "normal" memory consumption:

 

 

However, doing so means you won't have the metrics anymore, so you're losing functionality here. Let's turn enableWASMStats on again and disable enablePermissiveTrafficPolicyMode! With that config, memory consumption remains low and metrics are still collected, but services cannot talk to each other anymore. The only way to authorize communication is through the HTTPRouteGroup and TrafficTarget resource types. When all services can talk to each other, the number of possible routes is huge, whereas when you explicitly define those routes, you only define the ones that are really needed, which results in much less information.
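
The difference in the number of routes can be illustrated with a back-of-the-envelope count. The sketch below is a simplified model, not a measurement of OSM internals: it assumes that in permissive mode every meshed service may call every other one, and that with explicit policies only the listed source/destination pairs exist. The function names and the 60-service call chain are hypothetical.

```python
def permissive_routes(n_services: int) -> int:
    """Directed service-to-service pairs when every service may call every other."""
    return n_services * (n_services - 1)


def explicit_routes(allowed_pairs: set[tuple[str, str]]) -> int:
    """Directed pairs when only explicitly authorized (source, destination) pairs exist."""
    return len(allowed_pairs)


if __name__ == "__main__":
    n = 60
    # Hypothetical topology: api0 -> api1 -> ... -> api59 (a simple call chain).
    chain = {(f"api{i}", f"api{i + 1}") for i in range(n - 1)}
    print(permissive_routes(n))    # 3540
    print(explicit_routes(chain))  # 59
```

Under this model the permissive count grows quadratically with the number of services, while the explicit count only grows with the number of routes you actually declare.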

 

Bottom line, if you want to keep memory under control using OSM, here are the combinations:

 

  • enableWASMStats: true, enablePermissiveTrafficPolicyMode: true ==> bad
  • enableWASMStats: true, enablePermissiveTrafficPolicyMode: false ==> ok
  • enableWASMStats: false, enablePermissiveTrafficPolicyMode: false ==> ok
  • enableWASMStats: false, enablePermissiveTrafficPolicyMode: true ==> ok

You just need to avoid setting both enableWASMStats and enablePermissiveTrafficPolicyMode to true, or else provision a large memory capacity. The winning combination is probably enableWASMStats: true and enablePermissiveTrafficPolicyMode: false because you will keep memory under control while ensuring higher security.
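
As a minimal sketch of how the combinations above could be enforced before touching the cluster, the helper below builds a JSON merge-patch body for the two flags and refuses the one combination flagged as bad. The field paths follow the mesh config spec shown earlier; the function name is hypothetical, and applying the patch (for example with kubectl patch and --type=merge against your MeshConfig resource) is left out because the resource name and namespace depend on your installation.

```python
import json


def mesh_config_patch(wasm_stats: bool, permissive_mode: bool) -> str:
    """Return a JSON merge-patch body for the two MeshConfig flags.

    Raises ValueError for the one combination flagged as "bad" in the list above.
    """
    if wasm_stats and permissive_mode:
        raise ValueError(
            "enableWASMStats and enablePermissiveTrafficPolicyMode are both true"
        )
    patch = {
        "spec": {
            "featureFlags": {"enableWASMStats": wasm_stats},
            "traffic": {"enablePermissiveTrafficPolicyMode": permissive_mode},
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    # Metrics on, permissive mode off: the combination recommended above.
    print(mesh_config_patch(wasm_stats=True, permissive_mode=False))
```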

 

If you are unsure about what you plan to do, you can also, as a precautionary measure, define resource requests and limits for the sidecar. OSM makes this possible through this section:

 

sidecar:
  resources: {}

 

However, keep in mind that defining low limits when both enableWASMStats and enablePermissiveTrafficPolicyMode are set to true will inevitably lead to meshed pods being killed, but you will at least protect non-meshed pods from being evicted by K8s.
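
To make the resources section more concrete, here is a hedged sketch that builds a JSON merge-patch body filling spec.sidecar.resources. It assumes the field accepts the usual K8s requests/limits shape used for containers; the function name and the values are placeholders for illustration, not recommendations, and should be sized from what kubectl top pod reports in your own cluster.

```python
import json


def sidecar_resources_patch(mem_request: str, mem_limit: str,
                            cpu_request: str, cpu_limit: str) -> str:
    """Return a JSON merge-patch body that fills spec.sidecar.resources."""
    patch = {
        "spec": {
            "sidecar": {
                "resources": {
                    "requests": {"memory": mem_request, "cpu": cpu_request},
                    "limits": {"memory": mem_limit, "cpu": cpu_limit},
                }
            }
        }
    }
    return json.dumps(patch, indent=2)


if __name__ == "__main__":
    # Placeholder values, assumed for illustration only.
    print(sidecar_resources_patch("64Mi", "256Mi", "50m", "500m"))
```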

Updated Apr 07, 2023
Version 1.0
  • stephaneeyskens excellent article and very informative. I did have a question about one thing mentioned here.

     

    "The memory killer setting is enableWASMStats, which enables live collection of metrics made available by the envoy sidecar. With this setting turned on, an API is available to extract metrics"

     

    Which metrics are you specifically referring to here? I'm still able to view the following metrics detailed below after disabling WASM Stats as a test.

     

    envoy_cluster_upstream_cx_active, envoy_cluster_upstream_cx_connect_fail, 

    envoy_cluster_upstream_rq_xx