Service meshes add great value when running distributed applications at scale on Kubernetes (K8s). Many meshes are available nowadays: the usual suspects are Istio and Linkerd, but others have surfaced as well, such as Open Service Mesh (OSM) from Microsoft, which is available as an AKS add-on. The promises of a service mesh are:
- Increased agility thanks to the built-in support of various deployment/testing models
- Increased resilience thanks to built-in retry, circuit breakers and fault injections (chaos engineering)
- Increased observability
- Increased security, thanks to mTLS and traffic policies
- Enhanced load balancing algorithms that understand the application layer
All meshes implement the features listed above, to a greater or lesser extent.
These very handy capabilities come at a cost: additional compute capacity must be provisioned to accommodate the needs of the mesh. This is mostly because every meshed pod is injected with a sidecar container that implements the ambassador pattern, and each sidecar has a memory and CPU footprint.
This makes sense, but you must keep it under control. Before diving into OSM itself, let's first see what happens when a cluster is under memory pressure:
AKS will start killing low-priority pods, seemingly at random. Even when cluster/node pool autoscaling is turned on, under high pressure, memory will be reclaimed at the expense of low-priority pods. You will likely see unpleasant K8s events such as the ones below:
K8s gives us tools to control this possible chaos, namely pod priority and preemption (https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/), pod disruption budgets (https://kubernetes.io/docs/concepts/workloads/pods/disruptions/) and resource requests/limits (https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/), but whatever you define, you'll be in trouble if there is no memory left in the cluster.
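As an illustration, here is a minimal sketch of a PriorityClass assigned to a pod so that, under memory pressure, that pod is preempted and evicted last. All names, values and the image are hypothetical placeholders:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical      # hypothetical name
value: 1000000                 # higher value = preempted/evicted last
globalDefault: false
description: "Pods that must survive memory pressure."
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-api           # hypothetical pod
spec:
  priorityClassName: business-critical
  containers:
  - name: api
    image: nginx               # placeholder image
    resources:
      requests:
        memory: "64Mi"
      limits:
        memory: "128Mi"
```

Requests and limits tell the scheduler and the kubelet how much memory the pod needs and may use, while the priority class decides who gets sacrificed first when the node runs dry.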
There are multiple reasons why memory could be at risk in a cluster:
- Not enough worker nodes
- Memory leaks
- Memory consumption peaks
Unlike CPU, memory is not a compressible resource. Running out of memory is NOT an option.
Now that we have seen the impact of excessive overall memory consumption, let's look at OSM from that perspective. At the time of writing, when enabling OSM on a vanilla AKS cluster, the default mesh config spec is as follows:
```yaml
spec:
  certificate:
    certKeyBitSize: 2048
    serviceCertValidityDuration: 24h
  featureFlags:
    enableAsyncProxyServiceMapping: false
    enableEgressPolicy: true
    enableEnvoyActiveHealthChecks: false
    enableIngressBackendPolicy: true
    enableRetryPolicy: false
    enableSnapshotCacheMode: false
    enableWASMStats: true
  observability:
    enableDebugServer: true
    osmLogLevel: info
    tracing:
      enable: false
  sidecar:
    configResyncInterval: 0s
    enablePrivilegedInitContainer: false
    localProxyMode: Localhost
    logLevel: debug
    resources: {}
    tlsMaxProtocolVersion: TLSv1_3
    tlsMinProtocolVersion: TLSv1_2
  traffic:
    enableEgress: true
    enablePermissiveTrafficPolicyMode: true
    inboundExternalAuthorization:
      enable: false
      failureModeAllow: false
      statPrefix: inboundExtAuthz
      timeout: 1s
    inboundPortExclusionList: []
    networkInterfaceExclusionList: []
    outboundIPRangeExclusionList: []
    outboundIPRangeInclusionList: []
    outboundPortExclusionList: []
```
To reproduce the problem at scale, the following small program generates an arbitrary number of identical APIs, each made of a ServiceAccount, a Deployment and a Service:

```csharp
using System;
using System.IO;
using System.Text;

// Writes n identical APIs to autogeneratedosm.yaml. Each API consists of
// a ServiceAccount, a single-replica Deployment and a Service, separated
// by YAML document markers so the file can be applied in one shot.
using (StreamWriter sw = new StreamWriter("autogeneratedosm.yaml"))
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < Convert.ToInt32(args[0]); i++)
    {
        sb.Append("apiVersion: v1\r\n");
        sb.Append("kind: ServiceAccount\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: api{0}\r\n", i);
        sb.Append("---\r\n");
        sb.Append("apiVersion: apps/v1\r\n");
        sb.Append("kind: Deployment\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: api{0}\r\n", i);
        sb.Append("spec:\r\n");
        sb.Append("  replicas: 1\r\n");
        sb.Append("  selector:\r\n");
        sb.Append("    matchLabels:\r\n");
        sb.AppendFormat("      app: api{0}\r\n", i);
        sb.Append("  template:\r\n");
        sb.Append("    metadata:\r\n");
        sb.Append("      labels:\r\n");
        sb.AppendFormat("        app: api{0}\r\n", i);
        sb.Append("    spec:\r\n");
        sb.AppendFormat("      serviceAccountName: api{0}\r\n", i);
        sb.Append("      containers:\r\n");
        sb.Append("      - name: api\r\n");
        sb.Append("        image: stephaneey/osmapi:dev\r\n");
        sb.Append("        imagePullPolicy: Always\r\n");
        sb.Append("---\r\n");
        sb.Append("apiVersion: v1\r\n");
        sb.Append("kind: Service\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: apisvc{0}\r\n", i);
        sb.Append("  labels:\r\n");
        sb.AppendFormat("    app: api{0}\r\n", i);
        sb.AppendFormat("    service: apisvc{0}\r\n", i);
        sb.Append("spec:\r\n");
        sb.Append("  ports:\r\n");
        sb.Append("  - port: 80\r\n");
        sb.Append("    name: http\r\n");
        sb.Append("  selector:\r\n");
        sb.AppendFormat("    app: api{0}\r\n", i);
        sb.Append("---\r\n");
    }
    sw.Write(sb.ToString());
}
```
```shell
kubectl scale deploy --all --replicas=0 -n osmdemo
osm namespace add osmdemo
kubectl scale deploy --all --replicas=1 -n osmdemo
```
We first scale all our APIs down to zero, then ask OSM to monitor the osmdemo namespace, and finally restart all of our deployments. Running the following command:

```shell
kubectl top pod -n osmdemo
```

quickly reveals an excessive memory consumption:
especially given the initial 17-21 MB consumption. This can quickly bring your cluster to the chaotic situation described earlier. The memory killer is the enableWASMStats setting, which enables live collection of the metrics exposed by the Envoy sidecar; with it turned on, an API is available to extract those metrics. Turning off this single setting is enough to return to a "normal" memory consumption:
However, by doing so, you won't have the metrics anymore, so you are losing functionality here. Let's turn enableWASMStats on again and disable enablePermissiveTrafficPolicyMode instead! With that config, memory consumption remains low and metrics are still collected, but services cannot talk to each other anymore by default. The only way to authorize communication is through the HTTPRouteGroup and TrafficTarget resource types. When all services are allowed to talk to each other, the number of possible routes is huge, whereas when you explicitly define routes, you only declare the ones that are really needed, which results in much less information for each sidecar to hold.
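As an illustration, here is a minimal sketch of such an explicit route, reusing the api0 and api1 service accounts generated earlier; the route group name and the GET-only match are arbitrary choices:

```yaml
apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: api-routes             # arbitrary name
  namespace: osmdemo
spec:
  matches:
  - name: all-paths
    pathRegex: ".*"            # every path, GET only
    methods: ["GET"]
---
apiVersion: access.smi-spec.io/v1alpha3
kind: TrafficTarget
metadata:
  name: api0-to-api1           # arbitrary name
  namespace: osmdemo
spec:
  destination:
    kind: ServiceAccount
    name: api1
    namespace: osmdemo
  sources:
  - kind: ServiceAccount
    name: api0
    namespace: osmdemo
  rules:
  - kind: HTTPRouteGroup
    name: api-routes
    matches:
    - all-paths
```

With this pair of resources, only api0 may call api1, and only with GET requests; every other service-to-service call in the namespace remains denied.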
Bottom line, if you want to keep memory under control with OSM, here are the possible combinations:
- enableWASMStats: true + enablePermissiveTrafficPolicyMode: true ==> bad
- enableWASMStats: true + enablePermissiveTrafficPolicyMode: false ==> ok
- enableWASMStats: false + enablePermissiveTrafficPolicyMode: false ==> ok
- enableWASMStats: false + enablePermissiveTrafficPolicyMode: true ==> ok
In short, avoid setting both enableWASMStats and enablePermissiveTrafficPolicyMode to true, or else plan for a huge memory capacity. The winning combination is probably enableWASMStats: true with enablePermissiveTrafficPolicyMode: false, because you keep memory under control while ensuring tighter security.
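As a sketch, assuming the AKS add-on defaults where the MeshConfig is named osm-mesh-config and lives in the kube-system namespace (adjust both to your installation), permissive mode can be switched off with a merge patch:

```shell
# Disable permissive traffic policy mode on the default AKS add-on MeshConfig
kubectl patch meshconfig osm-mesh-config -n kube-system --type=merge \
  -p '{"spec":{"traffic":{"enablePermissiveTrafficPolicyMode":false}}}'
```

Remember to have your HTTPRouteGroup and TrafficTarget resources in place before flipping the switch, or all meshed traffic will be denied.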
If you are unsure about what you plan to do, you can also, as a precautionary measure, define resource requests and limits for the sidecar. OSM makes this possible through this section of the mesh config:

```yaml
sidecar:
  resources: {}
```
However, keep in mind that defining low limits while both enableWASMStats and enablePermissiveTrafficPolicyMode are set to true will inevitably lead to meshed pods being OOM-killed, but you will at least prevent non-meshed pods from being evicted by K8s.
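For instance, a cautious starting point could look like the snippet below; the figures are placeholders to tune against your own workload, not recommendations:

```yaml
sidecar:
  resources:
    requests:          # reserved for each injected Envoy sidecar
      cpu: 100m
      memory: 128Mi
    limits:            # sidecar is OOM-killed beyond this memory
      cpu: 500m
      memory: 256Mi
```

These values apply to every injected sidecar in the mesh, so multiply them by your meshed pod count when sizing node pools.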