**Thank you to the Microsoft Sentinel CxE team, Jeff Wolford, and @Preeti_Krishna for the assistance with this document.**
This blog provides a high-level overview of potential architecture designs for building a highly available, scalable ingestion pipeline. The main components covered in the designs are:
Data Sources (Coming from endpoints)
The architectures can be categorized into 3 main scenarios:
Azure based: Components for collection reside within the Azure platform.
Hybrid: One component resides outside of Azure.
Non-Azure: All components reside outside of Azure.
If using an Azure-hosted Virtual Machine Scale Set (VMSS), the scale set nodes replace individual forwarders. The scale set uses the Azure Monitor Agent (AMA) extension built specifically for VMSS. The network rules for the VMSS need to allow inbound traffic on the ports required for forwarding. A VMSS can scale out automatically based on resource consumption: CPU, disk space, and memory. The load balancer uses either round robin or least connection to distribute traffic across the active nodes.
Pros: A single image can be used across every node. Scaling out of forwarders can be automated, and only one resource needs to be provisioned and managed. Traffic is encrypted within Azure. This approach also offers cost savings.
Cons: Users will need to be familiar with monitoring best practices for VMSS in order to properly configure scaling out of nodes.
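To make the automatic scale-out described above more concrete, here is a minimal sketch of a single scaling evaluation. It is not how Azure implements autoscale; the `NodeMetrics` type, the `desired_node_count` function, and every threshold and limit are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NodeMetrics:
    """Resource usage of one forwarder node, as percentages (0-100)."""
    cpu: float
    memory: float
    disk: float


def desired_node_count(metrics, min_nodes=2, max_nodes=10,
                       scale_out_at=75.0, scale_in_at=30.0):
    """Return the node count after one scaling evaluation.

    `metrics` must contain one entry per running node.
    All thresholds and limits are illustrative assumptions.
    """
    current = len(metrics)
    # Use the most constrained resource, averaged across all nodes.
    pressure = max(
        mean(m.cpu for m in metrics),
        mean(m.memory for m in metrics),
        mean(m.disk for m in metrics),
    )
    if pressure >= scale_out_at:
        current += 1
    elif pressure <= scale_in_at:
        current -= 1
    # Never go below the floor or above the ceiling.
    return max(min_nodes, min(max_nodes, current))


busy = [NodeMetrics(cpu=90, memory=60, disk=40),
        NodeMetrics(cpu=80, memory=55, disk=35)]
print(desired_node_count(busy))  # 3
```

Taking the maximum of the three averages means the scale set grows as soon as any one resource is under pressure, which is a conservative choice for a forwarder that should not drop events.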
Load Balancer and Forwarders in Azure - OPTIONAL
This architecture involves hosting both the load balancer and the individual forwarder machines within Azure. The log sources are configured to send traffic to the load balancer hosted in Azure. From there, the load balancer can use round robin or least connection to distribute the ingestion volume to the log forwarders that are also hosted in Azure.
Pros: Infrastructure and networking are managed within Azure. Traffic is encrypted once it is within Azure. Lower capital cost to spin up new forwarder machines vs. hosting on-prem.
Cons: Scaling out the forwarding infrastructure will not be as efficient in comparison to scaling out with a VMSS. Hosting several individual forwarder machines is more expensive than using a VMSS. Traffic between sources and the load balancer is not encrypted.
Load Balancer Outside Azure/Forwarders in Azure
If hosting the load balancer outside of Azure, the data sources will need to point to the load balancer and have local firewalls allow traffic over the proper ports. The same will need to be done for the forwarders hosted within Azure. Traffic to/from the load balancer will need to be encrypted.
Pros: Load balancer is not tied to a cloud platform if this is a concern. Cost is fixed vs. consumption based.
Cons: Load balancer will require additional configuration to encrypt data that is outbound to Azure. Hardware may need to be purchased, installed, and maintained. Capital expense becomes a factor.
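As one possible illustration of the encryption requirement in this scenario, the sketch below sends a log message over TLS using Python's standard library. It assumes the receiving side accepts syslog over TLS with octet-counting framing (as described in RFC 5425, which uses port 6514 by default); whether a given load balancer or forwarder is configured that way is an assumption. The function names, host, and CA file are hypothetical.

```python
import socket
import ssl


def build_tls_context(ca_file=None):
    """Create a client-side TLS context that verifies the receiver's certificate."""
    # With ca_file=None, the system's default trusted CAs are used.
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def frame_syslog(message):
    """Apply octet-counting framing: b'<length> <message>'."""
    payload = message.encode("utf-8")
    return str(len(payload)).encode("ascii") + b" " + payload


def send_log(host, port, message, context):
    """Open a TLS connection and send one framed message."""
    with socket.create_connection((host, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(frame_syslog(message))


# Hypothetical usage (host name is a placeholder, not a real endpoint):
# send_log("lb.example.com", 6514, "<134>1 - host app - - - test", build_tls_context())
```

The default context enables certificate and hostname verification, so a source will refuse to send logs to an endpoint it cannot authenticate.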
Load Balancer and Forwarders Outside Azure
If both the load balancer and forwarders are hosted outside of Azure, they will need to be configured for inbound/outbound traffic on the local firewall.
Pros: Main components of the architecture are hosted in-house. Costs associated are fixed and more predictable.
Cons: Full responsibility for the hardware and operational costs of the architecture. More setup requirements for components to operate properly. Scaling out forwarder machines will take more time.
Items to Consider:
The load balancer can be configured to leverage either round robin or least connection.
Round Robin
Pros: Ensures distribution of traffic amongst the forwarder devices as events reach the load balancer. The algorithm is lighter on resource consumption for the load balancer.
Cons: Lack of full control over distribution. The algorithm is simplistic, so overload of forwarders is possible.
Least Connection
Pros: Tracks the workload of each endpoint, allowing for smarter traffic management than round robin. This logic helps avoid forwarder overload.
Cons: More resource intensive on the load balancer due to smarter distribution. Potentially more intensive on forwarder machines due to sending connection details to the load balancer.
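The difference between the two distribution methods can be sketched in a few lines. This is a simplified model, not the implementation of any particular load balancer; the class names and the forwarder names (`fwd-1`, etc.) are hypothetical.

```python
from itertools import cycle


class RoundRobin:
    """Hand out forwarders in a fixed rotation, ignoring their load."""

    def __init__(self, forwarders):
        self._rotation = cycle(forwarders)

    def pick(self):
        return next(self._rotation)


class LeastConnection:
    """Pick the forwarder with the fewest open connections."""

    def __init__(self, forwarders):
        # Connection count per forwarder; this is the extra state
        # that round robin does not need to keep.
        self.active = {name: 0 for name in forwarders}

    def pick(self):
        # min() returns the first minimum, so ties go to the earliest forwarder.
        name = min(self.active, key=self.active.get)
        self.active[name] += 1
        return name

    def release(self, name):
        """Call when a connection to `name` closes."""
        self.active[name] -= 1


forwarders = ["fwd-1", "fwd-2", "fwd-3"]

rr = RoundRobin(forwarders)
print([rr.pick() for _ in range(4)])  # ['fwd-1', 'fwd-2', 'fwd-3', 'fwd-1']

lc = LeastConnection(forwarders)
lc.active["fwd-1"] = 5  # fwd-1 is already busy
print([lc.pick() for _ in range(2)])  # ['fwd-2', 'fwd-3']
```

Round robin keeps no per-forwarder state, which is why it is cheaper to run, while least connection has to track every open and closed connection to make its choice.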
When looking to make a decision, least connection should be used if:
You are looking to avoid overloading forwarder machines in the event that they are processing other connections when new traffic arrives.
You anticipate forwarder machines needing to be scaled out consistently.
Least connection allows new nodes to be spun up and immediately begin handling the new incoming connections while existing nodes continue to handle the existing traffic. If neither of these concerns applies, round robin can be used.
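The scale-out behavior described above can be illustrated with a small simulation. It assumes existing connections stay open while new ones arrive; the function name, node names, and connection counts are made up for the example.

```python
from collections import Counter


def route_new_connections(active, new_connections, strategy):
    """Return how many of the new connections each node receives."""
    active = dict(active)  # copy so the caller's counts are unchanged
    nodes = list(active)
    received = Counter()
    for i in range(new_connections):
        if strategy == "least_connection":
            node = min(nodes, key=active.get)
        else:  # round robin
            node = nodes[i % len(nodes)]
        active[node] += 1
        received[node] += 1
    return dict(received)


# Two existing nodes each hold 50 open connections; fwd-3 was just added.
pool = {"fwd-1": 50, "fwd-2": 50, "fwd-3": 0}
print(route_new_connections(pool, 30, "least_connection"))
# {'fwd-3': 30}
print(route_new_connections(pool, 30, "round_robin"))
# {'fwd-1': 10, 'fwd-2': 10, 'fwd-3': 10}
```

Under least connection the new node absorbs all 30 new connections, while round robin keeps sending two thirds of them to nodes that are already loaded.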
Compared to the Microsoft Monitoring Agent (MMA), the AMA can handle more events per second (EPS). Today, the AMA can handle 10,000 EPS per forwarder.
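The per-forwarder figure above lends itself to a rough sizing estimate. The sketch below is only a back-of-the-envelope calculation: the 10,000 EPS value comes from the text, while the headroom percentage, the spare node, and the function name are assumptions to adjust for your own environment.

```python
AMA_EPS_PER_FORWARDER = 10_000  # per-forwarder figure quoted in the text


def forwarders_needed(peak_eps, headroom_percent=20, spare_nodes=1):
    """Estimate how many forwarders a given peak event rate requires.

    headroom_percent: capacity left unused on each forwarder (assumption).
    spare_nodes: extra forwarders kept for availability (assumption).
    """
    if peak_eps < 0:
        raise ValueError("peak_eps must not be negative")
    usable_eps = AMA_EPS_PER_FORWARDER * (100 - headroom_percent) // 100
    # Integer ceiling division avoids floating-point rounding surprises.
    base = max(1, -(-peak_eps // usable_eps))
    return base + spare_nodes


print(forwarders_needed(35_000))  # 6
```

With 20% headroom each forwarder is planned for 8,000 EPS, so 35,000 EPS needs five forwarders plus one spare.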