planning and architecture
Exchange 2013 Client Access Server Role
In a previous article, I discussed the new server role architecture in Exchange 2013. This article continues the series by discussing the Client Access server role.

While this Exchange server role shares the same name as a server role that existed in the last two Exchange Server releases, it is markedly different. In Exchange 2007, the Client Access server role provided authentication, proxy/redirection logic, and performed data rendering for the Internet protocol clients (Outlook Web App, EAS, EWS, IMAP and POP). In Exchange 2010, data rendering for MAPI was also moved to the Client Access server role.

In Exchange 2013, the Client Access server (CAS) role no longer performs any data rendering functionality. The Client Access server role now only provides authentication and proxy/redirection logic, supporting the client Internet protocols, transport, and Unified Messaging. As a result of this architectural shift, the CAS role is stateless from a protocol session perspective (naturally, log data that can be used in troubleshooting or trending analysis is still generated).

Session Affinity

As I alluded to in the server role architecture blog post, Exchange 2013 no longer requires session affinity at the load balancer. To understand this better, we need to look at how CAS 2013 functions. From a protocol perspective, the following will happen:

1. A client resolves the namespace to a load balanced virtual IP address.
2. The load balancer assigns the session to a CAS member in the load balanced pool.
3. CAS authenticates the request and performs a service discovery by accessing Active Directory to retrieve the following information:
   - Mailbox version (for this discussion, we will assume an Exchange 2013 mailbox)
   - Mailbox location information (e.g., database information, ExternalURL values, etc.)
4. CAS makes a decision on whether to proxy the request or redirect the request to another CAS infrastructure (within the same forest).
5. CAS queries an Active Manager instance that is responsible for the database to determine which Mailbox server is hosting the active copy.
6. CAS proxies the request to the Mailbox server hosting the active copy.

The protocol used in step 6 depends on the protocol used to connect to CAS. If the client leverages the HTTP protocol, then the protocol used between the Client Access server and Mailbox server is HTTP (secured via SSL using a self-signed certificate). If the protocol leveraged by the client is IMAP or POP, then the protocol used between the Client Access server and Mailbox server is IMAP or POP.

Telephony requests are unique, however. Instead of proxying the request at step 6, CAS will redirect the request to the Mailbox server hosting the active copy of the user’s database, as the telephony devices support redirection and need to establish their SIP and RTP sessions directly with the Unified Messaging components on the Mailbox server.

Figure 1: Exchange 2013 Client Access Protocol Architecture

In addition to no longer performing data rendering, step 5 is the fundamental change that enables the removal of session affinity at the load balancer. For a given protocol session, CAS now maintains a 1:1 relationship with the Mailbox server hosting the user’s data. In the event that the active database copy is moved to a different Mailbox server, CAS closes the sessions to the previous server and establishes sessions to the new server. This means that all sessions, regardless of their origination point (i.e., CAS members in the load balanced array), end up at the same place: the Mailbox server hosting the active database copy.

Now many of you may be thinking, wait, how does authentication work? Well, for HTTP, POP, or IMAP requests that use basic, NTLM, or Kerberos authentication, the authentication request is passed as part of the HTTP payload, so each CAS will authenticate the request naturally. Forms-based authentication (FBA) is different.
FBA was one of the reasons why session affinity was required for OWA in previous releases of Exchange – the reason being that the cookie used a per server key for encryption; so if another CAS received a request, it could not decrypt the session. In Exchange 2013, we no longer leverage a per server session key; instead we leverage the private key from the certificate that is installed on the CAS. As long as all members of the CAS array share the exact same certificate (remember we actually recommend deploying the same certificate across all CAS in both datacenters in site resilience scenarios as well), they can decrypt the cookie.

Proxy vs. Redirection

In the previous section, I spoke about CAS proxying the data to the Mailbox server hosting the active database copy. Prior to that, CAS has to make a decision whether it will perform the proxy action or perform a redirection action. CAS will only perform a redirection action under the following circumstances:

- The origination request is telephony related.
- For Outlook Web App requests, if the mailbox’s location is determined to be in another Active Directory site and there are CAS2013 members in that site that have the ExternalURL populated, then the originating CAS will redirect the request unless the ExternalURL in the target site is the same as in the originating site (and the ExternalURL is the URL entered in the web browser by the user) – in which case CAS will proxy (this is the multiple site single namespace scenario).
- For OWA requests, if the mailbox version is Exchange 2007, then CAS2013 will redirect the request to CAS2007.

Figure 2: Exchange 2013 Client Access Proxy and Redirection Behavior Examples

Outlook Connectivity

For those of you paying attention, you may have noticed I only spoke about HTTP, POP, and IMAP. I didn’t mention RPC/TCP as a connectivity solution that CAS supports.
And that is for a very specific reason – CAS2013 does not support RPC/TCP as a connectivity solution; it only supports RPC/HTTP (aka Outlook Anywhere). This architecture change is primarily to drive a stable and reliable connectivity model. To understand why, you need to keep the following tenets in the back of your mind:

1. Remember CAS2013 is an authentication and proxy/redirection server. It does no processing of the data (no rendering or transformation). It simply proxies the request to MBX2013 using the client protocol, in this case HTTP.
2. CAS2013 and MBX2013 are not tied together from a user affinity or geographical perspective. You can have CAS2013 in one datacenter authenticate the request and proxy the request to a MBX2013 server in another datacenter. To enable this, we had to change the communication protocols used between server roles, moving away from RPC to the client protocols that are more tolerant of throughput and latency over WAN/Internet connections.
3. For a given mailbox, the protocol that services the request is always going to be the protocol instance on the Mailbox server that hosts the active copy of the database for the user’s mailbox. This was done to ultimately uncouple versioning and functionality issues we’ve seen in the past two generations (i.e., having to deploy CAS2010, HT2010, MBX2010 together to get certain functionality, where upgrading one doesn’t necessarily give you new capabilities and may break connectivity).

The last item is tied to this discussion of why we have moved away from RPC/TCP as a connectivity solution. In all prior releases the RPC endpoint was a FQDN. In fact, the shift to the middle tier for RPC processing in CAS2010 introduced a new shared namespace, the RPC Client Access namespace. Moving RPC Client Access back to the MBX2013 role would have forced us to use either the MBX2013 FQDN for the RPC endpoint (thus forcing an Outlook client restart for every database *over event) or a shared namespace for the DAG.
Neither option is appropriate, as both add to the complexity and support burden of the infrastructure. So instead, we changed the model. We no longer use a FQDN for the RPC endpoint. Instead we now use a GUID: the mailbox GUID, to be precise (along with a domain suffix to support multi-tenant scenarios). The mailbox GUID is unique within the (tenant) organization, so regardless of where the database is activated and mounted, CAS can discover the location and proxy the request to the correct MBX2013 server.

Figure 3: RPC Endpoint Changes

This architectural change means that we have a very reliable connection model – for a given session that is routed to CAS2013, CAS2013 will always have a 1:1 relationship with the MBX2013 server hosting the user’s mailbox. This means that the Mailbox server hosting the active copy of the user’s database is the server responsible for de-encapsulating the RPC stream from the HTTP packets. In the event a *over occurs, CAS2013 will proxy the connection to the MBX2013 server that assumes the responsibility of hosting the active database copy. Oh, and this means in a native Exchange 2013 environment, Outlook won’t require a restart for things like mailbox moves, *over events, etc.

The other architectural change we made in this area is the support for internal and external namespaces for Outlook Anywhere. This means you may not need to deploy split-brain DNS or deal with all Outlook clients using your external firewall or load balancer due to our change in MAPI connectivity.

Third-Party MAPI Products

I am sure that a few of you are wondering what this change means for third-party MAPI products. The answer is relatively simple – these third-party solutions will need to leverage RPC/HTTP to connect to CAS2013. This will be accomplished via a new MAPI/CDO download that has been updated to include support for RPC/HTTP connectivity. It will be released in the first quarter of calendar year 2013.
To leverage this updated functionality, the third-party vendor will either have to programmatically edit the dynamic MAPI profile or set registry key values to enable RPC/HTTP support.

I do also want to stress one key item with respect to third-party MAPI support. Exchange 2013 is the last release that will support a MAPI/CDO custom solution. In the future, third-party products (and custom in-house developed solutions) will need to move to Exchange Web Services (EWS) to access Exchange data.

Namespace Simplification

Another benefit with the Exchange 2013 architecture is that the namespace model can be simplified (especially for those of you upgrading from Exchange 2010). In Exchange 2010, a customer that wanted to deploy a site-resilient solution for two datacenters required the following namespaces:

- Primary datacenter Internet protocol namespace
- Secondary datacenter Internet protocol namespace
- Primary datacenter Outlook Web App failback namespace
- Secondary datacenter Outlook Web App failback namespace
- Primary datacenter RPC Client Access namespace
- Secondary datacenter RPC Client Access namespace
- Autodiscover namespace
- Legacy namespace
- Transport namespace (if doing ad-hoc encryption or partner-to-partner encryption)

As I previously mentioned, we have removed two of these namespaces in Exchange 2013 – the RPC Client Access namespaces.

Recall that CAS2013 proxies requests to the Mailbox server hosting the active database copy. This proxy logic is not limited to the Active Directory site boundary. A CAS2013 in one Active Directory site can proxy a session to a Mailbox server that is located in another Active Directory site. If network utilization, latency, and throughput are not a concern, this means that we do not need the additional namespaces for site resilience scenarios, thereby eliminating three other namespaces (secondary Internet protocol and both Outlook Web App failback namespaces).
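The namespace reduction described above can be checked with a little set arithmetic. The following is a minimal sketch for illustration only, not anything Exchange itself runs; the short labels and the function name are made up here and simply mirror the nine Exchange 2010 namespaces listed earlier.

```python
# Illustrative labels only; they mirror the Exchange 2010 namespace list above.
EXCHANGE_2010_NAMESPACES = [
    "primary Internet protocol",
    "secondary Internet protocol",
    "primary OWA failback",
    "secondary OWA failback",
    "primary RPC Client Access",
    "secondary RPC Client Access",
    "Autodiscover",
    "Legacy",
    "Transport",
]

# Removed outright in Exchange 2013.
REMOVED_IN_2013 = {"primary RPC Client Access", "secondary RPC Client Access"}

# Removable only when cross-site utilization, latency, and throughput
# are not a concern.
REMOVABLE_WITH_GOOD_NETWORK = {
    "secondary Internet protocol",
    "primary OWA failback",
    "secondary OWA failback",
}


def remaining_namespaces(network_is_not_a_concern):
    """Return the namespaces still needed, in their original order."""
    dropped = set(REMOVED_IN_2013)
    if network_is_not_a_concern:
        dropped |= REMOVABLE_WITH_GOOD_NETWORK
    return [name for name in EXCHANGE_2010_NAMESPACES if name not in dropped]


if __name__ == "__main__":
    print(len(remaining_namespaces(False)), "namespaces if the network is a concern")
    print(len(remaining_namespaces(True)), "namespaces if it is not")
```

Under these assumptions, nine namespaces drop to seven in every Exchange 2013 deployment, and to four when out-of-site proxy traffic is acceptable.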
For example, let’s say I have a two datacenter deployment in North America that has a network configuration such that latency, throughput and utilization between the datacenters is not a concern. I also wanted to simplify my namespace architecture with the Exchange 2013 deployment so that my users only have to use a single namespace for Internet access regardless of where their mailbox is located. If I deployed an architecture like below, then the CAS infrastructure in both datacenters could be used to route and proxy traffic to the Mailbox servers hosting the active copies. Since I am not concerned about network traffic, I configure DNS to round-robin between the VIPs of the load balancers in each datacenter. The end result is that I have a site resilient namespace architecture while accepting that half of my proxy traffic will be out-of-site.

Figure 4: Exchange 2013 Single Namespace Example

Transport

Early on I mentioned that the Client Access Server role can proxy SMTP sessions. This is handled by a new component on the CAS2013 role, the Front-End Transport service. The Front-End Transport service handles all inbound and outbound external SMTP traffic for the Exchange organization, and can also be a client endpoint for SMTP traffic. The Front-End Transport service functions as a layer 7 proxy and has full access to the protocol conversation. Like the client Internet protocols, the Front-End Transport service does not have a message queue and is completely stateless. In addition, the Front-End Transport service does not perform message bifurcation.
The Front-End Transport service listens on TCP25, TCP587, and TCP717 as seen in the following diagram:

Figure 5: Front-End Transport Service Architecture

The Front-End Transport service provides network protection – a centralized, load balanced egress/ingress point for the Exchange organization, whether it be POP/IMAP clients, SharePoint, other third-party or custom in-house mail applications, or external SMTP systems.

For outgoing messages, the Front-End Transport service is used as a proxy when the Send Connectors (that are located on the Mailbox server) have the FrontEndProxyEnabled property set. In this situation, the message will appear to have originated from CAS2013.

For incoming messages, the Front-End Transport service must quickly find a single, healthy Transport service on a Mailbox server to receive the message transmission, regardless of the number or type of recipients:

- For messages with a single mailbox recipient, select a Mailbox server in the target delivery group, and give preference to the Mailbox server based on the proximity of the Active Directory site.
- For messages with multiple mailbox recipients, use the first 20 recipients to select a Mailbox server in the closest delivery group, based on the proximity of the Active Directory site.
- If the message has no mailbox recipients, select a random Mailbox server in the local Active Directory site.

Conclusion

The Exchange 2013 Client Access Server role simplifies the network layer. Session affinity at the load balancer is no longer required as CAS2013 handles the affinity aspects. CAS2013 introduces more deployment flexibility by allowing you to simplify your namespace architecture, potentially consolidating to a single world-wide or regional namespace for your Internet protocols.
The new architecture also simplifies the upgrade and inter-operability story as CAS2013 can proxy or redirect to multiple versions of Exchange, whether they are a higher or lower version, allowing you to upgrade your Mailbox servers at your own pace.

Ross Smith IV
Principal Program Manager
Exchange Customer Experience

Exchange 2016 Coexistence with Kerberos Authentication
With the release of Exchange Server 2016, I thought it would be best to document our guidance around utilizing Kerberos authentication for MAPI clients. As with the last two releases, the solution leverages deploying an Alternate Service Account (ASA) credential so that domain-joined and domain-connected Outlook clients, as well as other MAPI clients, can utilize Kerberos authentication. Depending on your environment, you may utilize a single ASA or have multiple ASA accounts during the coexistence period.

Exchange 2016 Coexistence with Exchange 2010

Two ASA credentials will be utilized in this environment. One ASA credential will be assigned to Exchange 2010 and host the exchangeMDB, ExchangeRFR, and ExchangeAB SPNs, while a second ASA credential will be assigned to Exchange 2016 and host the http SPN records. For more information, see the Exchange 2013 and Exchange 2010 Coexistence with Kerberos Authentication article.

Exchange 2016 Coexistence with Exchange 2013

A single ASA credential will be utilized and configured on all Exchange 2013 and Exchange 2016 servers. For more information, see the Exchange 2013 Configuring Kerberos authentication for load-balanced Client Access servers article.

Note: The RollAlternateserviceAccountCredential.ps1 script included in the Exchange 2016 scripts directory utilizes the new cmdlets, Get/Set-ClientAccessService. These cmdlets will not execute correctly on Exchange 2013 servers. Copy the RollAlternateserviceAccountCredential.ps1 script included in the Exchange 2013 CU10 scripts directory to an Exchange 2016 server. Execute the copied script in order to deploy the ASA across Exchange servers.

Exchange 2016 Coexistence with both Exchange 2010 and Exchange 2013

Two ASA credentials will be utilized in this environment.
One ASA credential will be assigned to Exchange 2010 and host the exchangeMDB, ExchangeRFR, and ExchangeAB SPNs, while a second ASA credential will be assigned to the Exchange 2013 and Exchange 2016 servers to host the http SPN records. For more information, see the Exchange 2013 and Exchange 2010 Coexistence with Kerberos Authentication article.

Ross Smith IV
Principal Program Manager
Office 365 Customer Experience

The Exchange 2016 Preferred Architecture
The Preferred Architecture (PA) is the Exchange Engineering Team’s best practice recommendation for what we believe is the optimum deployment architecture for Exchange 2016, and one that is very similar to what we deploy in Office 365.

While Exchange 2016 offers a wide variety of architectural choices for on-premises deployments, the architecture discussed below is our most scrutinized one ever. While there are other supported deployment architectures, they are not recommended.

The PA is designed with several business requirements in mind, such as the requirement that the architecture be able to:

- Include both high availability within the datacenter, and site resilience between datacenters
- Support multiple copies of each database, thereby allowing for quick activation
- Reduce the cost of the messaging infrastructure
- Increase availability by optimizing around failure domains and reducing complexity

The specific prescriptive nature of the PA means of course that not every customer will be able to deploy it (for example, customers without multiple datacenters). And some of our customers have different business requirements or other needs which necessitate a different architecture. If you fall into those categories, and you want to deploy Exchange on-premises, there are still advantages to adhering as closely as possible to the PA, deviating only where your requirements widely differ. Alternatively, you can consider Office 365 where you can take advantage of the PA without having to deploy or manage servers.

The PA removes complexity and redundancy where necessary to drive the architecture to a predictable recovery model: when a failure occurs, another copy of the affected database is activated.

The PA is divided into four areas of focus:

- Namespace design
- Datacenter design
- Server design
- DAG design

Namespace Design

In the Namespace Planning and Load Balancing Principles articles, I outlined the various configuration choices that are available with Exchange 2016.
For the namespace, the choices are to either deploy a bound namespace (having a preference for the users to operate out of a specific datacenter) or an unbound namespace (having the users connect to any datacenter without preference).

The recommended approach is to utilize the unbound model, deploying a single Exchange namespace per client protocol for the site resilient datacenter pair (where each datacenter is assumed to represent its own Active Directory site - see more details on that below). For example:

- autodiscover.contoso.com
- For HTTP clients: mail.contoso.com
- For IMAP clients: imap.contoso.com
- For SMTP clients: smtp.contoso.com

Each Exchange namespace is load balanced across both datacenters in a layer 7 configuration that does not leverage session affinity, resulting in fifty percent of traffic being proxied between datacenters. Traffic is equally distributed across the datacenters in the site resilient pair, via round robin DNS, geo-DNS, or other similar solutions. From our perspective, the least complex solution is the easiest to manage, so our recommendation is to leverage round robin DNS.

For the Office Online Server farm, a namespace is deployed per datacenter, with the load balancer utilizing layer 7, maintaining session affinity using cookie based persistence.

Figure 1: Namespace Design in the Preferred Architecture

In the event that you have multiple site resilient datacenter pairs in your environment, you will need to decide if you want to have a single worldwide namespace, or if you want to control the traffic to each specific datacenter by using regional namespaces. Ultimately your decision depends on your network topology and the associated cost with using an unbound model; for example, if you have datacenters located in North America and Europe, the network link between these regions might not only be costly, but it might also have high latency, which can introduce user pain and operational issues.
In that case, it makes sense to deploy a bound model with a separate namespace for each region. However, options like geographical DNS offer you the ability to deploy a single unified namespace, even when you have costly network links; geo-DNS allows you to have your users directed to the closest datacenter based on their client’s IP address.

Figure 2: Geo-distributed Unbound Namespace

Site Resilient Datacenter Pair Design

To achieve a highly available and site resilient architecture, you must have two or more datacenters that are well-connected (ideally, you want a low round-trip network latency, otherwise replication and the client experience are adversely affected). In addition, the datacenters should be connected via redundant network paths supplied by different operating carriers.

While we support stretching an Active Directory site across multiple datacenters, for the PA we recommend that each datacenter be its own Active Directory site. There are two reasons:

- Transport site resilience via Shadow Redundancy and Safety Net can only be achieved when the DAG has members located in more than one Active Directory site.
- Active Directory has published guidance that states that subnets should be placed in different Active Directory sites when the round trip latency is greater than 10ms between the subnets.

Server Design

In the PA, all servers are physical servers. Physical hardware is deployed rather than virtualized hardware for two reasons:

- The servers are scaled to use 80% of resources during the worst-failure mode.
- Virtualization adds an additional layer of management and complexity, which introduces additional recovery modes that do not add value, particularly since Exchange provides that functionality.

Commodity server platforms are used in the PA.
Commodity platforms include:

- 2U, dual socket servers (20-24 cores)
- up to 192GB of memory
- a battery-backed write cache controller
- 12 or more large form factor drive bays within the server chassis

Additional drive bays can be deployed per-server depending on the number of mailboxes, mailbox size, and the server’s scalability.

Each server houses a single RAID1 disk pair for the operating system, Exchange binaries, protocol/client logs, and transport database. The rest of the storage is configured as JBOD, using large capacity 7.2K RPM serially attached SCSI (SAS) disks (while SATA disks are also available, the SAS equivalent provides better IO and a lower annualized failure rate). Each disk that houses an Exchange database is formatted with ReFS (with the integrity feature disabled) and the DAG is configured such that AutoReseed formats the disks with ReFS:

    Set-DatabaseAvailabilityGroup <DAG> -FileSystem ReFS

BitLocker is used to encrypt each disk, thereby providing data encryption at rest and mitigating concerns around data theft or disk replacement. For more information, see Enabling BitLocker on Exchange Servers.

To ensure that the capacity and IO of each disk is used as efficiently as possible, four database copies are deployed per-disk. The normal run-time copy layout ensures that there is no more than a single active copy per disk.

At least one disk in the disk pool is reserved as a hot spare. AutoReseed is enabled and quickly restores database redundancy after a disk failure by activating the hot spare and initiating database copy reseeds.

Database Availability Group Design

Within each site resilient datacenter pair you will have one or more DAGs.

DAG Configuration

As with the namespace model, each DAG within the site resilient datacenter pair operates in an unbound model with active copies distributed equally across all servers in the DAG.
This model:

- Ensures that each DAG member’s full stack of services (client connectivity, replication pipeline, transport, etc.) is being validated during normal operations.
- Distributes the load across as many servers as possible during a failure scenario, thereby only incrementally increasing resource use across the remaining members within the DAG.

Each datacenter is symmetrical, with an equal number of DAG members in each datacenter. This means that each DAG has an even number of servers and uses a witness server for quorum maintenance.

The DAG is the fundamental building block in Exchange 2016. With respect to DAG size, a larger DAG provides more redundancy and resources. Within the PA, the goal is to deploy larger DAGs (typically starting out with an eight member DAG and increasing the number of servers as required to meet your requirements). You should only create new DAGs when scalability introduces concerns over the existing database copy layout.

DAG Network Design

The PA leverages a single, non-teamed network interface for both client connectivity and data replication. A single network interface is all that is needed because ultimately our goal is to achieve a standard recovery model regardless of the failure - whether a server failure occurs or a network failure occurs, the result is the same: a database copy is activated on another server within the DAG. This architectural change simplifies the network stack, and obviates the need to manually eliminate heartbeat cross-talk.

Note: While your environment may not use IPv6, IPv6 remains enabled per IPv6 support in Exchange.

Witness Server Placement

Ultimately, the placement of the witness server determines whether the architecture can provide automatic datacenter failover capabilities or whether it will require a manual activation to enable service in the event of a site failure.
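Why witness placement matters comes down to vote counting. The sketch below assumes a plain majority-of-votes quorum in which each DAG member and the witness server carry one vote; it deliberately ignores refinements such as dynamic quorum in Windows failover clustering, and the function name and site labels are hypothetical.

```python
def has_quorum(members_per_site, witness_site, failed_site):
    """Return True if the voters outside failed_site still hold a majority.

    members_per_site maps a site label to its number of DAG members.
    The witness contributes one extra vote from witness_site.
    """
    total_votes = sum(members_per_site.values()) + 1  # +1 for the witness
    surviving = sum(
        count for site, count in members_per_site.items() if site != failed_site
    )
    if witness_site != failed_site:
        surviving += 1
    # A strict majority of all configured votes is required.
    return surviving > total_votes // 2


if __name__ == "__main__":
    dag = {"east": 4, "west": 4}  # symmetrical eight member DAG
    for witness in ("third", "east"):
        for failed in ("east", "west"):
            print(witness, failed, has_quorum(dag, witness, failed))
```

With the witness in a separate location, either datacenter can fail and the surviving four members plus the witness hold five of nine votes. With the witness inside one of the two datacenters, losing that datacenter leaves only four of nine votes, which is why a manual activation would then be needed.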
If your organization has a third location with a network infrastructure that is isolated from network failures that affect the site resilient datacenter pair in which the DAG is deployed, then the recommendation is to deploy the DAG’s witness server in that third location. This configuration gives the DAG the ability to automatically fail over databases to the other datacenter in response to a datacenter-level failure event, regardless of which datacenter has the outage.

If your organization does not have a third location, consider placing the witness in Azure; alternatively, place the witness server in one of the datacenters within the site resilient datacenter pair. If you have multiple DAGs within the site resilient datacenter pair, then place the witness server for all DAGs in the same datacenter (typically the datacenter where the majority of the users are physically located). Also, make sure the Primary Active Manager (PAM) for each DAG is located in the same datacenter.

Data Resiliency

Data resiliency is achieved by deploying multiple database copies. In the PA, database copies are distributed across the site resilient datacenter pair, thereby ensuring that mailbox data is protected from software, hardware and even datacenter failures.

Each database has four copies, with two copies in each datacenter, which means at a minimum, the PA requires four servers. Out of these four copies, three of them are configured as highly available. The fourth copy (the copy with the highest Activation Preference number) is configured as a lagged database copy. Due to the server design, each copy of a database is isolated from its other copies, thereby reducing failure domains and increasing the overall availability of the solution as discussed in DAG: Beyond the “A”.

The purpose of the lagged database copy is to provide a recovery mechanism for the rare event of system-wide, catastrophic logical corruption.
It is not intended for individual mailbox recovery or mailbox item recovery.

The lagged database copy is configured with a seven day ReplayLagTime. In addition, the Replay Lag Manager is also enabled to provide dynamic log file play down for lagged copies when availability is compromised.

When using the lagged database copy in this manner, it is important to understand that the lagged database copy is not a guaranteed point-in-time backup. The lagged database copy will have an availability threshold, typically around 90%, due to periods where the disk containing a lagged copy is lost due to disk failure, periods where the lagged copy becomes an HA copy (due to automatic play down), and periods where the lagged database copy is re-building the replay queue.

To protect against accidental (or malicious) item deletion, Single Item Recovery or In-Place Hold technologies are used, and the Deleted Item Retention window is set to a value that meets or exceeds any defined item-level recovery SLA.

With all of these technologies in play, traditional backups are unnecessary; as a result, the PA leverages Exchange Native Data Protection.

Office Online Server Design

At a minimum, you will want to deploy two Office Online Servers in each datacenter that hosts Exchange 2016 servers. Each Office Online Server should have 8 processor cores, 32GB of memory and at least 40GB of space dedicated for log files.

Note: The Office Online Server infrastructure does not need to be exclusive to Exchange. As such, the hardware guidance takes into account usage by SharePoint and Skype for Business. Be sure to work with any other teams using the Office Online Server infrastructure to ensure the servers are adequately sized for your specific deployment.
The Exchange servers within a particular datacenter are configured to use the local Office Online Server farm via the following cmdlet:

    Set-MailboxServer <East MBX Server> -WACDiscoveryEndPoint https://oos-east.contoso.com/hosting/discovery

Summary

Exchange Server 2016 continues the investments introduced in previous versions of Exchange by reducing the server role architecture complexity, aligning with the Preferred Architecture and Office 365 design principles, and improving coexistence with Exchange Server 2013.

These changes simplify your Exchange deployment, without decreasing the availability or the resiliency of the deployment. And in some scenarios, when compared to previous generations, the PA increases availability and resiliency of your deployment.

Ross Smith IV
Principal Program Manager
Office 365 Customer Experience

Ask the Perf Guy: Sizing Exchange 2016 Deployments
Uh oh. You are probably thinking to yourself, here comes another one of those incredibly long blog posts about sizing. Thankfully, it’s not. If you want to fully understand the sizing process for Exchange 2016, you are certainly welcome to read the previous post that I did for Exchange 2013, as the overall process is effectively the same with one major caveat. Since we have eliminated the CAS role in Exchange 2016, you must follow the process for multi-role deployment sizing.

Overall, the inputs to our sizing formulas stay the same from Exchange 2013 to Exchange 2016. This means that our IOPS requirements, memory requirements, and all of the other values provided in the Exchange 2013 sizing guidance should continue to be used for Exchange 2016. We are changing one set of inputs, however.

Processor requirements

We are slightly increasing the processor requirements for Exchange 2016 (compared to Exchange 2013) as this is a very new release, and we are still learning how it performs in production. This slight increase in CPU provides some additional headroom for unanticipated issues, and may be changed in the future as we learn more from our internal deployments as well as customer feedback. The same SPECint_rate2006 baseline value described in the Exchange 2013 guidance should continue to be used (33.75 per-core).

Messages sent or received   | Mcycles per User,             | Mcycles per User,
per mailbox per day         | Active DB Copy or Standalone  | Passive DB Copy
----------------------------|-------------------------------|------------------
50                          | 2.99                          | 0.70
100                         | 5.97                          | 1.40
150                         | 8.96                          | 2.10
200                         | 11.94                         | 2.80
250                         | 14.93                         | 3.50
300                         | 17.91                         | 4.20
350                         | 20.90                         | 4.90
400                         | 23.88                         | 5.60
450                         | 26.87                         | 6.30
500                         | 29.85                         | 7.00

These changes are reflected in v7.8 and later in the calculator.
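To see how the per-user values above feed into sizing, the sketch below multiplies them by user counts for active and passive database copies. It is a simplified illustration that assumes the total requirement is just the sum of the two products; the full process in the Exchange 2013 sizing guidance and the calculator accounts for more factors. The function name and the example numbers are hypothetical.

```python
# Mcycles per user, copied from the values above.
# Key: messages sent or received per mailbox per day.
# Value: (active DB copy or standalone, passive DB copy).
MCYCLES_PER_USER = {
    50: (2.99, 0.70),
    100: (5.97, 1.40),
    150: (8.96, 2.10),
    200: (11.94, 2.80),
    250: (14.93, 3.50),
    300: (17.91, 4.20),
    350: (20.90, 4.90),
    400: (23.88, 5.60),
    450: (26.87, 6.30),
    500: (29.85, 7.00),
}


def required_mcycles(profile, active_users, passive_users):
    """Sum the megacycles for users on active and on passive copies.

    profile must be one of the message-profile keys in MCYCLES_PER_USER.
    """
    active_cost, passive_cost = MCYCLES_PER_USER[profile]
    return active_users * active_cost + passive_users * passive_cost


if __name__ == "__main__":
    # Hypothetical server hosting 1,000 active and 1,000 passive mailboxes
    # at the 200 messages per day profile.
    print(round(required_mcycles(200, 1000, 1000), 2))
```

Passing a profile that is not listed raises a `KeyError`, which is intentional here: the source gives values only for the listed profiles, so the sketch does not interpolate between them.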
System scalability

The previously released guidance on maximum recommended cores and maximum memory size for Exchange 2013 is generally applicable to Exchange 2016 in terms of background and general scalability issues; however, we have increased the recommended maximum memory size for currently supported versions of Exchange 2016 to 192 GB. We recommend not exceeding the following sizing characteristics for Exchange 2016 servers.

| Characteristic | Recommended maximum |
|---|---|
| Processor core count | 24 |
| Memory | 192 GB |

Note: Version 9.1 and later of the Exchange Server Role Requirements Calculator aligns with this guidance.

Summary

If you are at all familiar with the process for sizing the last couple of Exchange releases, you will be very comfortable with Exchange 2016. As with any new release, you should plan to roll out in stages and monitor the health and performance of the solution carefully. We do expect that our performance and scalability guidance for Exchange 2016 will evolve over the lifespan of the product, so watch the Exchange team blog for future updates.

Jeff Mealiffe
Principal PM Manager
Office 365 Customer Experience

Exchange 2013 Calculator Updates
Today, we released an updated version of the Exchange 2013 Server Role Requirements Calculator. In addition to numerous bug fixes, this version includes new functionality: CPU utilization table, ReplayLagManager support, MaximumPreferredActiveDatabases support, Restore-DatabaseAvailabilityGroup scenario support, and guidance on sizing recommendations. You can view what changes have been made or download the update directly. For details on the new features, read on.

CPU Utilization Table

The Role Requirements tab includes a table that outlines the expected theoretical CPU utilization for various modes:

- Normal Run Time (where the active copies are distributed according to ActivationPreference=1)
- Single Server Failure (redistribution of active copies based on a single server failure event)
- Double Server Failure (redistribution of active copies based on a double server failure event)
- Site Failure (datacenter activation)
- Worst Failure Mode (in some cases, this value will equal one of the previous scenarios; it could also be a scenario like Site Failure + 1 server failure. The worst failure mode is what is used to calculate memory and CPU requirements)

Here’s an example:

In the above scenario, the worst failure mode is a site failure + 1 additional server failure (since this is a 4 database copy architecture).

ReplayLagManager Support

ReplayLagManager is a new feature in Exchange Server 2013 that automatically plays down the lagged database copy when availability is compromised. While it is disabled by default, we recommend it be enabled as part of the Preferred Architecture. Prior to version 7.5, the calculator only supported ReplayLagManager in the scripts created via the Distribution tab (the Role Requirements and Activation Scenarios tabs did not support it). As a result, the calculator did not factor the lagged database copy as a viable activation target for the worst failure mode.
Naturally, this is an issue because sizing is based on the number of active copies, and the more copies activated on a server, the greater the impact to CPU and memory requirements. In a 4-copy 2+2 site resilient design, with the fourth copy being lagged, this meant that the calculator sized the environment based on what it considered the worst case failure mode: Site Failure (2 HA copies lost, only a single HA copy remaining). Using the CPU table above as an example, calculator versions prior to 7.5 would base the design requirements on 18 active database copies (site failure) instead of 22 active database copies (3 copies lost, lagged copy played down and being utilized as the remaining active).

ReplayLagManager is only supported (from the calculator perspective) when the design leverages:

- Multiple Databases / Volume
- 3+ HA copies

MaximumPreferredActiveDatabases Support

Exchange 2010 introduced the MaximumActiveDatabases parameter, which defines the maximum number of databases that are allowed to be activated on a server by Best Copy Selection (BCS). It is this value that is used in sizing a Mailbox server (and is defined as the worst failure mode in the calculator). Exchange 2013 introduced an additional parameter, MaximumPreferredActiveDatabases. This parameter specifies a preferred maximum number of databases that the Mailbox server should have. The value of MaximumPreferredActiveDatabases is only honored during best copy and server selection (phases 1 through 4), database and server switchovers, and when rebalancing the DAG.

With version 7.5 or later, the calculator recommends setting MaximumPreferredActiveDatabases when there are four or more total database copies. Also, the Export DAG List form exposes the MaximumPreferredActiveDatabases setting, and createdag.ps1 sets the value for the parameter.
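To make the link between failure modes and sizing more concrete, here is a minimal sketch that estimates how many active copies each surviving server hosts as servers are lost. It assumes that every database stays mounted and that active copies spread as evenly as possible across the survivors. This is a simplification, not the calculator's actual distribution logic, and the function name and the 48-database, 8-server DAG are hypothetical.

```python
import math


def active_copies_per_server(total_databases, servers_in_dag, failed_servers=0):
    """Active copies each surviving server hosts (hypothetical helper).

    Assumption: every database stays mounted and active copies are spread
    as evenly as possible across the surviving servers.
    """
    surviving = servers_in_dag - failed_servers
    if surviving <= 0:
        raise ValueError("at least one server must survive")
    return math.ceil(total_databases / surviving)


if __name__ == "__main__":
    # Hypothetical DAG: 48 databases on 8 servers, 4 servers per datacenter.
    scenarios = {
        "Normal Run Time": 0,
        "Single Server Failure": 1,
        "Double Server Failure": 2,
        "Site Failure": 4,
        "Site Failure + 1 server failure": 5,
    }
    for name, failed in scenarios.items():
        print(name, active_copies_per_server(48, 8, failed))
```

Under these assumptions the per-server active count rises with each additional failure, which is why the worst failure mode, rather than normal run time, drives the CPU and memory requirements.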
Restore-DatabaseAvailabilityGroup Scenario Support

In prior releases, the Distribution tab only supported the concept of Fail WAN, which allowed you to simulate the effects of a WAN failure and model the surviving datacenter’s reaction depending on the location of the Witness server. However, Fail WAN did not attempt to shrink the quorum, so if you attempted to fail an additional server you would end up in this condition:

With version 7.5 and later, the calculator adds a new mode: Fail Site. When Fail Site is used, the datacenter switchover steps are performed (the quorum is shrunk, the alternate witness is utilized if required, and so on), thereby allowing you to fail additional servers. This allows you to simulate the worst failure mode that is identified in the Role Requirements and Activation Scenarios tabs.

Note: In order to recover from the Fail Site mode, you must click the Refresh Database Layout button.

Sizing Guidance Recommendations

As Jeff recently discussed in Ask The Perf Guy: How Big Is Too Big?, we are now providing explicit recommendations on the maximum number of processor cores and memory that should be deployed in each Exchange 2013 server. The calculator will now warn you if you attempt a design that exceeds these recommendations.

As always, we welcome your feedback.

Ross Smith IV
Principal Program Manager
Office 365 Customer Experience