Robert’s Rules of Exchange: Multi-Role Servers
Published Apr 08 2011 06:00 AM

Overview

Robert's Rules of Exchange is a series of blog posts in which we take a fictitious company, describe their existing Exchange implementation, and then walk you through the design, installation, and configuration of their Exchange 2010 environment. See Robert's Rules of Exchange: Table of Blogs for all posts in this series.

A great big hello to all my faithful readers out there! I have to apologize for not posting in a while. Since the beginning of the holidays, I have been exceedingly busy, but I really want to get back to the Robert’s Rules posts, and I hope to be more involved in this going forward. Also, please keep the great comments and questions coming! I read and try to answer every single one of them. If you are going to TechEd North America 2011 in Atlanta, I’ll be there presenting, doing some stuff with the MCM Exchange team and generally making a nuisance of myself, so come look me up and introduce yourself!

In this blog post, I want to talk a little about multi-role servers. This is something that the Exchange team and Microsoft Services present as the first notional solution (“notional solution” meaning the first idea of a solution, the first “rough draft” of what we propose deploying) to almost every customer we work with, and something that causes a lot of confusion since it is certainly different from our previous guidance. So, I want to talk about what we mean by “multi-role servers”, why we think they are a great solution, sizing multi-role deployments, when they might not fit your deployment, and what the real impact is in moving away from multi-role in your solution.

What Do We Mean by “Multi-Role Servers”?

When we talk about multi-role servers in Exchange 2010, we are talking about a single server with all three of the core Exchange 2010 roles installed – Client Access, Hub Transport and Mailbox roles. While having any given two of these roles installed is technically a multi-role deployment, and even in Exchange Server 2007 we saw a lot of customers collocating the Client Access and Hub Transport roles, when we talk about multi-role, we are really talking about collocation of all three core roles on the same server.

In Exchange 2007, we did not support the Client Access or Hub Transport roles on clustered servers, so there was no way to have a high availability deployment (meaning Cluster Continuous Replication clusters) with multi-role servers. In Exchange 2010, we introduced Database Availability Groups (DAGs), which don't have that limitation, and subsequently we have changed the first line recommendation to utilize multi-role servers where possible. The rest of this post discusses why we believe that in almost every single scenario, this is the best solution for our customers.

Why “Multi-Role” for Exchange 2010?

One of the things I really tried to hammer home in the Storage Planning post was the idea of simplicity. Every time I sit down with a customer to discuss how they should deploy Exchange 2010, I start with the simplest solution that I can possibly think of. Remember that we have already argued that simplicity typically brings many “Good Things™” into a solution. These “Good Things™” include (but are not limited to) a lower capital expenditure (CapEx), a lower operational expenditure (OpEx), and a higher chance of success in both deployment and meeting operational requirements such as high availability and site resilience. Complexity, on the other hand, introduces risk. Complexity when it is not needed is a “Bad Thing™”. Of course, when a complexity is brought on because of a requirement, it is not a “Bad Thing™”. It is only when we don’t really need to introduce that complexity that I have a problem with it.

Based on the last blog post (the Storage post), we know that Robert’s Rules is going with the simple storage infrastructure of direct attached storage with no RAID – what Microsoft calls JBOD. If we combine that with multi-role servers, where every server we deploy in the environment is exactly the same, we significantly reduce the complexity of the system. Every server has the same amount of RAM, the same number of disks, the same network card, the same video card, the same drivers, the same firmware/BIOS – everything is the same. You have fewer servers in your test lab to track, you have fewer different drivers or firmware versions to test, and you have an easier time deciding what version of what software or firmware to deploy to what servers. On top of that, every server has exactly the same settings, including the same OS settings and the same Exchange settings, as well as any other agents or virus scanners or whatever on those servers. Everything is exactly the same at the server level. This significantly reduces your OpEx because you have a single platform to both deploy and support – simplicity of management means that your people need to do less work to support those servers, and that means it costs you less money!

Separate Role Servers – An Example

Now, let’s think about the number of servers we need in an environment. I’m going to play around with the Exchange 2010 Mailbox Role Requirements Calculator a bit here, so I’ve downloaded the latest version (14.4 as of this writing). I will also start with a solution with separate roles across the board – Mailbox, Client Access and Hub Transport all separated. This is totally supported, and what many of my customers believe is the Microsoft recommended approach. After we size that and figure out how many servers we’re talking about, we will look at the multi-role equivalent.

Looking on a popular server vendor web site at their small and medium business server page, I found a server that happens to have an Intel X5677 processor, so I’ll use that as my base system – an 8-core server. Using the Exchange Processor Query Tool, I find that servers using this processor at 8 cores per server have an average SPECint2006 Rate Value of 297, so I’ll use that in the calculator as my processor numbers. Note that by default, the servers in the Role Calculator are not marked as multi-role.

Opening the Role Requirements calculator fresh from download with no changes, I’ll put those values in as my server configuration – 8 cores and SPECint2006 rate value of 297. Making only that change, we can then look at the Role Requirements page. We have 6 servers in a DAG, 4 copies of the data, and 24,000 users; the server processors will be 36% utilized, and the servers will require 48 GB of RAM. Not bad, all in all. EXCEPT… That is really quite underutilized as far as processor is concerned. Open the calculator yourself, and “hover over” the “Mailbox Role CPU Utilization” field under the “Server Configuration” section of the “Role Requirements” tab. There is a comment to help you understand the utilization numbers. That comment says that for Mailbox role machines in a DAG with only the Mailbox role configured, we should not go over 80% utilization. But we’re at 36% utilization. That’s a lot of wasted processor! We just spent the money on an 8-core system, and we aren’t using that. So, I’m going to pull 4 cores out of each server.

According to the Processor Query Tool, a 4-core system with that processor will have a SPECint2006 rate value of 153. Let’s see what that does by putting that into the Mailbox Role Calculator. That moves us to 69% processor utilization, which is much better. I would feel much better recommending that to one of my customers. This change didn’t affect our memory requirements at all.

The next thing we’ll look at is the number of cores we need for the other roles. At the top of the “Role Requirements” tab, we can see that this solution will require a minimum of 2 cores of Hub Transport, and 8 cores of Client Access. So, as good engineers we propose our customers have one 4-core server for Hub Transport and two 4-core servers for Client Access, right? Absolutely not! We designed this solution for 2 server failures (6 servers in the DAG with 4 copies can sustain 3 server failures, which is a bit excessive, but sustaining 2 server failures is quite common, so for our example we’ll stick with that). So, for CAS and HT both, we need 2 additional servers for server failure scenarios. If I lose 2 CAS servers, I still need to have 8 cores online on my remaining CAS servers to support a fully utilized environment – that means I need a minimum of 4 CAS servers with 4 cores each. If I lose 2 HT servers, I need 1 remaining server to handle my message traffic load (really one half of a server – 2 cores – but you can’t do that), so I need a minimum of 3 HT servers.

How many servers is this all together? We have 6 Mailbox servers, 4 Client Access servers, and 3 Hub Transport servers. That would be 13 servers, in total. Not too bad for 24,000 users, right? What are the memory models of these three servers? CAS guidance is 2 GB per core, and HT guidance is 1 GB per core. So we have 6 servers with 48 GB (Mailbox), 4 servers with 8 GB (CAS) and 3 servers with 4 GB (HT). Our relatively simple environment here has 3 different server types, 3 different memory configurations, 3 different OS configurations, 3 different Exchange configurations, and 3 different patching/maintenance plans. Simple? I think not.
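The server counts above can be reproduced with a few lines of arithmetic. This is only a sketch of the reasoning in this post, not of how the Role Requirements Calculator works internally: the function name servers_needed is made up for illustration, and the core requirements (8 for CAS, 2 for HT), the 6 Mailbox servers, and the per-core memory guidance are simply the numbers quoted above.

```python
import math

def servers_needed(required_cores: int, cores_per_server: int, tolerated_failures: int) -> int:
    """Servers required so the core requirement is still met after the given number of failures.

    Hypothetical helper for illustration only.
    """
    base = math.ceil(required_cores / cores_per_server)  # servers needed with nothing failed
    return base + tolerated_failures                      # plus one spare per tolerated failure

CORES_PER_SERVER = 4      # the 4-core configuration used in the example
TOLERATED_FAILURES = 2    # the design goal stated in the post

mailbox_servers = 6       # read from the calculator, not computed here
cas_servers = servers_needed(8, CORES_PER_SERVER, TOLERATED_FAILURES)
ht_servers = servers_needed(2, CORES_PER_SERVER, TOLERATED_FAILURES)
total_servers = mailbox_servers + cas_servers + ht_servers

# Memory guidance quoted in the post: 2 GB per core for CAS, 1 GB per core for HT.
cas_ram_gb = 2 * CORES_PER_SERVER
ht_ram_gb = 1 * CORES_PER_SERVER

print(f"CAS: {cas_servers} servers x {cas_ram_gb} GB")
print(f"HT:  {ht_servers} servers x {ht_ram_gb} GB")
print(f"Total servers (separate roles): {total_servers}")
```

Note that the spare servers are added per role, which is exactly why the separate-role design grows so quickly: each role pays for its own failure headroom.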

Multi-Role Servers – An Example

Now, using the same server, the same processor, the same everything as above, we’ll simply change the calculator to multi-role. On the “Input” tab, I change the multi-role switch (“Server Multi-Role Configuration (MBX+CAS+HT)”) to “Yes”. Now, over to the “Role Requirements” tab, and … WHOA!!! Big red blotch on my spreadsheet! What does that mean? Can’t be good.

Once again, if we look at the comment that the sadistic individual who wrote the calculator has left for us, we can see that a multi-role server in a mailbox resiliency solution should not have 40% or higher utilization for the Mailbox role. This is because we have the Client Access and Hub Transport roles on the server as well, and they have processor requirements. What we basically do here is allocate half of our processor utilization to the CAS and HT roles in this situation. So, let’s go back to the 8 cores per server using SPECint2006 rate value of 297.

That change gets us back into the right utilization range. Looking at the “Role Requirements” tab again, we now have a “Mailbox Role CPU Utilization” of 36%. Since the maximum we want is 39%, that is a decent utilization of our hardware. The other change we see is that we were bumped from 48 GB of RAM per server to 64 GB of RAM, which is another cost impact to the price of the servers, but the bump from 48 GB to 64 GB is not nearly as expensive as it was a few years ago, and I see a lot of customers purchasing 64 GB if not more RAM in almost all of their servers (I actually see a lot of 128 GB machines out there).
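The two utilization rules from the calculator comments can be written down as a small check. This is a sketch under the assumptions stated in this post: a dedicated Mailbox server in a DAG should not go over 80%, and a multi-role server should stay below half of that for the Mailbox role because the other half is set aside for CAS and HT. The function names and the total_limit_pct parameter are hypothetical and are not part of the calculator.

```python
def mailbox_cpu_limit(total_limit_pct: float = 80.0, multi_role: bool = False) -> float:
    """Ceiling for Mailbox role CPU utilization under the guidance described in the post.

    Multi-role servers reserve half of the total budget for CAS and HT.
    """
    return total_limit_pct / 2 if multi_role else total_limit_pct

def within_guidance(mailbox_util_pct: float, multi_role: bool, total_limit_pct: float = 80.0) -> bool:
    """True if the Mailbox role utilization fits the guidance.

    Dedicated Mailbox servers may reach the limit ("should not go over 80%");
    multi-role servers must stay strictly below it ("should not have 40% or higher").
    """
    limit = mailbox_cpu_limit(total_limit_pct, multi_role)
    return mailbox_util_pct < limit if multi_role else mailbox_util_pct <= limit

# The three configurations walked through in the post:
print(within_guidance(69, multi_role=False))  # 4-core, Mailbox only
print(within_guidance(69, multi_role=True))   # 4-core, multi-role: the "big red blotch"
print(within_guidance(36, multi_role=True))   # 8-core, multi-role
```

Passing a different total_limit_pct (for example 90) shows how the multi-role ceiling would move if you chose a different overall threshold, assuming the same half-and-half split still applies.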

Now, the great thing about this is the fact that we are down to 6 servers total. Let’s think about the things that go into the total cost of ownership of servers:

  • Cost of hardware: This will be based on the servers themselves, and the vendor we select, but the cost of one 4-core processor per server and 16 GB of RAM per server vs. the cost of 7 additional servers shouldn’t be hard to figure out.
  • Cost of floor space or rack space: Every time you add another server into the racks, you have to pay for that in some way. Quite a few of my customers are quite space constrained in their datacenters, and adding new servers is difficult or impossible. Server consolidation is a huge push for almost all of my customers and consolidating 20 or 30 servers down to 6 has “more win” for the project than consolidating the same number of servers down to 13. Also, by having both processor sockets in our servers populated, we have a better “core density” for the amount of rack space we take. Removing the processor from the socket on this one server doesn’t change the amount of physical space that the server takes!
  • Cost of HVAC: Obviously more processor and more RAM means more heat generated by each of the 6 Mailbox servers we have, but having 7 additional servers would generate even more heat than the additional processors and RAM would. Easy to see the cost savings here.
  • Cost of maintenance: Once again, this is an easy win. 6 servers will be easier and cheaper to manage than 13 servers would, especially when you consider that those 6 servers are identical rather than having 3 separate server builds and configurations in our separate role servers.  Something to keep in mind here is the fact that in almost every scenario, changes in the OpEx over the life of a solution (typically 3-5 years) are significantly more impactful than changes in the CapEx.

I think that the bottom line here is that there are a lot of reasons to start with the multi-role server as your “first cut” architecture, not the least of which is the simplicity of the design compared to the complexity introduced by having each role separated.

When is Multi-Role Not Right for You?

There are very few cases where the multi-role servers are not the appropriate choice, and quite often in those cases, manipulating the number of servers or the number of DAGs slightly will change things so that the multi-role is possible.

Can You Use Multi-Role in a Virtualized Environment?

Let’s think about the case with virtualization. Our guidance around virtualization is that a good “rule of thumb” overhead factor is 10% overhead for the user load on a server if you are virtualizing. In other words, a given guest machine will be able to do 10% less work than expected if it was a physical machine with the exact same physical processor. Or, another way to look at this is that each user will cause about 10% more work in the guest than they would in a physical implementation. So, I can use a 1.1 “Megacycle Multiplication Factor” in my user tier definition on the “Input” tab, and that just puts me to 39% processor utilization for this hardware. Of course, we haven’t taken into account the fact that we haven’t allocated processors for the host, and the fact that we have to pay extra licensing fees to VMware if we want to run 8-core guest OSes.

If we go back to our 4-core example, set our “Megacycle Multiplication Factor” to 1.1, and say that we have 2 DAGs rather than 1, our processor utilization for these 4-core multi-role servers goes to 38%, making this a reasonable virtualized multi-role solution. Other customers might decide to split the roles out, possibly with say 6 Mailbox role servers virtualized, and CAS and HT collocated on 6 more virtualized servers.
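To get a feel for why these numbers land where they do, here is a rough back-of-the-envelope estimate. It assumes, purely for illustration, that Mailbox CPU utilization scales linearly with the megacycle factor and with users per server, and inversely with the SPECint2006 rate value, and that going from 1 DAG to 2 halves the users on each server. The real calculator accounts for more than this, so treat the results as approximations; the function name estimate_utilization is hypothetical.

```python
def estimate_utilization(
    baseline_pct: float,
    baseline_rate: float,
    new_rate: float,
    megacycle_factor: float = 1.0,
    dag_count: int = 1,
) -> float:
    """Rough linear estimate of Mailbox CPU utilization (illustrative assumption only).

    baseline_pct  -- utilization the calculator reported for the baseline hardware
    baseline_rate -- SPECint2006 rate value of the baseline hardware
    new_rate      -- SPECint2006 rate value of the hardware being estimated
    """
    return baseline_pct * (baseline_rate / new_rate) * megacycle_factor / dag_count

# Baseline from the post: 36% on the 8-core server (rate value 297).
eight_core_virtual = estimate_utilization(36, 297, 297, megacycle_factor=1.1)
four_core_two_dags = estimate_utilization(36, 297, 153, megacycle_factor=1.1, dag_count=2)

print(f"8-core guest, 1 DAG:  about {eight_core_virtual:.1f}%")
print(f"4-core guest, 2 DAGs: about {four_core_two_dags:.1f}%")
```

Both estimates come out just under the 40% multi-role ceiling, close to the 39% and 38% the calculator reports; small differences are expected because the percentages shown in the spreadsheet are whole numbers and the linear model is a simplification.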

Either of these solutions is certainly a supported solution, but we now would have twice as many servers to monitor and manage as we would with our physical multi-role servers. And as we saw above – more servers will typically mean more cost in your patching OpEx. What we’re really trying to do here is force a solution (virtualization) when the requirements don’t drive us that direction in many cases. Don’t get me wrong – I fully support and recommend a virtualized Exchange environment when the customer’s requirements drive us to leverage virtualization as a tool to give them the best Exchange solution for the money they spend, but when customers want to virtualize Exchange just because they want every workload virtualized, that is trying to shove a square peg into a round hole. Note that the next installment of Robert’s Rules will be around virtualized Exchange, when it is right, and when it is wrong.

Can You Use Multi-Role on Blade Servers?

I see this as exactly the same question as the virtualization question above. Certain sizing situations might take the proposed solution out of the realm of where blade servers can provide a hardware capability that is required. For instance, I have seen cases where the hardware requirements of the multi-role servers are quite a bit more than a blade server can provide (think 16, 24 or 32-core machines with 128 GB of RAM or similar). Generally, this means that you have a very high user load on those servers, and you could reduce the core count or memory requirements by reducing the number of users per server. As we showed above, you can do this by adding more servers or more DAGs.

Multi-Role and Software Load Balancing

We all know that the recommendation for Exchange 2010 is to use hardware load balancers, but the fact is that using Windows Network Load Balancing (WNLB - a software load balancer that comes with Windows Server 2008 and Windows Server 2008 R2 as well as older versions) is supported and some customers will use that. Hardware load balancers cost money. Sometimes there is no money for purchase of hardware load balancers for smaller implementations – although there are some very cost effective hardware load balancers from some Microsoft partners out there today that could get you into a highly available hardware load balanced solution for approximately US$3,000.00. So I would argue that there are very few customers that couldn’t afford that.

The single real limitation of the multi-role is the fact that WNLB is not supported on the same server running Windows Failover Clustering. Although Exchange 2010 administrators never have to deal with cluster.exe or any other cluster admin tools, the fact is that DAGs utilize Windows Failover Clustering services to provide a framework for the DAG, utilizing features such as the quorum model and a few other things. This means that if you have a DAG in a multi-role architecture, and you need load balancing (as all highly available solutions will need), you will be forced to purchase a hardware load balancer.

In some organizations where the network team has standardized on a given load-balancing appliance vendor and their branch offices need Exchange in high availability deployed at the branch office locations, it is possible that they will not be allowed to purchase another vendor’s small-load small-cost load-balancer hardware. In cases like this where WNLB is required, a multi-role implementation will not be possible.

Sizing Multi-Role Deployments

As we have discussed already, sizing of multi-role Exchange Server 2010 deployments is not much different than the sizing you do today. The Mailbox Role Calculator is already designed to support that configuration. Just choose “Yes” for what is the second setting in the top left of the “Input” tab (at least that is where it is in version 14.4 of the calculator), and make sure that your processor utilization isn’t over 39%.

Note that the “39%” is really based on the fact that Microsoft recommendations are built around a maximum processor utilization of 80%. In reality, that is an arbitrary number that Microsoft chose because it is quite safe – if we make recommendations around that 80% number and for some reason you have 10% more utilization than you expected, you are still safe because in the worst case, you will then have 90% utilization on those servers. Some of our customers choose 90% as their threshold, and this is a perfectly acceptable number when you know what you are doing and what the risks are around your estimation of how your users will utilize Exchange servers and the processors on those servers. The spreadsheet will have red flags, but those are just to make sure you see that there is something outside of our 80% total number.

Is there an Impact of not Separating the Roles?

There are a few things that are different in the technical details of how Exchange works and how you will manage Exchange when you consider having multi-role or separated role implementations. For instance, when the Mailbox role hosting a mailbox database and Hub Transport role coexist on a server within a DAG, the two roles are aware of this. In this scenario when a user sends a message, if possible, the Mailbox Submission service on the Mailbox role on this server will select a Hub Transport role on another server within the Active Directory site to submit the message for transport (it will fall back to the Hub Transport role collocated on that server if there is not another Hub Transport role in the AD site). This allows Exchange as a system to remove the single point of failure where a message goes from the mailbox database on a server to transport on the same server and then the server fails, thus not allowing shadow redundancy to provide transport resilience. The reverse is also true – if a message is delivered from outside the site to the Hub Transport that is collocated with the Mailbox role holding the destination mailbox, the message will be redirected through another Hub Transport role if possible to provide for transport redundancy. For more information on this, please see Hub Transport and Mailbox Server Roles Coexistence When Using DAGs in the documentation.

Another thing to note is that when you design your DAG implementation, you should always define where the file share will exist for your File Share Witness (FSW) cluster resource. When you create the DAG with the New-DatabaseAvailabilityGroup cmdlet, you can choose to not specify the WitnessDirectory and WitnessServer parameters. In this case, Exchange will choose a server and directory for your FSW, typically on a Hub Transport server. Well, in the case where every Hub Transport server in the Active Directory site is also a member of that DAG, this introduces a problem! Where do you store the File Share Witness? My solution to that is to have another machine (could be virtual, but if so, it must be on separate physical hardware from your DAG servers) that can host a file share. This could be a machine designated as a file share host, or a printer host, or similar. I wouldn’t recommend a Global Catalog or Domain Controller server because as an administrator, I don’t want file shares on my Domain Controllers for security reasons and I don’t want to grant my Exchange Trusted Subsystem security group domain administrator privileges!

There are also some management scenarios that you might need to account for. For instance, when you patch a multi-role server, you might have to change your patching plans compared to what you are doing with your Exchange 2003 or 2007 implementation today. For more information on this, please see my Patching the Multi-Role Server DAG post.

You will configure Exchange the same way whether you have the roles separated or you have a multi-role deployment. The majority of the difference comes down to what we have already discussed: sizing of the servers, simplicity in the design of your implementation, simplicity of managing the servers, and in most cases cost savings because of fewer servers and simpler management. Choosing not to deploy multi-role Exchange 2010 architectures introduces complexity to your system that is most likely not required, and that introduces costs and raises risk to your Exchange implementation (remember that every complexity raises risks, no matter how small the complexity or how small the risk).

Conclusion

The conclusion here is the same thing you will hear me saying over and over again. You should always start your Exchange Server 2010 design efforts around the most simple solution possible – multi-role servers with JBOD storage (no-RAID on direct attached storage). Only move away from the simplest solution if there is a real reason to do so. As I said, I always start design discussions with multi-role, and that is the recommended solution from the Exchange team.

If that is the recommendation, then why is Robert’s Rules not using multi-role?!?!

When I first designed this blog series, the idea was that I wanted to make sure I show how to do load balancing. At the time, we didn’t have easily available virtualized versions of hardware load balancers, at least not on the Hyper-V platform. Since I started the series, Kemp has made a Hyper-V version of their load balancer available, and I am going to show that in the blog. Ross Smith IV has been telling me over and over that there is little value in showing Windows Network Load Balancing since we strongly recommend a hardware load balancer in large enterprise deployments.

SO…

I’m going to redesign my Exchange 2010 implementation at Robert’s Rules to utilize the multi-role environment. Before writing the next post (on Virtualization of Exchange, as I said above), I will revise the scenario article with a new image and utilization of a 4-node DAG – 2 nodes in the HSV datacenter, 2 nodes in the LFH datacenter.

And once again, thanks to all of you for reading these posts! Keep up the great questions and comments!!

Robert Gillies

33 Comments
Not applicable

Great post again Robert !

I'm interested in an alternative that "small" customers might have for the popular 2 multi-role servers in a DAG configuration, with regards to load balancing and having the cas-array "work".

WNLB is not possible of course, external software/hardware balancers introduce funding issues... and Round Robin seems like the most logical way to work this out.. but then again it's not perfect ....

Is there a recommended supported method to solve this dilemma ?

Thanks !

ilantz

Not applicable

Ilantz, in the situation where you have the CAS roles on the DAG servers, you are correct, WNLB is not supported.  There has been a lot of internal discussion around DNS "round robin" as a possible solution, but the problem there is that when your TTL times out, the client will query DNS for the addresses again and reconnect.  When the client does that, they have a 50/50 chance of connecting to a different server.  For some clients, this is "masked" from the user (Outlook in RPC mode and EAS clients will both resend the credentials silently), but for some clients (OWA comes to mind), this will cause the user to be prompted for credentials.  It is the opinion of Microsoft that this is a poor user experience, so we don't recommend doing this.

The recommended solution in this case is to look to our hardware load balancer partners and find an aggressively low priced solution.  For instance, KEMP (www.kemptechnologies.com) has a purpose built load balancer for Exchange 2010 that costs less than US$1600 list price AND supports a highly available architecture by having two devices.  That's HA load balancing in hardware for US$3200 list - not a bad solution.  KEMP also has virtualized versions of their load balancing devices in both Hyper-V and VMware versions...

(NOTE:  I work for Microsoft, not KEMP, and I am not advocating KEMP over any other hardware load balancer vendor - if there is another vendor that wants equal billing, I am more than willing to work with you as well!!!)

--

rgillies at Microsoft

Not applicable

I have Exchange 2010 running all 4 roles (Mailbox, Client Access, Hub Transport, and Unified Communications) for 150 users and have never had any issues. This has a DAG setup with a second server with all 4 roles for failover only (rather than balancing, though it's a future goal). This setup has not caused me any issues in the time I've used it. It's virtualized on VMware with a quad core CPU, 24GB of RAM, 4 NIC ports, and plenty of storage on an array. I realize it's not necessarily optimal, and everyone's needs are different, but for the small-to-medium size businesses it's a solution that can work well with appropriate maintenance and planning.

Not applicable

I just don't get the JBOD part...  Every time I see this as part of Exchange 2010 literature I cringe a little bit.  I totally get driving down costs and cheap SATA drives (use them myself in many scenarios) - but they are SO cheap and SUCH a small cost compared to the rest of the server, why not spring for a second drive?  At the VERY least the mirroring built into the OS will save you a lot of time in rebuilding the server in case of drive failure - and an inexpensive $50 raid card makes it even easier (or just use built-in raid 1 on the mobo depending on the box you're using).  JBOD just seems a bit irresponsible given the low cost to implement raid1 and relatively big payoff in case of drive failure...

Otherwise, great article, thanks!

Not applicable

I run Exchange 2010 virtualized on an HP ProLiant DL380 G6 with a quad-core Xeon, 24GB of RAM, 4 NIC ports, and plenty of storage for 150 users. I have all 4 major roles-- MBX, CA, HT, and UM-- plus Forefront Protection for Exchange (limited to one process for each realtime scanner), and I've never had any major issues.

I have about 30 ActiveSync mobile connections, 10 OWA users, and 20 RPC over HTTPS connections and the rest make up desktops using Outlook 2007 or 2010. My CPU usage generally sits about 60% during the day, and drops to 5-10% during the evening hours. My RAM load is about 19GB, and paging is about 200MB (so very minimal). Network traffic is load-balanced on 4 1Gbps ports, so again during the day it can spike a bit but I rarely see it using more than 1-2 Gbps.

I'm not saying it's ideal for everyone, but from what I'm reading in your post, it generally seems reasonable that as long as you monitor CPU/RAM/Disk/Network loads, and as long as you're taking suitable recovery needs for your environment, then running a multi-role server is a very efficient, cost-effective way of consolidating your servers.

Not applicable

@GoodThings2Life - Thanks for the comments!  Just wanted to say two things:

First, everyone needs to be aware that our guidance states that having UM servers on a virtual platform is not supported.  technet.microsoft.com/.../aa996719.aspx, in the section titled Server Virtualization, this is specifically called out.  This is primarily because of scalability concerns for customers larger than yours.  Because of your relatively small organization, this might not cause you issues, but as your org grows, you should be aware of this.

Second, just wanted to let everyone know that this is not just an architecture for the small or medium sized organizations.  Microsoft is recommending this multi-role server architecture for customers of all sizes.  I am working with a customer that has about 1.5 million seats, and we are strongly recommending to that customer all three of the core roles collocated.  On these larger organizations, we see significant cost savings in both capital expenditures and operational expenditures with this collocation of server roles!

Not applicable

@pesos - Great comment, and a great point to reiterate.  Microsoft is going with the JBOD storage solution as our #1 recommendation, but that is only for the Exchange data storage - your databases, the associated log files and the system files used by the ESE database (the checkpoint file, etc.).  The operating system files, the Exchange binaries, etc. are recommended to be stored on storage that is protected with RAID.  As you called out, a simple mirror of the OS disk is inexpensive, and ensures that if a disk fails that has the OS on it, you are not required to rebuild your server from scratch.  For implementations where your data volumes are presented to Exchange through "mount points", the directories where you mount those volumes would also be on this system disk, and therefore protected by RAID.

Once again, only the Exchange data goes on disks that are JBOD, and only in situations where you have a minimum of 3 HA copies of the data in a DAG.  Since you have 3 HA copies (HA meaning "non-lagged") of the data online at all times in this situation, the loss of a single disk causes an unplanned switchover of a single database (approximately a 30 second outage to the users on that database), and then the administrative team would replace the failed disk, format, mount, and finally reseed to that disk to bring it back into use by Exchange.

One last thing - Microsoft is also recommending SAS as the controller for the "big cheap disks" because we see a better time between failure, and slightly higher performance from the SAS disks as opposed to the SATA disks.  The 7200 RPM SATA disks are certainly supported, but we prefer the SAS and that is what we recommend to customers when we are designing their systems.

Great comment, @pesos and everyone else!  Keep them coming!!

Not applicable

Great article and it just shows in my recent Exchange 2003 to 2010 upgrade that I got it right. I just couldn't think of a good reason to split out the roles, so rather than doing it just because I could I left them together.

I think a workable solution for a 2-server DAG multi-role implementation would be to use a CNAME with a short TTL (say 5 minutes at a maximum) to access the CAS. Have your monitoring software keep an eye on your critical CAS services, and if one misbehaves, trigger a script to modify the CNAME to point to the other server.

Then at most you have a 5 minute client outage. If your DNS infrastructure can handle it, then you could just tighten up the TTL until your SLA requirements are met (that is, assuming your SLA requirements DON'T require you to go out and purchase a load balancer, which most would).
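The decision logic behind such a script might look something like the sketch below. Everything here is an assumption for illustration: the server names, the health dictionary, and the polling interval are hypothetical, and the actual DNS update and service checks are deliberately left out. The outage estimate assumes clients can keep the stale CNAME for one full TTL after the monitor has noticed the failure, so the monitoring interval adds to the 5 minutes.

```python
from typing import Dict


def pick_cname_target(current: str, health: Dict[str, bool]) -> str:
    """Return the server the CNAME should point to.

    Stay on the current server while it is healthy; otherwise move to the
    first healthy alternative. If nothing is healthy, leave it unchanged.
    """
    if health.get(current, False):
        return current
    for server, is_healthy in health.items():
        if server != current and is_healthy:
            return server
    return current


def worst_case_outage_seconds(ttl_seconds: int, poll_interval_seconds: int) -> int:
    """Rough upper bound: time to detect the failure plus one full TTL."""
    return ttl_seconds + poll_interval_seconds


if __name__ == "__main__":
    # Hypothetical two-server deployment where the first server's CAS is down.
    health = {"ex1.example.com": False, "ex2.example.com": True}
    print(pick_cname_target("ex1.example.com", health))
    print(worst_case_outage_seconds(ttl_seconds=300, poll_interval_seconds=60))
```

This does nothing to balance load across the two servers; it only moves every client from one to the other.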

Not applicable

Great post, well done!

I think its a good idea.

I have read about a 12 processor core "limit" for mailbox servers. What do you think about a 24-core multi-role server (with 12 cores for Mailbox and 6+6 for the Hub and CAS roles)? Could it be a nice solution, maybe for >10000 mailboxes? Or do we have other limits, for example the maximum number of network connections?

Not applicable

@Jeremy Hagan - That should work, but as you note it would require you doing the work to make sure that the script worked, etc.  If you needed a better than that 5 minute RTO, or if you needed to make sure that the load was actually balanced across the two CAS servers (smaller orgs won't need this), then you can't make this work, and this is the reason that it isn't one of our recommended solutions to the problem for our enterprise customers.  But, if it works for you, then go for it.

@XMichaeL - We have changed our guidance from "core count" to "socket count".  Read the table carefully on this page:  technet.microsoft.com/.../dd346699.aspx   What we say is that we recommend a maximum of 2 populated processor sockets for any single role server or a CAS/HT combined role server, and a maximum of 4 populated processor sockets for a multi-role server (with CAS/HT/MBX all on the same server).  That gives a "theoretical" limit of 12 or 24 cores in the table if you have 6-core processors, but that doesn't mean that the 8-core processors are not supported or recommended.  Drive your maximums based on the number of sockets, not cores!  
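To make the socket arithmetic concrete, here is a small sketch. The dictionary keys and function name are hypothetical labels; the socket maximums are the ones quoted above.

```python
# Recommended maximum populated sockets, as quoted in the reply above.
MAX_SOCKETS = {
    "single_role": 2,  # any single-role server
    "cas_ht": 2,       # combined CAS/HT server
    "multi_role": 4,   # CAS/HT/MBX on the same server
}


def max_cores(deployment: str, cores_per_socket: int) -> int:
    """Core count that results from filling the recommended socket maximum."""
    return MAX_SOCKETS[deployment] * cores_per_socket


if __name__ == "__main__":
    for cores in (6, 8):
        print(cores, max_cores("single_role", cores), max_cores("multi_role", cores))
```

With 6-core processors this reproduces the 12 and 24 core figures from the table. With 8-core processors the same socket limits give 16 and 32 cores, which is why the guidance is expressed in sockets.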

Not applicable

This is from the Microsoft Press Self-Paced Training Kit for Exam 70-662: "The witness server should be a Hub Transport server that does not have the Mailbox role installed."  Is this still Microsoft's recommendation?  If so, it would seem to indicate that multi-role servers have limited application.

Not applicable

@tpgbrennan: Any server in the same site will do, but they use an Exchange box as an example because it a) is part of the e-mail infrastructure and b) has the proper local group memberships. The reason for not picking a mailbox server, I think, is that it's expected to be a DAG member. Of course, if you have a DAG configuration and a separate mailbox server (or another DAG), I see no reason for not using it to host the witness share, just as long as it's not a mailbox server that is part of the DAG configuration it's witnessing.

Not applicable

@tpgbrennan, that appears to be a documentation bug in that training kit.  It's not that the witness should be a Hub Transport server without the Mailbox role, but rather that the system can automatically choose a server with that configuration as your witness server, if you like.  Otherwise, any server running Windows Server 2003 or later in the AD forest that is not a member of the DAG can be the witness server for the DAG.

Not applicable

Great information Robert!  

We are a somewhat smaller shop with only 1000 users spread across the Asia region. I just finished upgrading from 2003 to 2010. I have 2 CAS/Hub role servers installed on VMs (4 cores each, running WNLB), 2 physical MB servers (16 cores each, in 1 DAG), and a TMG deployed in the DMZ. Performance seems to be OK. What is really weird is that I am starting to notice that a few users from remote offices have 60 to 100 TCP connections each to my CAS Array.  Has anyone seen this before??  I am just wondering what would cause so many TCP connections to the CAS.

Thanks,

Ken

Not applicable

What do you consider the cutoff for medium sized organizations?

Not applicable

I'd like to thank Tony Redmond for the great post in reference to this Robert's Rules post.  I suggest reading it (and following Tony's blog, for that matter).  Check it out here:  thoughtsofanidlemind.wordpress.com/.../microsoft-reveals-the-truth-about-single-role-servers

One question Tony asks in his blog is "what's happened in the five years since to make Microsoft recant" the idea of having the roles separated.  The big thing here is that with the advent of CCR in Exchange 2007 as the high availability (HA) model of choice, the Exchange team didn't support having the roles other than Mailbox on the CCR machines (or even on an SCC cluster machine, for that matter).  Our customers pushed back on this because in many cases it did cause a higher number of servers in an organization.  As we moved from CCR in Exchange 2007 to the DAG in Exchange 2010, Microsoft decided to support the other roles when on the DAG nodes.

In my article above, we talk a lot about simplicity and how the multi-role server can simplify your Exchange implementation no matter the size of your organization.  What we really don't touch on is the fact that the multi-role server can also make better use of the newer hardware.  Processors now have 6 or 8 cores each, and making use of those high core counts with a single-role server is sometimes difficult.  The multi-role server allows a higher processor density in a given server platform - having a server that supports 2 processor sockets with a single processor installed takes space in your datacenter and generates almost as much heat as it would if it had both processors, so why not make full use of the hardware in your system?

The only thing I would like to refute in Tony's blog article is the claim that somehow this had something to do with marketing.  Marketing has nothing to do with the technical recommendations we make around high availability.  The high availability technologies and how to deploy them have been a steady progression of revolution and evolution from Exchange 2003 with only what was later known as single copy clusters to Exchange 2007 with CCR to Exchange 2010 with the DAG.  The requirements and the recommendations around these HA technologies have changed with the versions.  But, as I teach in every course I ever teach around HA of Exchange - we don't deploy them because they are cool or snazzy or because Marketing wants it done.  You only deploy CCR or DAGs because you need the HA or the site resilience that DAGs bring.  If your customer requirements are such that you do not have a firm HA or site resilience requirement, then why deploy a DAG?  I will have very frank discussions with customers that want to deploy HA because their CIO has been reading CIO magazine again and thinks they need it when the business model is such that they have no hard requirement for it.

Thanks again, Tony!  Great blog!!

Not applicable

@Matt Anderson - the definition of small, medium and large organizations is quite nebulous, isn't it?  I don't personally have a definition.  For the customers I deal with (Dept of Defense and other large government entities), anything less than 10,000 mailboxes is a fairly small engagement.  Anything up to about 100,000 mailboxes is medium sized, and the large ones are over 100,000 mailboxes.  But then, scale has to do with what you think, not some arbitrary number someone else gives you.

Bottom line in sizing Exchange is to use your hardware wisely.  Don't allocate more hardware than you need, but don't undersize your hardware.  Having twice the processor that you need is an expense that the current economy doesn't support.  Same with RAM.  And as for disks, you just don't need the performance or redundancy of a SAN any more because Exchange 2010 doesn't need the performance like older versions did, and the DAG provides the redundancy you need.  Why go for a Bugatti Veyron of disk subsystems?  Sure it can go 250mph, but why would you do that when the speed limit is 70mph max in most of the US?  What you need is a Ford Expedition Extended Length so that you can easily go the 70mph you need and take everything you own in the back!  

Not applicable

We have been using WNLB due to the cost of hardware load balancers, but perhaps it's time to revisit that.

Currently we have four hyper-v guests: two mbx servers (dag) and two servers running both CAS/HT (wnlb).  We then have a 3rd server offsite running all roles just for DR purposes.  We only have around 300 mailboxes (and 100 or so resource mailboxes)

Does anyone here have experience with any low cost hardware load balancers (or virtual appliances even - we run hyper-v) they can recommend for Ex2010?

Not applicable

@pesos - I don't have personal experience with them (my customers tend to be larger and have the mainstream "enterprise" leaders like Cisco and F5), but KEMP is certainly the leader when it comes to the smaller environments.  I believe that Henrik Walther has deployed them - check his blog at blogs.msexchange.org/walther

Please note once again, folks.  I work for Microsoft, not KEMP.  I am not advocating KEMP as a solution better than others, just saying that they have an attractive price point and I have heard good things about their solution.

--

rgillies at Microsoft dot com

Not applicable

Do you have a strategy or recommended order for patching an Exchange 2010 environment where all 3 server roles are separated out on different hosts in a load-balanced environment with a CAS array and a 4-server DAG?

Not applicable

Hello sir,

Thx for your valuable post.

We are going to configure a 3-node DAG infrastructure. In this case, we think we don't need a witness server. Am I correct?

There is also still a market for WNLB among those who can't afford an HLB or VHLB. We are configuring a dual-NIC WNLB configuration. In this situation, we use the same IP range for both the management network and the production network. Is that correct?

And during node failover, we saw the warning message below in Event Viewer.

"NLB cluster [192.168.1.200]: NLB detected duplicate cluster subnets. This may be due to network partitioning, which prevents NLB heartbeats of one or more hosts from reaching the other cluster hosts. Although NLB operations have resumed properly, please investigate the cause of the network partitioning."

Is this warning normal?

Regards

Sunil

Not applicable

@Brenton Foggo - I would suggest a normal patching model.  We suggest "outside to inside", or with the three core roles "alphabetical order".  In other words, CAS, then HT, then MBX.  The biggest thing is that you should NEVER patch Mailbox before CAS and/or HT.
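As a rough illustration of the "alphabetical order" rule above, here is a sketch that sorts servers into a patch sequence by role. The server names and the role labels are hypothetical; only the CAS, then HT, then MBX ordering comes from the guidance above.

```python
from typing import Dict, List

# "Outside to inside": Client Access first, Hub Transport next, Mailbox last.
PATCH_ORDER = ["CAS", "HT", "MBX"]


def patch_sequence(servers: Dict[str, str]) -> List[str]:
    """Order server names by role so Mailbox is never patched before CAS or HT."""
    return sorted(servers, key=lambda name: (PATCH_ORDER.index(servers[name]), name))


if __name__ == "__main__":
    # Hypothetical environment with separated roles.
    environment = {
        "mbx02": "MBX",
        "cas01": "CAS",
        "ht01": "HT",
        "mbx01": "MBX",
        "cas02": "CAS",
    }
    print(patch_sequence(environment))
```

The sketch only produces an ordering. Draining load-balanced CAS nodes and moving active databases off a DAG member before patching it are separate steps that it does not model.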

@SUnil-Nair - I am not a WNLB expert.  We do support WNLB for situations where a hardware load balancer cannot be purchased, but it is not a recommended solution, especially when there are attractive low-price solutions out there.  I don't have the knowledge to answer your questions, sorry.

--

rgillies at microsoft dot com

Not applicable

I understand Microsoft is moving away from recommending WNLB, as it has some limitations.

What I don't understand is why the OS team is not working on improving WNLB. It has basically stayed the same since Windows 2000.

If other manufacturers can create virtual appliances that work as well as a hardware load balancer, shouldn't Microsoft be able to come up with improvements for WNLB to overcome the current shortcomings?

Not applicable

@Twan - Wow.  That is certainly a loaded question.  I know that the Exchange team asked why we couldn't have Windows Failover Clustering and WNLB on the same servers so we could utilize WNLB on the multi-role servers, and the response was blank looks and "why would anyone want to do that?"  Not sure if anything will ever happen there, but I don't expect it.

I'm not really sure why the Windows team chooses not to update WNLB.  I haven't had any discussions with them about this in about 5 or 7 years, and was told at that time that the investment necessary to compete with the hardware load-balancer vendors, weighed against the amount of revenue in Windows this would drive, didn't make a good business case.  The big deal is that the biggest hardware load-balancer vendors have special chips in their devices that accelerate the performance of what they are doing, and in Windows, we don't have that capability.  I don't think we could really ever compete with the "big guys" like F5 or Cisco.  Because of this, I would think we aren't going to see much investment in WNLB.

Of course, I want to make sure everyone knows that this is just MY OPINION - not an official Microsoft statement!!  ;*)

--

rgillies at microsoft dot com

Not applicable

Please correct me if I am wrong, but I think one possible option for load-balancing multi-role Exchange servers in a DAG using WNLB is TMG (that’s just my theoretical suggestion and not a scenario proven in the field at all). The easiest way to describe my thought, I think, is this sample scenario:

The customer already has a pair of TMG servers that are part of the corporate AD forest. Each TMG has one NIC in the DMZ and another NIC in corpnet. The NICs in the DMZ are in WNLB, and the customer is publishing Exchange to the Internet through them.

So far that’s an obvious scenario. Now the suggestion: publish Exchange to the corpnet using these TMG servers. Additional configuration for that:

1) Join the TMG servers' existing corpnet NICs to WNLB (if not already done), or install additional NICs for that purpose and join them to WNLB;

2) Change the existing Exchange publishing rules to allow traffic from corpnet, or create additional rules to publish Exchange to corpnet. The key thing here is the ability of TMG to publish the Exchange RPC protocol, and we need a new rule for publishing RPC to corpnet. I’ve never tried such RPC publishing, so it’s a theoretical suggestion, as is the whole scenario, by the way ;)

3) In the internal DNS, repoint the CAS Array DNS entry to the TMG WNLB IP address in corpnet. Internal URLs for the different Exchange services must also lead to this IP.

So, at the end of the day, the TMG servers are in WNLB, internal traffic is spread among them by WNLB, and the TMG servers then use the server farm concept to proxy this traffic to the CASs inside corpnet and to test whether each CAS is alive, continuing to proxy traffic to it only while it is. Of course, the additional load on these TMG servers must be taken into account before such a reconfiguration.

Not applicable

Eagerly waiting for the next installment of Robert’s Rules for virtualized Exchange!

Not applicable

Twan> What I don't understand is why the OS team is not working on improving WNLB. It has basically stayed the same since Windows 2000.

+++1

I believe that now is the time for the vNext NLB

Or ... Microsoft can outrun the "big guys" (F5 or Cisco) on its own field - the application level. I mean, why not make a broker on CAS similar to the RD Server session broker? Inexpensive and efficient.

Not applicable

@Shyamsundar Manian - Thanks!!  I very much appreciate you reading the posts!

@Sazonov ILYA [ sie ] - Thanks for the comment.  As a Microsoft guy, I'd like to quit pointing our customers elsewhere as well, but I guess Microsoft can't do everything.  I guess we'll see if they want to take that to the next level.  

Another idea that I've heard expressed is that we should just make our email clients and servers such that load balancing is not needed, or that the load balancing and client failover is "built in" and transparent to the user.  Personally, I like that idea even more - remove the need for load balancing and have it all built into our products out of the box...

Once again to both of you and everyone else that has commented - thanks for reading and thanks for the great comments!!

--

RGillies at Microsoft dot com

Not applicable

Hello Robert,

I have one question. I am configuring a 3-node DAG. As far as I know, we don't need a witness server with an odd number of DAG nodes. So, is there any later configuration needed for this, or do I just need to add all 3 nodes to the DAG?

Regards

Not applicable

@Nair - remember that as you define your DAG it will go through being a one-node DAG and a two-node DAG before it is a three-node DAG.  ;*)  If you want to allow Exchange to just place the FSW (assuming you have a separate hub transport server, that is), you can do that, but in my opinion it is best practice to just define the FSW directory and server when you define the DAG.  Then, as the DAG goes from "majority node" quorum model to "majority node with FSW" and back, it will just make use of the directory and share that you have defined.
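The way the quorum model flips as members are added can be sketched as below. This is an illustration only: the function name is hypothetical, and it simply applies the odd/even rule that an odd number of DAG members uses "majority node" while an even number uses "majority node with FSW".

```python
def quorum_model(member_count: int) -> str:
    """Quorum model a DAG would use for a given number of members."""
    if member_count < 1:
        raise ValueError("a DAG needs at least one member")
    if member_count % 2 == 1:
        return "majority node"        # odd member count: no witness vote needed
    return "majority node with FSW"   # even member count: witness breaks ties


if __name__ == "__main__":
    # Building a three-node DAG one member at a time.
    for members in (1, 2, 3):
        print(members, quorum_model(members))
```

On the way to three members the DAG passes through a two-member stage, and it does so again if a member is later removed, which is the argument for defining the witness server and directory up front.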

--

Rgillies at Microsoft dot com

Not applicable

Great post Robert

Is there a limitation to using multi-role servers in Exchange 2010 SP1 with the /hosting switch, especially with inter-tenant routing?

Thanks.

Not applicable

@NemoMania - there should be no adverse effects when deploying with the /hosting switch.  Same guidance.

Not applicable

@NemoMania - My buddy Scott Schnoll asked me to clarify a few things here...  We need to remember that Edge and UM cannot be installed as part of an environment when using the /hosting switch, so that is an impact.  

To extend what Scott said to me, as one of the main proponents of this "multi-role" approach, I generally don't think about UM being part of the "multi-role server".  I almost always propose the UM server(s) as being separate servers from the multi-role servers.  The sizing calculator does NOT take the UM role into account when you mark it as multi-role, so you need to be aware of that.

So, last note on the /hosting switch is to reiterate - /hosting doesn't support UM or Edge, so keep that in mind!

Great question, by the way.  Not many people know much about the /hosting option, so thanks for asking for clarification!

Version history
Last update:
‎Jul 01 2019 03:58 PM
Updated by: