The RMS server, by definition, is the first management server installed in a management group. The RMS is differentiated from other management servers (MS) by two distinct services and a host of distinct workflows that run as a part of the health service on the RMS.
The “SDK Service” (OMSDK)
: When I hear the term “SDK” I typically think of code libraries that I can use to write custom code against. With OpsMgr 2007 the SDK is really two things: 1) a software development kit, and
2) a service running on the RMS, which is the single point of access for SDK connectivity. It is the latter part of that definition that is most relevant when thinking about deployment.
The “Config Service” (OMCFG)
: In MOM 2005 the centralized configuration store for the management group was the OpsDB. In turn, each MOM 2005 management server queried the OpsDB directly to build its understanding of configuration. This was a fairly costly process that imposed constant resource overhead on the DB server, which was already busy enough processing operational data. The OpsDB is still the central store of configuration in OpsMgr 2007, but the RMS has taken on the role of single point of access to that configuration data from the DB via the Config Service. All other systems in the management group get their configuration (directly or indirectly) from the RMS.
Workflows under the “Health Service”
: A number of distinct workflows, assigned to the RMS by rules in several of the out-of-the-box MPs, run exclusively under the Health Service on the RMS. Examples of these workflows include “AD assignment rules”, “Notifications”, “Health Watcher Instances” and the “OpsDB partitioning and grooming processes”. In effect, many things that in MOM 2005 were scripts, SQL jobs, or functionality written directly into product code now run as rules on the RMS.
The RMS Platform
Now that you have a basic idea of the role an RMS plays in a management group let’s talk a bit about how Microsoft IT deployed this role. Given all the distinct functions the RMS serves, and the scale of IT’s management groups, they opted for the same server platform as their Operational Database (OpsDB) servers:
o Server Model: HP ProLiant DL385 G1
o Processors: 2 x dual-core 2.2 GHz AMD Opteron processors (4 processors in the OS’ eyes)
o RAM: 8 GB
o Drives: 2 SAN drives; one for the cluster quorum and the other for storing the various OpsMgr 2007 service directories that are shared between nodes of the RMS.
o Quorum drive: 2 GB RAID 5 – nothing fancy here; less than 20 MB is actually in use.
o State drive: 10 GB RAID 0+1 – typically less than 3 GB of actual data on this drive, but I/O is high at scale.
o OS: Windows Server 2003 Enterprise x64 Edition with SP1
Using that platform, Microsoft IT has seen the average RMS at 38.6% CPU utilization and memory paging of ~86 pages/sec. The state drive is quite busy, sustaining an average of ~1200 transfers/sec and an average data rate of 14.89 MB per second. In both cases ~95% of the drive activity is writes.
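To put those state-drive figures in perspective, a quick back-of-the-envelope calculation from the averages quoted above gives the implied size of a single transfer and the implied write rate:

```python
# Back-of-the-envelope check on the state-drive I/O figures quoted above.
transfers_per_sec = 1200        # average disk transfers/sec
data_rate_mb_per_sec = 14.89    # average throughput in MB/sec
write_fraction = 0.95           # ~95% of drive activity is writes

# Implied average size of a single transfer, in KB.
avg_transfer_kb = data_rate_mb_per_sec * 1024 / transfers_per_sec

# Implied write operations per second.
writes_per_sec = transfers_per_sec * write_fraction

print(f"avg transfer size: {avg_transfer_kb:.1f} KB")  # ~12.7 KB
print(f"writes/sec: {writes_per_sec:.0f}")             # ~1140
```

So the workload is a steady stream of small (~13 KB) writes, which is consistent with many small state files being updated continuously rather than large sequential I/O.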
If you take resource utilization down to the level of the OpsMgr 2007-specific processes running on the RMS, the top consumer in IT’s deployments is the Config Service (Microsoft.MOM.ConfigServiceHost.exe), followed by the Health Service (HealthService.exe), and then the SDK service (Microsoft.MOM.Sdk.ServiceHost.exe) and Monitoring Host (MonitoringHost.exe) processes. The following table shows the average “% Processor Time” and “Private Bytes” for the relevant processes on the RMS:
[Table: average “% Processor Time” and “Private Bytes” per RMS process – values not recovered]
Given that the RMS is so vital to the functionality of a management group, IT planned from the earliest design phases to make the investment to ensure high availability (HA) of this role. With that in mind, IT worked with the OpsMgr product group early on to test the setup and use of clustered RMSs. A clustered RMS consists of a resource group containing a network name, a dedicated IP address, three shared services (Health Service, Config Service, and SDK Service), and a shared drive holding the central state files used by those services (referred to above as the state drive). With clustering of the RMS configured, automated failover can occur during an unexpected outage, as can planned failovers during system upgrades or maintenance work. Microsoft IT’s experience to date with RMS clustering has been very positive, but the key takeaway from the deployment of both the RMS and the OpsDB is that the monitoring team has built up its knowledge of configuring and working with clusters and clustered resources. The setup process is well documented in the
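As a rough sketch only, the resource group described above maps onto Windows Server 2003 cluster.exe commands along these lines. The group and resource names here are illustrative assumptions, not Microsoft IT's actual names, and the real RMS cluster configuration is performed by OpsMgr setup rather than built by hand:

```shell
REM Hypothetical sketch of the clustered RMS resource group on a
REM Windows Server 2003 cluster; all names are illustrative.
cluster group "RMS Group" /create

REM Dedicated IP address and network name for the clustered RMS
cluster resource "RMS IP Address" /create /group:"RMS Group" /type:"IP Address"
cluster resource "RMS Network Name" /create /group:"RMS Group" /type:"Network Name"

REM Shared state drive holding the services' central state files
cluster resource "RMS State Drive" /create /group:"RMS Group" /type:"Physical Disk"

REM The three shared OpsMgr 2007 services
cluster resource "OpsMgr Health Service" /create /group:"RMS Group" /type:"Generic Service"
cluster resource "OpsMgr Config Service" /create /group:"RMS Group" /type:"Generic Service"
cluster resource "OpsMgr SDK Service" /create /group:"RMS Group" /type:"Generic Service"
```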