First published on TechNet on Apr 15, 2011
Hi folks, Ned here again. It’s been nearly a month since the last Mail Sack post so I’ve built up a good head of steam. Today we discuss FRS, FSMO, Authentication, Authorization, USMT, DFSR, VPN, Interactive Logon, LDAP, DFSN, MS Certified Masters, Kerberos, and other stuff. Plus a small contest for geek bragging rights.
Clickity Clackity Clack.
Question
I’ve read TechNet articles stating that the PDC Emulator is contacted when authentication fails - in case a newer password is available - and the PDCE would know this. What isn't stated explicitly is whether the client contacts the PDCE itself or the current DC contacts the PDCE on behalf of the client. This is important to us as our clients won’t always have a routable connection to the PDCE but our DCs will; a DMZ/Perimeter network scenario basically.
Answer
Excellent question! We document the password and logon behaviors here rather loosely:
http://msdn.microsoft.com/en-us/library/cc223752(PROT.13).aspx
Specifically for the “bad password, let’s try the PDCE” piece, it works like this:
- I have two DCs and a client.
- The PDCE is named 2008r2-srv-01 (10.70.0.101).
- The other DC is named 2008r2-srv-02 (10.70.0.102).
- The client is named 7-x86-sp1-01 (10.70.0.111).
- I configured the PDCE firewall to block ALL traffic from the client IP address. The PDCE can only hear from the other DC, like in your proposed DMZ. The non-PDCE and client can talk without restriction.
1. I use some bad credentials on my Windows 7 client (using RunAs to start notepad.exe as my Tony Wang account).
2. Then we see this conversation:
a. Frame 34, the client contacts his 02 DC with a Kerberos Logon request as Twang in the Contoso domain.
b. Frame 40, DC 02 knows the password is bad, so he then forwards the same Kerberos Logon request to the PDCE 01.
c. Frame 41, the PDCE 01 responds back to the 02 DC with KDC Error 24 (“bad password”).
d. Frame 45, the DC 02 responds back to the client with “bad password”.
3. User now gets:
I described the so-called “urgent replication” here:
http://blogs.technet.com/b/askds/archive/2010/08/18/fine-grained-password-policy-and-urgent-repl...
That covers how account lockout and password change processing work (that’s DC to PDCE too, so no worries there for you).
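The frames above boil down to a simple relay: the client only ever talks to its own DC, and that DC retries a failed password against the PDCE before answering. Here is a toy Python model of that decision, purely for illustration. The `DomainController` class and its fields are made up for this sketch and are not how the KDC is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Toy stand-in for a DC; not a real KDC implementation."""
    name: str
    passwords: dict                          # user -> password this DC currently knows
    pdce: "DomainController | None" = None   # None means this DC is the PDCE itself
    log: list = field(default_factory=list)

    def authenticate(self, user: str, password: str) -> bool:
        # Password matches what this DC has replicated so far: done.
        if self.passwords.get(user) == password:
            self.log.append(f"{self.name}: accepted {user}")
            return True
        # Bad password on a non-PDCE: the DC (not the client) asks the PDCE,
        # in case a newer password has not replicated here yet.
        if self.pdce is not None:
            self.log.append(f"{self.name}: bad password, forwarding to {self.pdce.name}")
            return self.pdce.authenticate(user, password)
        # The PDCE has the final word.
        self.log.append(f"{self.name}: bad password")
        return False


# The lab from the walkthrough: 01 is the PDCE, 02 is the DC the client can reach.
pdce = DomainController("2008r2-srv-01", {"twang": "new-password"})
dc02 = DomainController("2008r2-srv-02", {"twang": "old-password"}, pdce=pdce)
```

In this model `dc02.authenticate("twang", "new-password")` succeeds even though 02 only knows the old password, and the client never needs a route to 01.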
Question
Can you help me understand cached domain logons in more detail? At the moment I have many Windows XP laptops for mobile users. These users log on to the laptops using cached domain logons. Afterwards they establish a VPN connection to the company network. We have some third-party software and group policies that don’t work in this scenario, but work perfectly if the user logs on to our corporate network instead of the VPN, using the exact same laptop.
Answer
We don’t do a great job in documenting how the cached interactive logon credentials work. There is some info here that might be helpful, but it’s fairly limited:
How Interactive Logon Works
http://technet.microsoft.com/en-us/library/cc780332(v=WS.10).aspx
But from hearing this scenario many times, I can tell you that you are seeing expected behavior. In your scenario the user logs on interactively with cached creds (stored in an encrypted form under HKEY_LOCAL_MACHINE\Security\Cache) while no DC is reachable, and only afterwards gets a network connection and accesses resources. Anything that only happens at the interactive logon phase is not going to work; for example, logon scripts delivered by AD or group policy, or security policies that apply when the computer starts up (and won’t apply for another 90-120 minutes while VPN connected – which may not actually happen if the user only starts VPN for short periods).
I made a hideous flowchart to explain this better. It works – very oversimplified – like this:
As you can see, with a VPN not yet running, it is impossible to access a number of resources at interactive logon. So if your application’s “resource authentication” only works at interactive logon, there is nothing you can do unless the app changes.
This is why we created VPN at Logon and DirectAccess – there would be no reason to make use of those technologies otherwise.
How to configure a VPN connection to your corporate network in Windows XP Professional
http://support.microsoft.com/kb/305550
Where Is “Logon Using Dial-Up Connections” in Windows Vista?
http://blogs.technet.com/b/grouppolicy/archive/2007/07/30/where-is-logon-using-dial-up-connecti...
DirectAccess
http://technet.microsoft.com/en-us/network/dd420463.aspx
If you have a VPN solution that doesn’t allow XP to create the “dial-up network” at interactive logon, that’s something your remote-access vendor has to fix. Nothing we can do for you I’m afraid.
Question
Can DFSR use security protocols other than Kerberos? I see that it has an SPN registered but I never see that SPN used in my network captures or ticket cache.
Answer
DFSR uses Kerberos auth exclusively. The DFSR client’s TGS request does not contain the DFSR SPN, only the HOST computer name. So the special looking DFSR SPN is - pointless. It’s one of those “almost implemented” features you occasionally see. :)
Let’s look at this in action.
Two DFSR servers (06 and 07) doing initial sync, talking to their DC (01). TGS requests/responses, using only the computer HOST name SPNs:
Then the DFSR service opens RPC connections between each server and uses Kerberos to encrypt the RPC traffic with RPC_C_AUTHN_LEVEL_PKT_PRIVACY, using RPC_C_AUTHN_GSS_NEGOTIATE and requiring RPC_C_QOS_CAPABILITIES_MUTUAL_AUTH. Since NTLM doesn’t support mutual authentication, DFSR can only use Kerberos:
If you block Kerberos from working (TCP/UDP 88), DFSR falls over and the service won’t start:
Event 1202 "Failed to contact domain controller..." with an extended error of "160 - the parameter is incorrect"
Question
I am using the USMT scanstate /P option to get a size estimate of a migration. But I don’t understand the output. For example:
4096 434405376
0 426539816
512 427467776
1024 428611584
2048 430821376
4096 434405376
8192 446136320
16384 467238912
32768 512098304
65536 587988992
131072 812908544
262144 1266679808
524288 2189426688
1048576 4041211904
Answer
USMT is telling you the size estimate based on your possible NTFS cluster sizes. So 4096 means a 4096-byte cluster size will take 434405376 bytes (or 414MB) in an uncompressed store. Starting in USMT 4.0, though, the /P option was extended and now allows you to specify an XML output file. It’s a little more readable and includes temporary space needs:
scanstate c:\store /o /c /ue:* /ui:northamerica\nedpyle /i:migdocs.xml /i:migapp.xml /p:usmtsize.xml
<?xml version="1.0" encoding="UTF-8"?>
<PreMigration>
<storeSize>
<size clusterSize="4096">72669229056</size>
</storeSize>
<temporarySpace>
<size>151299104</size>
</temporarySpace>
</PreMigration>
scanstate c:\store /o /c /nocompress /ue:* /ui:northamerica\nedpyle /i:migdocs.xml /i:migapp.xml /p:usmtsize.xml
<?xml version="1.0" encoding="UTF-8"?>
<PreMigration>
<storeSize>
<size clusterSize="4096">92731744256</size>
<size clusterSize="0">92511635806</size>
<size clusterSize="512">92538449408</size>
<size clusterSize="1024">92565861376</size>
<size clusterSize="2048">92620566528</size>
<size clusterSize="4096">92731744256</size>
<size clusterSize="8192">92958539776</size>
<size clusterSize="16384">93413900288</size>
<size clusterSize="32768">94341398528</size>
<size clusterSize="65536">96226705408</size>
<size clusterSize="131072">100214767616</size>
<size clusterSize="262144">108447399936</size>
<size clusterSize="524288">125118185472</size>
<size clusterSize="1048576">159657230336</size>
</storeSize>
<temporarySpace>
<size>158364704</size>
</temporarySpace>
</PreMigration>
Sheesh, 72GB compressed. I need to do some housecleaning on this computer…
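If you would rather not eyeball those byte counts, the XML is easy to pick apart in a script. Below is a minimal Python sketch that pulls out the per-cluster-size store estimates and the temporary space figure. The function names are my own invention, and the element names are simply taken from the sample output above, so treat it as a starting point rather than a documented schema.

```python
import xml.etree.ElementTree as ET


def summarize_usmt_estimate(xml_text: str) -> dict:
    """Return store estimates (bytes, keyed by cluster size) and temp space (bytes)."""
    # Parse as bytes so the file's own encoding declaration is honored.
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    store = {
        int(node.get("clusterSize")): int(node.text)
        for node in root.findall("./storeSize/size")
    }
    temp = int(root.findtext("./temporarySpace/size"))
    return {"store_bytes": store, "temp_bytes": temp}


def to_mib(byte_count: int) -> float:
    """Convert bytes to binary megabytes (1 MB = 1024 * 1024 bytes)."""
    return byte_count / (1024 * 1024)


# The compressed-store output from the first scanstate run above.
SAMPLE = """
<?xml version="1.0" encoding="UTF-8"?>
<PreMigration>
<storeSize>
<size clusterSize="4096">72669229056</size>
</storeSize>
<temporarySpace>
<size>151299104</size>
</temporarySpace>
</PreMigration>
"""

summary = summarize_usmt_estimate(SAMPLE)
for cluster, size in sorted(summary["store_bytes"].items()):
    print(f"cluster {cluster}: {to_mib(size):,.0f} MB")
print(f"temporary space: {to_mib(summary['temp_bytes']):,.0f} MB")
```

The same `to_mib` helper applied to the text output is where the 414MB figure for 434405376 bytes comes from.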
Question
I was poking around with DFSRDIAG.EXE DUMPMACHINECFG and I noticed these polling settings. What are they?
Answer
Good eye. DFSR uses LDAP to poll Active Directory in two ways in order to detect changes to the topology:
1. Every five minutes (hard-coded wait time) light polling checks to see if subscriber objects have changed under the computer’s Dfsr-LocalSettings container. If not, it waits another five minutes and tries again. If there is something new, it does a full LDAP lookup of all the settings in the Dfsr-GlobalSettings and its Dfsr-LocalSettings container, slurps down everything, and acts upon it.
2. Every sixty minutes (configurable wait time) it slurps down everything just like a light poll that detected changes, no matter if a change was detected or not. Just to be sure.
Want to skip these timers and go for an update right now? DFSRDIAG.EXE POLLAD.
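To get a feel for how the two timers interact, here is a back-of-the-napkin Python model. It assumes both timers start at minute zero and tick on exact multiples of their interval, which is a simplification I made up for the sketch; the real service's timing will drift. The function name and parameters are hypothetical.

```python
import math

LIGHT_POLL_MIN = 5   # hard-coded light poll interval described above
FULL_POLL_MIN = 60   # full poll interval; configurable, 60 is the value described above


def next_pickup(change_at_min: float, seen_by_light_poll: bool,
                full_poll_min: int = FULL_POLL_MIN) -> int:
    """Minute at which a change made at change_at_min is picked up (toy model)."""
    next_full = math.ceil(change_at_min / full_poll_min) * full_poll_min
    if not seen_by_light_poll:
        # Light polling only notices subscriber object changes;
        # anything else waits for the next full poll.
        return next_full
    next_light = math.ceil(change_at_min / LIGHT_POLL_MIN) * LIGHT_POLL_MIN
    return min(next_light, next_full)


for minute in (7, 58, 61):
    print(minute, next_pickup(minute, True), next_pickup(minute, False))
```

In this model the worst case is just under five minutes for a change the light poll can see and just under the full poll interval for anything else, which is exactly the wait that POLLAD lets you skip.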
Question
While reviewing FRS KB266679 I noted:
"The current VV join is inherently inefficient. During normal replication, upstream partners build a single staging file, which can source all downstream partners. In a VV join, all computers that have outbound connections to a new or reinitialized downstream partner build staging files designated solely for that partner. If 10 computers do an initial join from \\Server1, the join builds 10 files in stage for each file being replicated."
Is this true – even if the file is identical, FRS makes that many copies? What about DFSR?
Answer
It is true. On the FRS hub server you need staging as large as the largest file x15 (if you have 15 or more spokes) or you end up becoming rather ‘single threaded’; a big file goes in, gets replicated to one server, then tossed. Then the same file goes in, gets replicated to one server, gets tossed, etc.
Here I create this 1 GB file with my staging folder set to 1.5 GB (hub and 2 spokes):
Note how the file name and modified time keep changing here in staging as the file goes through one at a time, as that’s all that can fit. If I made the staging 3 GB, I’d be able to get both downstream servers replicating at once, but there would definitely be two identical copies of the same file:
Luckily, you are not using FRS to replicate large files anymore, right? Just SYSVOL, and you’re planning to get rid of that also, right? Riiiiiiiiggghhhht?
DFSR doesn’t do this – one file gets used for all the connections in order to save IO and staging disk space. As long as you don’t hit quota cleanup, a staged file will stay there until doomsday and be used infinitely. So when it works on say, 32 files at once, they are all different files.
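The staging math for FRS during a VV join is simple enough to sketch. The Python below is only a rough model of the behavior described above: it ignores staging compression and any per-file overhead, and the function name is invented for the example.

```python
import math

GB = 1024 ** 3


def frs_vvjoin_staging(staging_bytes: int, file_bytes: int, partners: int):
    """Return (partners served at once, sequential passes needed) for one file.

    Toy model: each downstream partner needs its own staged copy during a
    VV join, so concurrency is limited by how many copies fit in staging.
    """
    if file_bytes <= 0 or partners <= 0:
        raise ValueError("file_bytes and partners must be positive")
    slots = min(partners, staging_bytes // file_bytes)
    if slots == 0:
        raise ValueError("file does not fit in staging at all")
    return slots, math.ceil(partners / slots)


# The lab above: 1 GB file, hub with 2 spokes.
print(frs_vvjoin_staging(int(1.5 * GB), 1 * GB, 2))  # 1.5 GB staging
print(frs_vvjoin_staging(3 * GB, 1 * GB, 2))         # 3 GB staging
```

With 15 spokes and staging sized at 15 times the largest file, the model gives 15 copies in a single pass, which matches the "largest file x15" guidance. DFSR needs no such function: one staged copy serves every connection.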
Question
Are there any DFSR registry tuning options in Windows Server 2003 R2? This article only mentions Win2008 R2.
Answer
No, there are none. All of the OS non-specific ones listed are still valuable though:
- Consider multiple hubs
- Increase staging quota
- Latest QFE and SP
- Turn off RDC on fast connections with mostly smaller files
- Consider and test anti-virus exclusions
- Pre-seed the data when setting up a new replicated folder
- Use 64-bit OS with as much RAM as possible on hubs
- Use the fastest disk subsystem you can afford on hubs
- Use reliable networks <-- this one is especially important on 2003 R2 as it does not support asynchronous RPC
Question
Is there a scriptable way to do what DFSUTIL.EXE CLIENT PROPERTY STATE ACTIVE or Windows Explorer’s DFS Set Active tab does? Perhaps with PowerShell?
Answer
In theory, you could implement what DfsShlEx.dll is doing in Windows Explorer: NetDfsSetClientInfo. It’s not a cmdlet (not even .NET), but it could be exposed through .NET’s DllImport and thus PowerShell. Which sounds really, really gross to me.
Or just drive DFSUTIL.EXE in your code. I hesitate to ask why you’d want to script this. In fact, I don’t want to know. :)
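If you go the "drive DFSUTIL.EXE" route, it might look something like this Python sketch. The helper names are made up, and the argument order (DFS path first, then the target to make active) is my assumption about the command's syntax, so confirm it against the DFSUTIL help output on your own machine before trusting it.

```python
import subprocess


def build_set_active_command(dfs_path: str, target: str) -> list:
    """Build the DFSUTIL command line (argument order is an assumption)."""
    return ["dfsutil.exe", "client", "property", "state", "active", dfs_path, target]


def set_active_target(dfs_path: str, target: str) -> subprocess.CompletedProcess:
    """Run DFSUTIL and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(
        build_set_active_command(dfs_path, target),
        capture_output=True,
        text=True,
        check=True,
    )


# Hypothetical paths, shown only to illustrate the call:
# set_active_target(r"\\contoso.com\public\apps", r"\\fs01\apps")
```

Keeping the command construction separate from the call means you can log or dry-run the exact command line before letting it loose on real clients.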
Question
Are there problems with a user logging on to their new destination computer before USMT loadstate is run to migrate their profile?
Answer
Yes. If they then start Office 2007/2010 apps like Word, Outlook, Excel, etc., portions of their Office migration will not work. Office relies heavily on reusing its own built-in ‘upgrade’ code:
http://support.microsoft.com/kb/2023591
Note
To migrate application settings, you must install applications on the destination computer before you run the loadstate command. For Office installations, you must run the LoadState tool to apply settings before you start Office on the destination computer for the first time by using a migrated user. If you start Office for a user before you run the LoadState tool, many settings of Office will not migrate correctly.
Other applications may be similarly affected; Office is just the one we know about and harp on.
Question
I am seeing very often that a process named DFSFRSHOST.EXE is taking 10-15% CPU resources and at the same time the LAN is pretty busy. Some servers have it and some don’t. When the server is rebooted it doesn’t appear for several days.
Answer
Someone is running DFSR health reports on some servers and not others – that process is what gathers DFSR health data on a server. It could be that someone has configured scheduled reports to run with DFSRADMIN HEALTH, or is just running it using DFSMGMT.MSC and isn’t telling you. If you have an enormous number of files being replicated the report can definitely run for a long time and consume some resources; best to schedule it off hours if you’re in “millions of files” territory, especially on older hardware and slower disks.
Question
FRS replication is not working for SYSVOL in my domain after we started adding our new Win2008 R2 DCs. I see this endlessly in my NTFRS debug logs:
Cmd 0039ca50, CxtG c2d9eec5, WS ERROR_INVALID_DATA, To DC2.mydomain.contoso.com Len: (436) [SndFail - rpc call]
Is FRS compatible between Win2003 and Win2008 R2 DCs?
Answer
That type of error makes me think you have some intrusion protection software installed (perhaps on the new servers, in a different version than on the other servers) or something is otherwise altering data on the network (such as when going through a packet-inspecting firewall).
We only ever see that issue when it is caused by a third party. There are no problems with FRS talking to each other on 2003, 2008, or 2008 R2. The FRS RPC code has not changed in many years.
You should get double-sided network captures and see if something is altering the traffic between the two servers. Everything RPC should look identical in both captures, down to a payload level. You should also try *removing* any security software from the 2 DCs and retesting (not disabling; that does nothing for most security products – their drivers are still loaded when their services are stopped).
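Once you have both captures, the comparison itself is mechanical. The Python sketch below assumes you have already exported the RPC payloads from each side, in order, as raw bytes (how you export them depends on your capture tool and is out of scope here). The function name is hypothetical.

```python
from itertools import zip_longest


def first_payload_mismatch(sent: list, received: list):
    """Return the index of the first payload that differs, or None if identical.

    A payload missing from one side counts as a mismatch too.
    """
    for index, (left, right) in enumerate(zip_longest(sent, received)):
        if left != right:
            return index
    return None


# Made-up payloads: the middle one was altered somewhere on the wire.
sender_side = [b"\x05\x00\x0b\x03", b"FRS-DATA-1", b"FRS-DATA-2"]
receiver_side = [b"\x05\x00\x0b\x03", b"FRS-DATA-X", b"FRS-DATA-2"]
print(first_payload_mismatch(sender_side, receiver_side))
```

Any result other than None means something between the two DCs touched the traffic, and the index tells you which frame to go look at in both captures.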
Question
When I run USMT 4.0 scanstate using /nocompress I see a catalog.mig created. It seems to vary in size a lot between various computers. What is that?
Answer
It contains all the non-file goo collected during the gather; mainly the migrated registry data.
Other Stuff
James P Carrion has been posting a very real look into the MS Certified Masters program as seen through the eyes of a student working towards his Directory Services cert. If you’ve thought about this certification I recommend you read on, it’s fascinating stuff. Start at the oldest post and work forward; you can actually see his descent into madness…
----------
Microsoft uses a web-based system for facilities requests. The folks that run that department are excellent and the web system usually works great. Every so often though, you get something interesting like this…
Uuuhhh, I guess I can wait to see how that pans out.
-----------
And finally here is this week’s Stump the Geek contest picture:
Name both movies in which this picture appears. The first correct reply in the Comments gets the title of “Silverback Alpha Geek”. And nothing else… it’s a cruel world.
Have a good weekend folks.
- Ned “hamadryas baboon” Pyle