Hey folks – Hilde here with a post about some DHCP migration snafus I've bumped into recently.
Many of my large, enterprise customers have DHCP servers that were deployed over a decade ago, humming along on some flavor of the Windows Server 2003 OS (WS 2003).
For the most part, those DHCP services have been pretty much dial-tone over the years, but with the end of support coming for WS 2003 (that date is July 14, 2015, by the way), they are migrating the service to a newer OS version.
You know what they say about sleeping dogs? Well, there's nothing like walking by the sleeping DHCP dog and stomping on his tail.
TOPIC #1 – existing DHCP-registered DNS records get deleted when you disable/delete the scope on the old DHCP server.
You migrate scopes from OLD-DHCP-01 to NEW-DHCP-01 but you're leaving OLD-DHCP-01 on the wire as it isn't quite ready for decom yet.
As part of your migration, once you're done migrating the scope(s), you disable or delete the scope(s) from OLD-DHCP-01.
Shortly after, your helpdesk lights up like a Christmas tree and there is a flood of issues which all seem to track back to name resolution problems.
A quick check of your DNS zone shows a large number of dynamic entries have been deleted from DNS.
A reboot or some other kick of the end-point device re-registers the records and for that specific device, service returns to normal.
A handful of devices isn't an epic fail but there were 1000s of devices affected; the outage was massive and impacted multiple business units.
Some/many of the end-point devices were non-Windows devices (printers, handhelds, etc.) and are hard, if not impossible, to remotely reboot/manage at scale and/or in an automated fashion.
This was totally unexpected – NEW-DHCP-01 is operational, IP Helper statements are all fixed up, the reservations/leases are all over on that new server. Why did this happen?!?
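If you want to know exactly which records went missing (rather than waiting for the helpdesk to tell you), one option is to snapshot the zone before you touch the scope and diff it afterwards. Here's a minimal Python sketch of that diff. The "name,ip" one-record-per-line snapshot format and the sample hostnames are my assumptions for illustration; adapt the parsing to whatever your DNS export actually produces.

```python
# Sketch: find host records that vanished between two zone snapshots.
# ASSUMPTION: each snapshot is plain text with one "name,ip" pair per line.

def parse_snapshot(text):
    """Return a set of (hostname, ip) tuples from 'name,ip' lines."""
    records = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, ip = (field.strip() for field in line.split(",", 1))
        records.add((name.lower(), ip))
    return records


def missing_records(before_text, after_text):
    """Records present before the scope change but absent afterwards."""
    return sorted(parse_snapshot(before_text) - parse_snapshot(after_text))


# Hypothetical sample data, not taken from a real environment.
before = """
printer-17.corp.example.com,10.1.20.17
handheld-03.corp.example.com,10.1.20.103
fileserver.corp.example.com,10.1.10.5
"""
after = """
fileserver.corp.example.com,10.1.10.5
"""

for name, ip in missing_records(before, after):
    print(f"missing: {name} -> {ip}")
```

The resulting list is also a handy work list for the devices you can't easily reboot remotely.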
Root Cause –
There is a setting in the DHCP server/scope options that says "Discard A and PTR records when lease is deleted"
This was checked (and is the default setting) on OLD-DHCP-01 and that server was still active.
When the scope/lease/reservation was disabled/deleted from OLD-DHCP-01, the next time the server's DHCP cleanup process/thread ran, it went out and deleted the DNS records, as that is what it had in its DHCP DB.
The value of this checkbox is recorded for each lease/reservation in the DHCP DB at the time of the lease/reservation creation. In my experience, toggling this setting has no impact on existing leases/reservations.
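To make that last point concrete, here's a toy Python model of the behavior I observed. This is not how the DHCP Server service is implemented internally; the class and field names are made up for illustration. It only shows why un-checking the box right before you delete the scope doesn't save you: each lease carries its own copy of the flag from the moment it was created.

```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    hostname: str
    discard_dns_on_delete: bool  # copied from the server setting at creation


@dataclass
class ToyDhcpServer:
    # Models "Discard A and PTR records when lease is deleted" (default: on).
    discard_setting: bool = True
    leases: list = field(default_factory=list)

    def create_lease(self, hostname):
        # The flag is stamped onto the lease here and never re-read later.
        self.leases.append(Lease(hostname, self.discard_setting))

    def delete_scope(self, dns_zone):
        """Drop every lease; remove DNS records for leases flagged at creation."""
        for lease in self.leases:
            if lease.discard_dns_on_delete:
                dns_zone.discard(lease.hostname)
        self.leases.clear()


dns_zone = set()
old = ToyDhcpServer()
for host in ("printer-17", "handheld-03"):
    old.create_lease(host)
    dns_zone.add(host)

old.discard_setting = False  # toggled AFTER the leases above already exist
old.create_lease("laptop-42")
dns_zone.add("laptop-42")

old.delete_scope(dns_zone)
print(sorted(dns_zone))  # prints ['laptop-42']
```

Only the lease created after the toggle keeps its DNS record; the two older leases take their records down with them.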
Painful. Consider yourself warned.
A possible work-around –
Decommission OLD-DHCP-01 as part of the migration effort:
Shut the server down
Unplug the server from power
Remove the power supply(ies) from the server
Pull the network cable
Put tape over the NIC port(s)
Fill the NIC port(s) with hotglue
Remove the NIC
Cut the server in half with a sawzall
All of the above
TOPIC #2 – DHCP authorization details in AD are showing up inconsistently and/or in different locations. You're wondering what's going on and how/why they got there.
As part of your migration/clean-up efforts, you have a look at the DHCP Server Authorization records via the DHCP UI, AD Sites and Services, and/or ADSIEdit. Depending on the OS versions used, you may see inconsistent entry formats and/or relics of old DHCP servers (long gone/renamed/re-IP'd).
For WS 2003 RTM DHCP Servers, the authorization record would register as an IP address, not an FQDN (as shown below in the AD Sites and Services MMC and ADSIEdit).
In WS 2003 R2 or SP2 (not sure which/when) this was changed and the authorization record would register as an FQDN (as shown below in the UIs).
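If you've exported the names of the authorization entries and want to sort the IP-style ones from the FQDN-style ones, a small Python sketch like this can do it. The entry names below are hypothetical, and I'm assuming you already have the names as plain strings; how you export them from AD is up to you.

```python
import ipaddress


def authorization_entry_style(name):
    """Classify an authorization entry name as 'ip' or 'fqdn'."""
    try:
        ipaddress.ip_address(name)  # raises ValueError if not an IP literal
        return "ip"
    except ValueError:
        return "fqdn"


# Hypothetical entry names for illustration only.
entries = ["10.1.10.21", "new-dhcp-01.corp.example.com"]

for entry in entries:
    print(f"{entry}: {authorization_entry_style(entry)}")
```

Anything that comes back as "ip" is a candidate for an older-style registration worth a second look during clean-up.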
If you look at the NetServices container via ADSIEdit or AD Sites and Services MMC, you might notice some differences here, too:
In the Windows 2000 DHCP Server service, the DHCP authorization process would register an entry directly under NetServices, as well as add an entry in the dhcpServers attribute of the DhcpRoot object under NetServices:
Windows 2000-era ADSIEdit (remember this UI?)
Windows 2003-era ADSIEdit
In the Windows 2003 DHCP Server service, the DHCP authorization process would write a record directly under NetServices but not add an entry in the dhcpServers attribute of the DhcpRoot object under NetServices.
In the screen shots below, notice there is an entry for all three DHCP Servers under NetServices but there is only an entry for the WS 2000 DHCP server in the dhcpServers attribute of the DhcpRoot object – neither the Windows Server 2003 nor Windows 2003 R2 DHCP servers registered a record there.
There was also a difference between the Windows 2000 and WS 2003 DHCP MMCs in how they'd display the DHCP "authorized servers" list.
Windows 2000 Server – Manage Authorized Servers UI
It would only look at the dhcpServers attribute on the DhcpRoot object and only list the record(s) it found there (the WS 2000 DHCP server in this case)
WS 2003 – Manage Authorized Servers UI (same AD forest as above)
It looks at both the dhcpServers attribute (and lists the records there) as well as the NetServices container (and lists the WS 2003 and WS 2003 R2 DHCP server records there).
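The difference between the two UIs boils down to which of the two locations each one reads. Here's a toy Python model of that, using the three-server lab described above. The server names are placeholders, and I've simplified both locations to plain lists of names; the real dhcpServers attribute values and AD objects carry more detail than this.

```python
def win2000_view(dhcp_servers_attr, netservices_entries):
    """Windows 2000 UI: lists only what is in the dhcpServers attribute."""
    return sorted(set(dhcp_servers_attr))


def ws2003_view(dhcp_servers_attr, netservices_entries):
    """WS 2003 UI: lists the dhcpServers attribute plus NetServices entries."""
    return sorted(set(dhcp_servers_attr) | set(netservices_entries))


# Placeholder names standing in for the three lab DHCP servers.
# All three registered under NetServices; only the Windows 2000 server
# also added itself to the dhcpServers attribute on DhcpRoot.
netservices_entries = ["w2k-dhcp", "ws2003-dhcp", "ws2003r2-dhcp"]
dhcp_servers_attr = ["w2k-dhcp"]

old_ui = win2000_view(dhcp_servers_attr, netservices_entries)
new_ui = ws2003_view(dhcp_servers_attr, netservices_entries)

print("Windows 2000 UI shows:", old_ui)
print("WS 2003 UI shows:     ", new_ui)
print("Only visible in WS 2003 UI:", sorted(set(new_ui) - set(old_ui)))
```

Same forest, same data, two different answers, depending on which console you happen to open.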
So, what's the bottom line here?
As always, test your changes when you can in a lab that is representative of your production environment.
This is where I plug (again) the idea of testing your DR plans to establish an isolated lab from production backups. This way, you're validating your DR plans and producing a 'like-production' lab for testing. Schema extensions? Got 'em. Specific service accounts, OU structures, GPOs, etc? Got 'em. Throw in restores of your DHCP servers and you've got a lab to test your DHCP migration.
If you'd like some help getting this done or you don't know where to start, hit up your Technical Account Manager (TAM) for a PFE-led onsite delivery called AD Recovery Execution Service (ADRES).
Many large enterprises have been around for a long time and have infrastructure that began life as Windows 2000, NT 4.0, or even earlier.
In this case, the Windows 2000 Operating System is loooong out of support from Microsoft. I only used that OS so I would have a "historically accurate" representation of a typical infrastructure that has been around for a long time and to illustrate the differences across OS versions of DHCP. If I'd built my lab DHCP servers from the newer OSes, I wouldn't have been able to reproduce these idiosyncrasies.
Unexpected things can happen when you stomp on the tail of a dog that's been sleeping for 10 years.