SOLVED

Windows Server 2016 stops DNS/DHCP to Windows 10 PCs randomly

Copper Contributor

Hope someone here can help - I've been struggling with this one for some time.

 

I have a client that brought me in to install new Windows 10 machines for them. When we did this, we discovered we had a ton of issues, mostly revolving around mapped drives disconnecting.

 

Troubleshooting further today, I've discovered that the server will stop responding to DNS AND DHCP queries from a random machine. I'm pretty certain it's not an issue with the computer itself, as all 3 computers will at different times have the same issue.

So at this very moment...

 

PC1 will not get DNS responses from the 2016 AD server. (ping google.com = could not find host google.com).

PC1 - ipconfig /renew = unable to contact your DHCP server.

PC1 - I *CAN* remote in using AnyDesk, therefore it does have internet connectivity. Currently remoted into this machine. Obviously I can't browse the web though.

 

PC2 ping google.com = works

PC2 ipconfig /renew = works.

 

Firewall is OFF for both PCs.

Currently PC1 is the problem, but it could be PC2 tomorrow.

 

IPConfig for both machines shows:

IP (192.168.1.14 PC1, 192.168.1.15 PC2)

DNS: 192.168.1.254 (AD 2016 Server) (no other DNS servers)

DHCP Server: 192.168.1.254

Gateway: 192.168.1.1

 

Server is Static IP

192.168.1.254

DNS: 192.168.1.254 (single entry)
Firewall Off on Server

 

PC1 WAS working a few hours ago, as was PC2. It seems like the server just decides to stop responding to requests from a machine at random. Rebooting PC1 resolves the issue (but again, this can happen on PC2).

Another note: since Windows 10 seems to be involved in the breakdown, we have Win7 running in VirtualBox on each of the computers. Running in a VirtualBox Win7 guest (Win10 host), things work pretty reliably.

Any ideas or help would be very appreciated.

 

11 Replies

@PcComputerGuy 

 

Sounds like this could be a whole bunch of things. Things I'd look at to try and narrow down where along the chain the issue is (a few example commands are sketched after the list):

 

1. When PC1 is unable to ping google.com, can it ping the DC by IP and by name as well at the same time?

2. When PC1 is unable to ping google.com, can it ping an external IP (i.e. 8.8.8.8) during that time? (I suspect yes based on what you've already written, but good to check)

3. During the DNS/DHCP outage, can you do a NSLOOKUP to google.com if you use public DNS (i.e. 8.8.8.8) and does it resolve?

4. Is there anything in the event logs of either the server or the client PCs? I'd be keen to see if the DNS and DHCP services suddenly stop.

5. Is it possible to use another DNS server?

6. What DNS does the DC forward onto, or is it using root hints? 

7. When the PC cannot resolve, can the server ping domain/IP of the client in question? 
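For reference, rough command-line versions of checks 1-3, run on the affected PC while the problem is happening (the DC name below is only a placeholder; substitute your own):

  • ping dc01.yourdomain.local        (DC by name - placeholder, use the real name)
  • ping 192.168.1.254                (DC by IP)
  • ping 8.8.8.8                      (external IP, no DNS involved)
  • nslookup google.com 192.168.1.254 (resolution against the DC)
  • nslookup google.com 8.8.8.8       (same query against public DNS, bypassing the DC)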

 

Certainly a strange one. Can you elaborate on the issues you had with drive mappings dropping and what you did to fix them? 

I'd agree it could be many things. First clue is the firewalls being turned off. This shouldn't be necessary, assuming NLA detects the network correctly and all machines get the Domain firewall profile (a quick check for that is noted after the command list below). So if that's not happening then other things are likely broken as well. If you wanted more assistance then please run:

  • Dcdiag /v /c /d /e /s:%computername% >c:\dcdiag.log
  • repadmin /showrepl >C:\repl.txt
  • ipconfig /all > C:\dc1.txt
  • ipconfig /all > C:\dc2.txt
  • (etc., if other DCs exist)
  • ipconfig /all > C:\problemworkstation.txt

    then put the unzipped text files up on OneDrive and share a link.
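On the firewall point above: one quick way to confirm which firewall profile a machine has actually picked up (it should be the Domain profile on domain-joined machines when NLA is working) is to run this from a command prompt on that machine:

  • netsh advfirewall show currentprofile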

 

 

@HidMov Thanks for taking the time to try and help.

 

1. When PC1 is unable to ping google.com, can it ping the DC by IP and by name as well at the same time?

A: It can ping the DC by IP only. Not by name.

 

2. When PC1 is unable to ping google.com, can it ping an external IP (i.e. 8.8.8.8) during that time? (I suspect yes based on what you've already written, but good to check)

A: I will double check, but I believe I did ping 8.8.8.8 successfully.

 

3. During the DNS/DHCP outage, can you do a NSLOOKUP to google.com if you use public DNS (i.e. 8.8.8.8) and does it resolve?

A: I am pretty sure yes - but will get the exact answer next time it happens.

 

4. Is there anything in the event logs of either the server or the client PCs? I'd be keen to see if the DNS and DHCP services suddenly stop.

 

Here is the Windows System event log on PC1 showing approximately when things fell apart (nobody was in the office). Before 5:37 PM things were working; then DNS stopped, and all the other errors stem from DNS resolution.

[Screenshot: PC1 System event log around the time of the failure]

The log under Applications and Services > Microsoft > Windows > DNS Client Events is empty (it shows as disabled). I have enabled it now.
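(If anyone else needs to do this, the same log can also be enabled from an elevated command prompt. I believe the underlying log name is Microsoft-Windows-DNS-Client-Events/Operational, but wevtutil will list the exact name if yours differs:

  • wevtutil el | findstr /i dns-client                                    (find the exact log name)
  • wevtutil sl Microsoft-Windows-DNS-Client-Events/Operational /e:true    (enable the operational log)
)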

 

Event Viewer on the DC (DNS-Server log) shows no errors or issues, just informational entries. No entries after 3:10 PM.

 

5. Is it possible to use another DNS server?

A: It's possible, but then we would end up with internal resources not resolving (while internet would work). I set it to use ONLY the DC to try and narrow down the issue.

 

6. What DNS does the DC forward onto, or is it using root hints?

[Screenshots from the DC's DNS console, posted in answer to question 6]

Does that answer your question? (Forgive me, I rarely deal with Active Directory or Windows servers)

 

7. When the PC cannot resolve, can the server ping domain/IP of the client in question?

A: I'll get that info. I don't know. I think I pinged from .15 (PC2) to .14 (PC1) - but will verify when it happens again.


An incomplete answer - but I wanted to respond with the info I have/know and will get the other info as soon as it drops again. Thank you again for taking the time to respond; I really appreciate your help.

Also - No AV Software other than Windows Defender installed on any machines.

@HidMov More complete info:

 

1. When PC1 is unable to ping google.com, can it ping the DC by IP and by name as well at the same time?

A: Cannot ping the DC by its domain name. Can ping by NetBIOS name or by IP.

 

2. When PC1 is unable to ping google.com, can it ping an external IP (i.e. 8.8.8.8) during that time? (I suspect yes based on what you've already written, but good to check)

A: Yes - verified.

 

3. During the DNS/DHCP outage, can you do a NSLOOKUP to google.com if you use public DNS (i.e. 8.8.8.8) and does it resolve?

A: Yes it does (using 192.168.1.1 - the router)

 

5. Is it possible to use another DNS server?

A: After adding 192.168.1.1, I can ping public addresses. I cannot ping the DC via its domain name, but I can ping it via its NetBIOS name. (A netsh sketch for adding that second DNS entry is below the last answer.)

7. When the PC cannot resolve, can the server ping domain/IP of the client in question?

A: Yes, the server can ping the domain name/IP of the client.
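Regarding #5 above: for anyone following along, the second DNS entry can be added either through the adapter's TCP/IPv4 settings or from an elevated prompt with something like the following ("Ethernet" is just a placeholder for whatever the NIC is actually called):

  • netsh interface ipv4 add dnsservers name="Ethernet" address=192.168.1.1 index=2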

@Dave Patrick Thank you for taking the time to try and help. I've seen your name around the forums as I've been researching, and I appreciate the time you put in to help others.

 

Here are the requested (and some additional) reports/logs/items.

https://1drv.ms/u/s!AratUArbLL6_8TenF8jW3_ovco80?e=PE1c8a

 

@Dave Patrick Also - there is only 1 DC/server here. I don't know what happened before they called me in to replace 3 computers in January, but perhaps (inferring from the logs) there was another server that no longer exists.

Thanks again for any input/suggestions you may have.

@PcComputerGuy thanks for the answers and the details - will try to take a look at this in the morning when I get a bit of time

Suzi is multi-homed; plus it appears there may be an IPv6 DHCP server that, if not configured correctly, is going to be problematic.

 

The domain controller reports kcc_ds_connect_failures failed with error 8453
https://support.microsoft.com/en-us/help/2022387/active-directory-replication-error-8453-replication...

 

I'd check the system event log for error details "Replication access was denied"

There were some other test failures that may be the result of not running with elevated credentials, but check the system event log and work to clean up all errors.

If problems persist after fixes then put up a new set of files to look at.

 

 

 

I didn't see remnants of failed or removed domain controllers, but you can easily check for and remove them by following along here.

https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/deploy/ad-ds-metadata-cleanup

 

 

 

@PcComputerGuy 

 

Hello!

If you don't have many reservations defined in DHCP, I recommend deleting the DHCP scope and recreating it, especially if it came from a migration from another DHCP server.

Regards,

 

best response confirmed by PcComputerGuy (Copper Contributor)
Solution

I finally ended up solving the problem.

TL;DR - The Comcast Modem (IP: 192.168.1.1 - working perfectly as the gateway, full speeds, no packet loss) ALSO had a completely useless mysterious SECOND IP address at 192.168.1.254. The odd second IP (.254) also has an odd MAC address bound to it: 00:05:04:03:02:01. 

I did so by giving up... and then preparing to wipe the system and start over from scratch. I appreciate the tips and recommendations; unfortunately none of them led me to the solution, or they were approaches I didn't feel were likely to solve the problem and that would have been a huge investment of time - either eaten by me as a loss, or billed as an astronomical invoice to a four-person business.

 

In doing this careful prep, I had to make sure I had a functional backup of the machine. As the client had a hard time identifying even the software they used, let alone the supporting components, I knew I would be blamed for things being deleted - or I'd spend the next 6 months hearing "but the files were there..." So the plan was to convert the server into a virtual machine. I could then rebuild the server from scratch, while still being able to reference (or even use until the rebuild was complete) the cloned, identical system running on the virtual machine. That way, when I was told "These files are missing, they were right here!" on the old server, I could power it up and say "nope, never were there." Further, if I failed to export some critical component, I could go back and still export/copy the item. This was pretty much the only way I could nuke/reload a system with so many unknown aspects; and again, it would have allowed the process to happen with zero downtime for the client. It's quite a bit of extra work, but it would have solved so many problems.
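(Side note for anyone planning the same kind of safety net: a free tool like Sysinternals Disk2vhd can do this sort of physical-to-virtual capture of a running server. I believe the basic usage is roughly the following, where the output path is whatever destination drive you have attached - verify against the tool's own help before relying on it:

  • disk2vhd.exe * E:\server-backup.vhdx    (capture all volumes to a single VHDX)
)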

 

Once it was running on the virtual machine, I did a few tests to ensure the virtual machine was running properly. I didn't want to nuke the actual server until I was confident we were all set. This included testing the networking components. I had a persistent ping running from one of the client machines to the server. I happened to be shutting down or rebooting the server, and out of the corner of my eye - more by accident than anything - I noticed the "server" continued to respond to every ping, even though it should have dropped a few on reboot/shutdown.

 

Confused, I proceeded to shut the server down entirely....... Pings still came back from the server's IP address.

 

I said out-loud (at 4AM in the empty office) "What the ....." several times. 

 

I disconnected the network physically from the VM. Pings still responded. Now... I've checked every freaking item in this office multiple times for their IPs. I KNEW nothing in the office was on that IP address..... or so I thought.

 

After disconnecting every system, every printer, every wireless access point, I was still getting a response. At this point it was just the modem, the switch and the client PC. It wasn't the switch; I had its IP and could log into it. But just to be safe, I bypassed the switch. Modem to PC directly. Still responding. Obviously, shutting down the modem stopped the responses (I mean, how could it not at this point, everything else was disconnected).

 

And sooooooo we have it. The Mother ..... Comcast modem has a ghost IP address. The modem / gateway was indeed 192.168.1.1, as expected. It ALSO had an IP of 192.168.1.254. This is a standard ARRIS modem found in residential locations all the time. I could identify no purpose for this mysterious second IP address. It wasn't serving Comcast's "free xfinity wifi" that they like to run on their modems - that functionality was not active on the modem, wifi was disabled, etc. The MAC address being something odd, 00:05:04:03:02:01, was also a clue something was amiss. Googling around, using the MAC address to form a search that would turn up something useful, led me to other people who had encountered the same mysterious secondary IP address on their modem. They had contacted Comcast, to which they of course were told "we don't know". Someone thought it had to do with some Cisco firmware and WPS.
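(For anyone chasing a similar ghost: a quicker way to see which MAC address is actually answering for an IP - without unplugging everything - is to ping it from another machine on the same subnet and then look at that machine's ARP cache; the MAC listed should belong to the device you expect:

  • ping 192.168.1.254
  • arp -a | findstr 192.168.1.254
)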

 

I couldn't do anything about the STUPID mysterious second IP address on the modem, but I could move the server to a free, different IP address. After that....... everything immediately worked, and continues to work perfectly.
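(For the record, that kind of move is roughly the usual static-IP change, something like the following from an elevated prompt on the server. The adapter name and new address here are only placeholders, and you would also want to update the DHCP scope's DNS option, the clients' DNS settings, and re-register the server's DNS records:

  • netsh interface ipv4 set address name="Ethernet" static 192.168.1.250 255.255.255.0 192.168.1.1
  • netsh interface ipv4 set dnsservers name="Ethernet" source=static address=192.168.1.250 register=primary
  • ipconfig /registerdns    (re-register the server's A records at the new address)
)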

 

Previous techs/companies couldn't figure it out, and I had spent literally months trying to solve this completely unexplainable issue that produced so many confusing, contradictory results in diagnostics. To this day, I still have no clue what this second mysterious IP address is about. In my over 25 years of working with tech, including years working FOR the cable company in the modem department when they had just begun rolling out, I have never seen a modem with a second mysterious private IP address on the same subnet, both belonging to the modem. But alas, as we all know, having something else share/fight over the same IP address as the server is never going to go well.


Thanks again to those who offered suggestions. I can say that I am glad I didn't spend a significant amount of time tracking down / resolving every server error, as that didn't logically explain some of the working/not-working behaviour. With this mysterious IP we can at least theorize that going through a switch added a touch of latency, so that sometimes a computer got a faster response from the real server, allowing the proper route to be created - or other things of that nature in networking.

I wanted to post this in detail to hopefully help someone else one day with similar issues. I took the time to include the bit about turning it into a virtual machine, and ensuring all was working properly before nuking the server and reinstalling from scratch - as it may provide an idea or route for others needing to wipe/reload a server they are working on for any reason; to provide more up-time, less disruption and more confidence that all the data exists and is in place. Nothing is worse than thinking you have something backed up only to later find out you didn't, or it was corrupt, or you didn't export a file you needed before wiping the machine.

It was also an epic IT tale; at least for me. I've never had an issue I *couldn't* solve to this point. Sure, there's been many where it would be too costly (in time, or hardware) and it was decided to not solve said problem -- but not having even a clue as to what's going on? That was bothersome. 

Cheers and thanks again to everyone for their shots at the issue. I truly appreciate your volunteerism. 
