Autopilot, Co-Management and ESP Timeouts (and BITS too)

I'm working with a customer who has been having a great many issues with their Autopilot implementation.

 

Basically, the ESP page will time out with one of these two error codes:

0x800705b4

0x00000004
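
For reference, the first code can be read as a standard HRESULT. The sketch below splits a value into its bit fields; `decode_hresult` is a hypothetical helper name, not part of Intune or ConfigMgr. Facility 7 is FACILITY_WIN32 and code 1460 (0x5B4) is the Win32 ERROR_TIMEOUT, which fits the timeout behaviour described in this post. The second value is decoded the same way, although whether the ESP intends it as an HRESULT at all is an assumption.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its standard bit fields."""
    return {
        "failure": bool((value >> 31) & 1),  # severity bit (bit 31)
        "facility": (value >> 16) & 0x7FF,   # facility, bits 16-26
        "code": value & 0xFFFF,              # low 16 bits
    }


for raw in (0x800705B4, 0x00000004):
    print(f"0x{raw:08x}", decode_hresult(raw))
```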

 

This shows as an error to the end user, which is unacceptable.

 

The interesting thing was that if you left the device for an hour and then hit Continue (that option is set in the ESP properties), everything would have installed correctly and the device would be compliant.

 

So why would the ESP page fail with a timeout well before the 6-hour limit I had set?

The 0x800705b4 Error

 

The first thing I discovered is that all this is related to co-management and the CM client install. 

Looking into the first timeout, 0x800705b4, I noticed that two values under the following registry key were strange:

HKLM\Software\Microsoft\Windows\Autopilot\EnrollmentStatusTracking\Device\DevicePreparation\PolicyProviders\ConfigMgr

 



 

 

  • InstallationState would be 1 (Installing) or 4 (Failed). It is 1 while the client is still installing, and it switches to 4 if the install or the registration fails.
  • InstallDuration would be 1800 (not 1801, nor 598, nor 1799, ALWAYS 1800).
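
A minimal sketch of how these two values could be checked on a device, assuming they are stored as numeric values under the key shown above. The helper names `summarize` and `read_configmgr_provider` are hypothetical, and only the two state meanings given in this post (1 and 4) are named; any other value is reported as unknown rather than guessed.

```python
REG_PATH = (
    r"Software\Microsoft\Windows\Autopilot\EnrollmentStatusTracking"
    r"\Device\DevicePreparation\PolicyProviders\ConfigMgr"
)

# Only the meanings described in the post; other values are left unnamed.
STATE_NAMES = {1: "Installing", 4: "Failed"}


def summarize(state: int, duration: int) -> str:
    """Turn the two registry values into one readable line."""
    name = STATE_NAMES.get(state, f"unknown ({state})")
    note = " (matches the 1800-second cutoff)" if duration == 1800 else ""
    return f"InstallationState={state} [{name}], InstallDuration={duration}s{note}"


def read_configmgr_provider() -> str:
    """Windows only: read both values from HKLM and summarize them."""
    import winreg  # standard library, only available on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, REG_PATH) as key:
        state, _ = winreg.QueryValueEx(key, "InstallationState")
        duration, _ = winreg.QueryValueEx(key, "InstallDuration")
    return summarize(int(state), int(duration))
```

On an affected device, `print(read_configmgr_provider())` from an elevated prompt would show whether the duration is sitting at exactly 1800.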

     

    If I looked at the CCMSetup.log file it would show that the installation was successful, and the return code would be 0, as expected. 

     

    But on looking at the start and end times in the logs, I noticed the install was taking over 40 minutes to complete. 

     

    In this case it was 27 minutes, so why did the ESP still fail? 

    This is because the client, although installed, has not registered.

    I'll go into this more in the 0x00000004 error below.

     

    But in this case the combined time of 27 minutes to install and 10 minutes to register (2220 seconds) took the total over 1800 seconds, so it failed with the 0x800705b4 error.

     

    If the combined time is less than 1800 seconds (say 500 for the install and 600 for the registration, which comes to 1100), but the registration fails 6 times over 10 minutes, then we get the 0x00000004 error.
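
The two cases above can be sketched as a small model. This only encodes the behaviour observed in this post (an 1800-second ceiling and roughly 600 seconds of registration retries); it is not documented ESP logic, and `predict_outcome` is a hypothetical name.

```python
INSTALL_WINDOW = 1800      # seconds, the InstallDuration ceiling seen in the registry
REGISTRATION_RETRY = 600   # seconds of retries before registration gives up


def predict_outcome(install_s, registration_s=None):
    """Model which ESP result the post's observations would predict.

    registration_s is the time registration needs to succeed,
    or None if it never succeeds within the retry period.
    """
    if install_s > INSTALL_WINDOW:
        return "0x800705b4"
    if registration_s is None:
        waited = REGISTRATION_RETRY
    else:
        waited = min(registration_s, REGISTRATION_RETRY)
    if install_s + waited > INSTALL_WINDOW:
        return "0x800705b4"   # combined time runs past the 1800-second ceiling
    if registration_s is None or registration_s > REGISTRATION_RETRY:
        return "0x00000004"   # retries exhausted inside the window
    return "success"


print(predict_outcome(27 * 60))   # 1620 + 600 = 2220 seconds
print(predict_outcome(500))       # 500 + 600 = 1100 seconds
print(predict_outcome(500, 30))   # registers on an early attempt
```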

     

    On further investigation of this log, I found that I would occasionally get a "BG Error Context is 5" notification, and that the download of the client would be divided into multiple 5-minute segments, with each segment downloading only around 28MB. That meant the net time to download the binaries would be 40 minutes or more. Given that 1800 seconds = 30 minutes, I realized that the ESP had a second timeout, totally unrelated to the one set in the properties within Intune. 
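
A quick worked calculation with the figures above shows why the throttled download cannot fit in the window. It assumes the segments run back to back with no gaps and that each moves exactly 28 MB, which is a simplification of what the log showed.

```python
SEGMENT_SECONDS = 5 * 60   # each download segment lasted about five minutes
SEGMENT_MB = 28            # and moved about 28 MB
WINDOW_SECONDS = 1800      # the 30-minute cutoff

# Effective throughput while throttled, in megabits per second
rate_mbit = round(SEGMENT_MB * 8 / SEGMENT_SECONDS, 2)

# How much data fits inside the 30-minute window at that rate
segments_in_window = WINDOW_SECONDS // SEGMENT_SECONDS
mb_in_window = segments_in_window * SEGMENT_MB

# How much data a 40-minute download corresponds to
mb_in_40_minutes = (40 * 60 // SEGMENT_SECONDS) * SEGMENT_MB

print(f"throttled rate: about {rate_mbit} Mbit/s")
print(f"fits in 30 minutes: {mb_in_window} MB ({segments_in_window} segments)")
print(f"moved in 40 minutes: {mb_in_40_minutes} MB")
```

At roughly 0.75 Mbit/s, only about 168 MB arrives within 1800 seconds, while a 40-minute download implies around 224 MB of content, so the install runs past the cutoff before the binaries are even on disk.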

    Basically, if the CM client did not install within the 30 minutes, the ESP failed, regardless of whether the install was finished or not. 

    So, linking this with the "BG Error Context is 5" notification, I realized that this was a BITS issue. I have no idea why BITS would be throttled during an Autopilot install process, but it was.

     

    The solution was to add the following command-line option to the Co-Management properties:

    /BITSPriority:FOREGROUND

    This effectively removes the limitations on the BITS download, and the entire installation time dropped to between 3 and 10 minutes, depending on Internet bandwidth. 
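
If the command line is maintained by script, a small helper can make sure the option is present exactly once. This is only a sketch: `with_bits_priority` is a hypothetical name, the `SMSSITECODE=ABC` value in the usage lines is a placeholder, and putting the switch first is an arbitrary choice. The accepted values listed are the ones documented for the ccmsetup `/BITSPriority` parameter.

```python
import re

VALID_PRIORITIES = ("FOREGROUND", "HIGH", "NORMAL", "LOW")


def with_bits_priority(command_line: str, priority: str = "FOREGROUND") -> str:
    """Return command_line with exactly one /BITSPriority switch set."""
    priority = priority.upper()
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"unsupported BITS priority: {priority}")
    # Drop any existing /BITSPriority switch, whatever its case or value
    rest = re.sub(r"\s*/BITSPriority:\S+", "", command_line, flags=re.IGNORECASE)
    return f"/BITSPriority:{priority} {rest.strip()}".strip()


print(with_bits_priority("SMSSITECODE=ABC"))
print(with_bits_priority("/bitspriority:low SMSSITECODE=ABC"))
```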

     

    This was a great win, and with it, the 0x800705b4 disappeared. 

     

    Only to be replaced by the 0x00000004 error (I might have the number of 0s wrong here, I'm working from memory).

     

    The 0x00000004 Error

     

    As with the previous error, this was not constant: it would happen, then it wouldn't, then it would again. But it happened often enough to be a problem.

     

    I looked at the registry again and noticed this:

     

    • InstallationState would be 4 (Failed)
    • InstallDuration would be LESS THAN 1800

       

      I kept digging. The answer was found in ClientIDManagerStartup.log.

       

      Once the CM Client is installed, the next step is to register it with an MP and get it into the MECM database. As shown in the image below, if registration fails, it sleeps 60 seconds and tries again, then another 60, then 120, then another 120, then 240, and finally a last 240 (10 minutes in total), at which point it fails. If you look at the last line in the diagram, it sets the ConfigMgr install state to 4, which fails the ESP, this time with the 0x00000004 error.
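
The retry schedule can be totalled with a few lines, taking the sleep values exactly as listed above. One arithmetic observation, not a confirmed behaviour: the six listed sleeps add up to 840 seconds (14 minutes), while the running total reaches 600 seconds (10 minutes) after the fifth. The 10-minute figure would therefore fit if the final 240-second sleep is not waited out in full.

```python
from itertools import accumulate

# Sleep intervals between registration attempts, as listed in the post
sleeps = [60, 60, 120, 120, 240, 240]

# Running total of seconds spent sleeping after each failed attempt
cumulative = list(accumulate(sleeps))

for attempt, (sleep, total) in enumerate(zip(sleeps, cumulative), start=1):
    print(f"after attempt {attempt}: sleep {sleep:>3}s, waited {total:>3}s so far")

print(f"first five sleeps: {sum(sleeps[:5])}s, all six: {sum(sleeps)}s")
```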

       

       

      As with the previous error, this doesn't mean anything actually broke. It just means it didn't occur within the allowed timeframe (this time 10 minutes).

       

      The difference between the two errors is simply that in the first one the combined time to install the CM Client and the 10 minutes of sleeps in ClientIDManagerStartup.log exceeds 1800 seconds, whereas in the second it does not.

      The important point is that both log the 10 minutes of sleeping and both fail the ESP.

       

      So, if you allow 10 minutes to install the CM Client and another 10 attempting to register, this failure can occur after only 20 minutes, way less than the 360 minutes (6 hours) set in the ESP.

       

      So why is the client not registering immediately? That is the million-dollar question, and it is as far as I have got with this issue. 

       

      I looked at the MPs to see if there was a bottleneck in the outboxes delaying the acknowledgment that the client is registered, but couldn't see anything out of the ordinary. 

       

      Another thing to note is that if you have a Provision TS running, it will kick off immediately after the ClientIDManager fails the ESP. So any apps delivered via this task sequence will install, and the device will complete successfully (other than the ESP error).

       

      Final Thoughts

      What is interesting (or frustrating, depending on your viewpoint) is that Autopilot and the ESP don't really do much themselves, apart from a few things like setting the user install type, making ConfigMgr a first app to ensure it takes the MDM authority, and preventing the ESP from ending until certain apps are installed. The rest is controlled by Intune or ConfigMgr natively. If you don't assign blocked apps to users or groups, they don't get installed, and that's it. The same goes for policies, profiles and so on. By the same token, if the ESP (and by extension Autopilot) fails, it doesn't mean everything stops installing. It continues installing everything and, provided there are no actual issues, finishes successfully, which makes it very hard to troubleshoot. The first thing I do when I start troubleshooting is look to see what didn't get installed or applied. In this case, everything gets installed.

      The only error is the ERROR ITSELF. In a perfect world, if I have set the timeout for 6 hours and the CM client takes 5 hours to install, then there should be no error. Same with registering the client. If it takes 2 hours of 'sleeping' to get it registered, who cares? I have 6 hours to play with.

      Obviously we don't want things to take that long, but we all know that MECM is not the fastest-moving software in the world. Before it was Intune Configuration Manager, Microsoft Endpoint Configuration Manager (MECM) and so on, it was known as SMS, which was affectionately renamed Slow Moving Software.

       

      Paul_Isaac_0-1692882147641.png

    Paul_Isaac_6-1692884305894.png

    Paul_Isaac_5-1692884037344.png

    Paul_Isaac_4-1692884009810.png

Paul_Isaac_1-1692883670037.png

3 Replies

@Paul_Isaac the issue is related to client registration. Once the agent is installed, the device management authority is expected to switch to ConfigMgr and the co-management workloads will apply. Since the registration is failing in your case, the ESP is unable to progress. Has this worked before? Is client registration working through CMG outside the AP process? Also, which enrollment method are you using? Pre-provisioning has limitations when it comes to installing the ConfigMgr agent through the built-in Intune feature. 

The thing is, the registration does not fail; it just doesn't complete in 10 minutes, and so the ESP fails.
This is not my site, it is a customer's, and they claim they have sleeps of up to 15 minutes during OSD, but since there is no ESP in that case, no ESP fails.
Thank you for detailing what I've seen in my environment. The fact that there is no timeout/retry adjustment is a flaw. On a low-latency, fast internet connection the problem is diminished.
However, on a slow internet connection this is going to be a problem. I've found the weakest link in the work-from-home technology chain is the homeowner's lack of understanding of how important it is to keep their router/modem (with family safety turned on, etc.) up to date.