Becoming an Xperf Xpert Part 7: Slow Profile Load and Our Very First Stack Walk
Published Sep 20 2018 01:09 AM 1,510 Views

First published on TechNet on Feb 02, 2014

Hi my name is Randolph Reyes (Randy) with my first blog contribution! I am Premier Field Engineer specializing in Active Directory and Windows Desktop. Today I will like to share a recent trace taken from a Windows 7 system.

In this particular engagement WD RAP as a Service+ we detected almost all the Windows 7 system (7,000 Computers) experiencing boot time not with in good values using traditional hard-drive (non-SSD). Here is an article by Yong Rhee that explain RAP as a Service for those that are interested in getting one scheduled for yourself.

Boot time good value traditional hard-drive (non-SSD)

PreSMSS

SMSSInit

WinlogonInit

ExplorerInit

Post Boot

10 Seconds

10 Seconds

30 to 45 Seconds

10 Seconds

30 Seconds

Boot time good value SSD (Solid State Drive)

PreSMSS

SMSSInit

WinlogonInit

ExplorerInit

Post Boot

5 Seconds

5 Seconds

30 Seconds

5 Seconds

10 Seconds

The previous boot time values are my personal experience and not specific times by Microsoft .

 

The Before

PreSMSS

SMSSInit

WinlogonInit

ExplorerInit

Post Boot

3.480

10.230

67.380

33.642

69.000

Boot to Post Boot Activity ended: 183 Seconds and 734 Milliseconds = 3 Minutes and 3 Seconds

After adding the graph for Boot Phases and also Generic Event we can see that most of the Winlogon Initiation Phase time is been consume by the Subscriber Profiles under the Microsoft-Windows-Winlogon provider. In the picture below we can see when the Subscriber Profile started at 31.603 and ended at 79.518 of the trace.

First question you need to ask yourself. Why Profile is taking that long load?

Are we using Roaming Profiles or do we have a local profile in this Windows 7 System?

What’s going on with my Profiles?

 

To be able to understand what’s going on we need to turn to “Stack Walking” (Editors note this is not easy or for the faint of heart. If you don’t understand it first go around that's OK. Keep working on it. It requires knowledge of Windows Internals . Hopefully as we do more posts on this subject it will become easier. Stick with it. This is an excellent skill to have. –Mark Morowczynski)

What is Stack Walking?

Here is an article by Bruce Dawson that explain more in depth “Stack Walking in Xperf” http://blogs.msdn.com/b/pigscanfly/archive/2009/08/06/stack-walking-in-xperf.aspx

We need to make note of the Process “Winlogon (732)” and Thread (744) so we can do wait analysis against this thread to see what is taking almost 48 seconds.

Also make sure to click in the Trace tab to Load Symbols, if you don’t load the symbols you will see the call stack recorded, but the details of what the code was executing will be hidden. (Call stacks will show DLL names followed by question marks).

Taking the process and thread information, we’ll use it in the “CPU Usage Precise” graph (based on Utilization by Process, Thread).

I recommend the order NewProcess, NewThreadID, NewThreadStack, Wait (us), ReadingProcess, ReadyingThreadID. SwitchInTime(s). These can be added or removed from the Line # menu.

After adding the graph “CPU Usage Precise” graph (based on Utilization by Process, Thread) we expanded process Winlogon (732) and inside this process we follow the thread (744)

Remember we are looking for a delay in time of 48 seconds, we can see the CallNotificationSubscriber with a single call that took 47 seconds (Profiles)

 

Here we can see that NotificationSubscriber is waiting in Svchost.exe and WmiPrsVE.exe operations. This is interesting because within a total of 47 seconds, 19 seconds where used within the module svchost.exe ( Issue # 1 ) and 26 seconds within the module WmiPrvSE.exe ( Issue # 2 ) To continue we need to troubleshoot one issue at a time.

Alright let’s start troubleshooting issue # 1 Svchost.exe.

We can observe that the switching time in svchost.exe (592) was 79.518 seconds (exactly the time “Profiles” ended, so we are on the right track.

Using the same graph “CPU Usage Precise” graph (based on Utilization by Process, Thread) we expanded process svchost.exe (592) and inside this process we follow the thread (1732)

When “walking” the thread, please take into consideration new total time of wait ( 19.097 seconds ). We need to understand why we are waiting 19 seconds.

Looks like the profsvc.dll (Provides User Profile Services service = Data and Components) waited 19 seconds for another process Svchost.exe operation. Now, we need to follow the thread 1648, inside the svchost.exe (592).

 

Looking at the modules involved, and functions we waited 14 seconds for another process Svchost.exe doing some kind of network communication operation.

Now, we need to follow the thread 940, inside the svchost.exe (928)

When “walking” the thread, please take into consideration new total time of wait ( 14.512 seconds ). We need to understand why we are waiting 14 seconds.

 

Looks like we are still waiting on some kind of network communication operation ( rpcrt4.dll is the Remote Procedure Call (RPC) API, used by Windows applications for network and Internet communication ).

Now, we need to follow the thread 1064, inside the svchost.exe (848)

When “walking” the thread, please take into consideration total time of wait ( 14 seconds ). We need to understand why we are waiting 14 seconds.

At this point I know what you are thinking, this is just keeps going and going like the energizer bunny, doesn’t it?

Don’t worry we are going to get the bottom of this.

Now we are getting somewhere, looks like svchost.exe waited 14 seconds for services.exe ( Called Service Control Manager, it runs in the background and launches other applications that are needed to run during startup and shutdown. It's also in charge of programs installed on your computer that run automatically or as needed )

Now, we need to follow the thread 2180, inside the services.exe (676)

When “walking” the thread, please take into consideration total time of wait ( 14 seconds ). We need to understand why we are waiting 14 seconds.

 

Looks like we are waiting for services.exe to start. Now, we need to follow the thread 680, inside the services.exe (676)

When “walking” the thread, please take into consideration total time of wait ( 14 seconds ). We need to understand why we are waiting 14 seconds.

Finally looking at all the modules involved, and the functions, within the total wait time of ( 14 seconds ), we have a delay of 12 seconds loading a driver.

Alright at this point you are thinking wow 12 seconds for a driver to load definitely something is wrong here.

Take into consideration the new total time of wait ( 12 seconds ) and also SwitchInTime(s) 42.862.

Now we need to find which driver took 12 seconds to load and ended around 42.862 of the trace.

After adding the graph “Generic Events” (based on Activity by Provider, Task, Opcode) we expand the provider Microsoft-Windows-Kernel-PnP and inside we expand Task Name DriverLoad.

I recommend the order ProviderName, TaskName, Golden Bar, OpcodeName, ID and Process. These can be added or removed from the Line # menu.

 

In the image above we can see mfeavfk.sys started at 30.803 and ended at 42.862 of the trace. If we subtract those numbers we get our 12 seconds delay. We did it!

Doing a little more research we confirmed that the driver mfeavfk.sys belongs to the company SYSCORE /McAfee.

First step to take is to check which file version of mfeavfk.sys we have installed in the Windows 7 System.

After adding the graphic “Images” graphic (based on Transient Lifetime by Process, Image) we expand process idle (0) and inside this process we look for the file version mfeavfk.sys

I recommend the order Process, ImageName, TimeDateStamp, File Version, and Product Version. These can be added or removed from the Line # menu.

The current file version of mfeavfk.sys 14.4.0.457 is not the latest. After some research the latest version of this file is 15.0.0.539

Alright let’s start troubleshooting issue # 2 WmiPrvSE.exe.

First step to take is to check which file version of Windows Management Instrumentation components we have installed in the Windows 7 System.

After adding the graphic “Images” graph (based on Transient Lifetime by Process, Image) we expand process WmiPrvSE.exe (3692) and inside this process we look for the file version WMIPerfClass.dll

I recommend the order Process, ImageName, TimeDateStamp, File Version, and Product Version. These can be added or removed from the Line # menu.

The current file version of WMIPerfClass.dll is not the latest. Updated version of this file 6.1.7601.22137 can be located here http://support.microsoft.com/kb/2775511/en-us

The above Hotfix Rollup will improve the Windows Management Instrumentation (WMI) components to reduce the CPU usage and to improve the repository verification performance.

 

The Fixes

Step 1 = McAfee Antivirus Upgrade and Update to latest version.

Step 2 = Installation update http://support.microsoft.com/kb/2775511

 

The After

Boot to Post Boot Activity end: 96 Seconds and 978 Milliseconds = 1 Minutes and 36 Seconds

(Note: In this particular engagement we still have other areas that will improve boot time in this particular systems but those will be posted in another blog posts. Check back with us later.)

This engagement was a huge success ! Not only did we find the root causes of the long Profile delay, but immediately an action plan was created to make same changes in all the Windows 7 at the Enterprise level.

After applying this changes we got 50% improvement in boot time saving the company time and money.

Before

7,000 Computers x 3 minutes = 21,000 minutes

After

7,000 Computer x 1 minute and 30 seconds = 10,500 minutes

These are great hard numbers you can take back to management to show what type of improvements you are making.

Other Recommended Articles

There are so many great blogs out there that is impossible to add them all here. Here are some of those blogs that I learned a lot from.

WPT: Installing the Windows Performance Toolkit v5.0 (WPRUI, WPR, Xperf)
http://blogs.technet.com/b/yongrhee/archive/2012/11/23/installing-the-windows-performance-toolkit-v...

WPT: WPR/Xperf: Capture high cpu, disk i/o, file, registry, networking, Private bytes, Virtual bytes, Paged Pool/Nonpaged pool and/or application slowness.
http://blogs.technet.com/b/yongrhee/archive/2012/11/23/wpr-xperf-capture-high-cpu-disk-i-o-file-reg...

Slow Boot Slow Logon (SBSL), A Tool Called XPerf and Links You Need To Read
http://blogs.technet.com/b/askpfeplat/archive/2012/06/09/slow-boot-slow-logon-sbsl-a-tool-called-xp...

 

Special thanks to Yong Rhee, Mark Morowczynski, Antonio Vasconcelos, Jeff Stokes and Charity Shelbourne for all your help.

 

Randy “Saving you time and money” Reyes

Version history
Last update:
‎Feb 19 2020 08:24 PM
Updated by: