First published on MSDN on Jan 12, 2018
Recently, we have observed a number of cases where DBAs or application developers are complaining about out-of-memory errors or even machine not responding (hangs) despite the fact that there is plenty of available memory on the system. Frequently this is on systems where SQL Server is running and the issue occurs after increasing the ' max server memory ' sp_configure option, again with plenty of memory available. For example on a system with 64-GB RAM, SQL Server's 'max server memory' is set to 54 GB and you start getting OS errors 1453 or 1450 for example or any other memory-related issue. In some cases the OS stops responding. The key symptom was that when the DBA reduced the SQL Server max server memory below 50 GB (below 80% of RAM on the machine), the problem stopped happening. So logically, you would think there is a problem with SQL Server.
In this one case, the machine where SQL Server was running was experiencing "hangs" - lack of response. One of our colleagues from the Windows team (kudos Syed Yusuf) analyzed a manually-generated kernel memory dump.
3: kd> !mex.mem
Page File: \??\D:\pagefile.sys
Current: 68157440 Kb
Free Space: 68000152 Kb ( 65.00 GB)
Minimum: 68157440 Kb
Maximum: 68157440 Kb ( 65.00 GB)
Physical Memory: 16596487 ( 66385948 Kb) ( 63.31 GB)
Available Pages: 2254327 ( 9017308 Kb) ( 8.60 GB)
ResAvail Pages: 1118 ( 4472 Kb) ( 4.37 MB)
- Available memory (available pages) looks good.
- ResAvail Pages are the Minimum Working Set that process reserves/set while starting. The value of ResAvail is 4.73 MB which is too low. Even though overall available memory is 8.6 GB, there’s some process which has set Min Working Set to a huge value. This makes available pages reserved to this process and cannot be allocated to anything else.
********** Running out of physical memory **********
Locked IO Pages: 0 ( 0 Kb) ( 0)
Free System PTEs: 4294986002 (17179944008 Kb) ( 16.00 TB)
******* 3 system PTE allocations have failed ******
Modified Pages: 18188 ( 72752 Kb) ( 71.05 MB)
Modified PF Pages: 18181 ( 72724 Kb) ( 71.02 MB)
Modified No Write Pages: 0 ( 0 Kb) ( 0)
NonPagedPool 0 Used: 41 ( 164 Kb) ( 164.00 KB)
NonPagedPoolNx 0 Used: 18074 ( 72296 Kb) ( 70.60 MB)
NonPagedPool 1 Used: 66 ( 264 Kb) ( 264.00 KB)
NonPagedPoolNx 1 Used: 17231 ( 68924 Kb) ( 67.31 MB)
NonPagedPool Usage: 409 ( 1636 Kb) ( 1.60 MB)
NonPagedPoolNx Usage: 68905 ( 275620 Kb) ( 269.16 MB)
NonPagedPool Max: 4294967296 (17179869184 Kb) ( 16.00 TB)
PagedPool 0 Usage: 58073 ( 232292 Kb) ( 226.85 MB)
PagedPool 1 Usage: 21277 ( 85108 Kb) ( 83.11 MB)
PagedPool 2 Usage: 21274 ( 85096 Kb) ( 83.10 MB)
PagedPool Usage: 100624 ( 402496 Kb) ( 393.06 MB)
PagedPool Maximum: 4160749568 (16642998272 Kb) ( 15.50 TB)
********** 5625 pool allocations have failed **********
Session Commit: 9360 ( 37440 Kb) ( 36.56 MB)
Shared Commit: 55767 ( 223068 Kb) ( 217.84 MB)
Special Pool: 0 ( 0 Kb) ( 0)
Shared Process: 24208 ( 96832 Kb) ( 94.56 MB)
Pages For MDLs: 656 ( 2624 Kb) ( 2.56 MB)
Pages For AWE: 12704397 ( 50817588 Kb) ( 48.46 GB)
NonPagedPool Commit: 70902 ( 283608 Kb) ( 276.96 MB)
PagedPool Commit: 100624 ( 402496 Kb) ( 393.06 MB)
Driver Commit: 11318 ( 45272 Kb) ( 44.21 MB)
Boot Commit: 196761 ( 787044 Kb) ( 768.60 MB)
System PageTables: 1593 ( 6372 Kb) ( 6.22 MB)
VAD/PageTable Bitmaps: 11287 ( 45148 Kb) ( 44.09 MB)
ProcessLockedFilePages: 0 ( 0 Kb) ( 0)
Pagefile Hash Pages: 86 ( 344 Kb) ( 344.00 KB)
Sum System Commit: 482562 ( 1930248 Kb) ( 1.84 GB)
Total Private: 14023786 ( 56095144 Kb) ( 53.50 GB)
Misc/Transient Commit: 24642 ( 98568 Kb) ( 96.26 MB)
Committed pages: 14530990 ( 58123960 Kb) ( 55.43 GB)
Commit limit: 33635847 ( 134543388 Kb) ( 128.31 GB)
System Region Base Address NumberOfBytes
NonPagedPool : ffffd18000000000 100000000000
PagedPool : ffffa40000000000 f8000000000
SystemCache : ffffe18000000000 100000000000
SystemPtes : ffff900000000000 100000000000
UltraZero : ffffba0000000000 100000000000
CFG : ffffca0000000000 60000000000
Hyperspace : 0 8000000000
SpecialPool : ffffa00000000000 10000000000
SessionSpace : fffffb8000000000 8000000000
PagedPoolWorkingSet : ffffa20000000000 8000000000
SystemCacheWorkingSet : ffff890000000000 8000000000
System Images : fffff50000000000 8000000000
Virtual Memory Physical Memory File Cache Cache Writes
3: kd> !minws
Process name msmdsrv.exe Min WS: 12.66 GB ---------------> MSMDSRV has set Min WS to be 12.66 GB
Total working set size for all processes: 4.13 GB
3: kd> !mex.p ffffd188d8c1b080
Name Address Ses PID User Name Create Time Up Time Mods Handle Act Thrd Z Thrd Parent
=========== ================ === ============ ===================== ========================== ============ ==== ====== ======== ====== =========================
msmdsrv.exe ffffd188d8c1b080 0 e90 (0n3728) DOMAIN\ACCOUNT$ 07/03/2017 08:48:16.035 PM 9h:49:55.964 82 1824 100 0 services.exe 3f8 (0n1016)
Command Line: "C:\Program Files\Microsoft SQL Server\MSAS13.POWERPIVOT\OLAP\bin\msmdsrv.exe" -s "C:\Program Files\Microsoft SQL Server\MSAS13.POWERPIVOT\OLAP\Config"
Memory Details:
VM Peak Work Set Commit Size PP Quota NPP Quota
========= ======= ======== =========== ========= =========
852.57 MB 1.84 GB 57.73 MB 117.86 MB 571.65 KB 64.37 KB
CPU Details:
User Kernel Total
===== ====== =====
328ms 298ms 626ms
The same problem , but not involving SQL Server or Analysis Services, was described in this blog .
The big surprise (gotcha) came from the fact that Performance Monitor counters showed plenty of available physical memory and working set sizes for all processes were quite reasonable. So don't expect to find this in Perfmon, it just isn't there.
Update (Nov 2018):
Another team of engineers analyzed the issue and proposed a much simpler, non-intrusive method to diagnose the Working Set memory for the Analysis Services process (thanks Venu Cherukupali).
- Download Process Explorer
- Start it under an Administrator account (Right-click on procexp64.exe, "Run as Administrator")
- From top-level menu choose View -> Select Columns... -> Process Memory tab
- Check Minimum Working Set and Maximum Working Set check-boxes
- Click OK.
- Locate the MSMDSRV.EXE process in the list
- Examine the Minimum Working Set values
Here is an example of the Minimum Working Set value of SSAS on a 64-GB RAM system.
Solution
The Analysis Services experts joined the t-shooting (thank you, Yinn Wong) and as it turns out that by default the SSAS tabular instances, sets minimum working set to 20% of the physical memory on the system. The system was using POWERPIVOT SSAS.
Did this: In most cases it turned out SSAS was not even needed and used on that system. Therefore, in those cases that service was stopped or better yet uninstalled.
Of course, the idea behind Tabular SSAS setting a minimum working set size off the bat is that it requires a lot of memory to begin with. Therefore, if SSAS is indeed installed on the same system where SQL Server is running, that is probably not the best resource-allocation decision. SSAS would be much better served on a dedicated system with lots of RAM.
For more information see VertiPaqMemoryLimit in this white paper ( https://msdn.microsoft.com/en-us/library/jj874401.aspx ) and doc about setting SSAS memory for tuning https://docs.microsoft.com/en-us/sql/analysis-services/server-properties/memory-properties
Namaste!
Joseph