Slow WHILE loop on some servers

Copper Contributor

A simple issue demonstration script below. 
On some servers (mostly development, physical and VMs with 2-4 cores) it is executed ~3 seconds.
On other servers (mostly production VMs with 16-20 cores) it is executed 18-22 seconds.
Mind-blowing ~5 times slower.

A PowerShell script doing the same work shows reasonable similar execution time on all those servers.
"Mostly", but there are exceptions killing any idea we came up with.
i.e. "Enterprise and Development editions are different" - have 2 fast Enterprise servers.
i.e. "Many cores is not good" - have a fast server with 4 cores and slow server with 4 cores.
i.e. "Your physical server may have different CPU" - have a fast and slow VMs on the same ESXi box.
i.e. "Your physical server is very busy" - have  a slow (not currently loaded) VM on dedicated ESXi box.

Any idea how to troubleshot this?

Most wild observation - the usually slow server is fast during first minute after reboot.
Like another delayed-start service creates the problem. But I cannot imagine the logic of such interference ...


declare @start datetime = getdate()
declare @sum bigint = 0
declare @i int = 0
while @i < 4000000
begin
set @sum += @i
set @i += 1
end
print datediff(millisecond, @start, getdate())

5 Replies

Hi @AnatoliD -- We will probably need a bit more information in order to assist.

  • What version of SQL Server?  Same between Prod and non-Prod?
  • Are the execution plans the same between Prod and non-Prod?
  • What is the max degree of parallelism setting for each environment?

@bake13 
1. Majority are 2017, but noticed on 2019 and 2014. Yes. Majority of slow servers are prod, but I have one slow and one fast both Prod (Enterprise edition) both running on the same physical ESXi.

2. This code does not touch any table and SSMS/SqlSentry does not return actual execution plan.  Estimated execution plan is the same (COND + ASSIGNs)

3. Default  0 everywhere. The loop is single thread, it cannot be parallelized anyway.

Hi @AnatoliD --  I spent some time working with this on various platforms (virtualized, PaaS) with various versions of SQL Server running on both Windows and Linux.  I'm not really seeing any consistency or patterns with execution times.  The vast majority of the time the loop executes in less than three seconds.  Is this related to an application/database issue or is this more of a proof-of-concept?  Take care.

HI@bake13 Thanks for looking into this.
I have performance issues with a few "applications" (a set of SPs/UDFs for data processing / transformation). It runs fine on dev with 2 cores,  and not fine on prod with 16 cores.
I don't really hope it would work 8 times faster on prod, but 6 times slower - totally unacceptable.

This test is free of any other factors (disk/network IO, blocking, old statistics and not optimal execution plan, etc) which could affect performance of complex system.

@AnatoliD some update on this?

 

I found same problem.

I have two sql servers.  Old, is 2008 and win server 2008

We created new machine on same vmware host, without any cpu limiations.

Is a sql server 2019 windows server 2019.

 

A simple loop like this presents up to 1 second difference, beign new system (with no laod) more slow:

select cpu_time from sys.dm_exec_requests where session_id = @@SPID
exec sp_executesql N'
set nocount on;
declare @start_cpu datetime = getdate();
DECLARE @i bigint = 10000000
while @i > 0 set @i -= 1;
select UsedCPU = datediff(ms,@start_cpu, getdate())
'
select cpu_time from sys.dm_exec_requests where session_id = @@SPID

 

 

Also, a running a simple powershell loop comparing two servers, and new server is up 2x more slow:

 

$i = 10000000

while($i--){}