Aug 24 2018 02:00 AM - edited Aug 24 2018 07:26 AM
We're seeing very slow performance on a three-node S2D cluster:
Lenovo SR650 7X05
Windows Server 2019 17738
384 GB RAM
4 HDD 6TB ST6000NM0115
2 NVMe 900GB PX04PMB096
2 Mellanox ConnectX-4
The only clue is this "Threshold Exceeded" warning on two of the NVMe physical disks:
Get-PhysicalDisk
DeviceId FriendlyName       SerialNumber         MediaType   CanPool OperationalStatus        HealthStatus Usage
-------- ------------       ------------         ---------   ------- -----------------        ------------ -----
0        ThinkSystem M.2 VD b344a54d77360010     Unspecified False   OK                       Healthy      Auto-Select
2002     ATA ST6000NM0115   ZAD29F2X             HDD         False   OK                       Healthy      Auto-Select
2005     PX04PMB096         8CE3_8E07_0503_B800. SSD         False   {OK, Threshold Exceeded} Healthy      Journal
3006     PX04PMB096         8CE3_8E07_0503_D600. SSD         False   {OK, Threshold Exceeded} Healthy      Journal
3004     ATA ST6000NM0115   ZAD29YNC             HDD         False   OK                       Healthy      Auto-Select
3002     ATA ST6000NM0115   ZAD29V4X             HDD         False   OK                       Healthy      Auto-Select
2006     PX04PMB096         8CE3_8E07_0503_C000. SSD         False   OK                       Healthy      Journal
1006     PX04PMB096         8CE3_8E07_0503_D200. SSD         False   OK                       Healthy      Journal
3003     ATA ST6000NM0115   ZAD29S6W             HDD         False   OK                       Healthy      Auto-Select
1004     ATA ST6000NM0115   ZAD29SE4             HDD         False   OK                       Healthy      Auto-Select
2004     ATA ST6000NM0115   ZAD29VAR             HDD         False   OK                       Healthy      Auto-Select
2003     ATA ST6000NM0115   ZAD29SMK             HDD         False   OK                       Healthy      Auto-Select
3001     ATA ST6000NM0115   ZAD29YL6             HDD         False   OK                       Healthy      Auto-Select
2001     ATA ST6000NM0115   ZAD29YK2             HDD         False   OK                       Healthy      Auto-Select
1003     ATA ST6000NM0115   ZAD29Z0A             HDD         False   OK                       Healthy      Auto-Select
1001     ATA ST6000NM0115   ZAD06HEJ             HDD         False   OK                       Healthy      Auto-Select
3005     PX04PMB096         8CE3_8E07_0503_D100. SSD         False   OK                       Healthy      Journal
1005     PX04PMB096         8CE3_8E07_0503_D300. SSD         False   OK                       Healthy      Journal
1002     ATA ST6000NM0115   ZAD29YQ5             HDD         False   OK                       Healthy      Auto-Select
Testing the disks with the Lenovo SSD tool, I see that the Life Remaining gauge on both drives is still at 99%:
(c)Copyright Lenovo 2016.
Portions (c)Copyright IBM Corporation.
SSDCLI -- Display SMART Info v:7.3.2[Tue Jan 9 15:24:37 2018]
-------------------------------------------------------------------------
1 PN:00YK145-01GT679 SN:S3YAM3EN FW:300BT11L
Number bytes written to SSD: 98028.9GB
Number bytes supported by warranty: 17520000GB
Life Remaining Gauge: 99%
SSD temperature:22(c) Spec Max: 70(c)
PFA trip: No
Warranty Exceed:No
2 PN:00YK145-01GT679 SN:S3YAM3FN FW:300BT11L
Number bytes written to SSD: 96597.2GB
Number bytes supported by warranty: 17520000GB
Life Remaining Gauge: 99%
SSD temperature:22(c) Spec Max: 70(c)
PFA trip: No
Warranty Exceed:No
2 Device(s) Found(SATA:0 SAS:0 NVME:2)
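As a rough cross-check of the gauge, the bytes-written and warranty figures in the output above can be compared directly. This is only a minimal sketch in Python: it assumes the gauge is simply one minus the ratio of bytes written to bytes covered by warranty, which is a guess at how the tool works, and `life_remaining_pct` is a hypothetical helper name.

```python
# Figures copied from the SSDCLI output above, in GB.
WARRANTY_GB = 17_520_000
WRITTEN_GB = {"S3YAM3EN": 98_028.9, "S3YAM3FN": 96_597.2}


def life_remaining_pct(written_gb: float, warranty_gb: float) -> float:
    """Assumed model (not the tool's documented formula):
    remaining life = 100 * (1 - written / warranty)."""
    return 100.0 * (1.0 - written_gb / warranty_gb)


if __name__ == "__main__":
    for serial, written in WRITTEN_GB.items():
        pct = life_remaining_pct(written, WARRANTY_GB)
        print(f"{serial}: {pct:.2f}% of warranted writes remaining")
```

Under that assumption both drives come out a little above 99%, which matches the gauge, so write wear alone does not seem to explain the "Threshold Exceeded" status.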
Any help is really appreciated.
Alex
Aug 27 2018 09:40 AM
Thanks for the report. What build are you using?
Steven Ekren
Senior Program Manager
Windows Server
Microsoft
Aug 27 2018 10:54 AM
Could you let us know what workload you are running? Also, can you email us the output of Get-SDDCDiagnosticInfo so we can see your configuration and find the root cause? I will DM you my email address.
Adi Agashe
Program Manager
Windows Server
Microsoft
Aug 29 2018 05:43 AM
I've sent you what you requested.
We have 52 VMs running a mix of operating systems, from Windows Server 2008 R2 to 2016, plus some Ubuntu Linux.
The total data size is 6 TB.
Thanks
Alessio