Persistent Chat rooms lock up and become unavailable when a user is added to or removed from a room or category
Published May 20 2019 05:26 PM
Microsoft

First published on TECHNET on Feb 16, 2018

The July 2017 update for Lync Server 2013 includes a fix for this issue.

The May 2017 update for Skype for Business Server 2015 includes the corresponding fix.

Even with these updates (or a later cumulative update) installed, the issue can still persist in some environments. In that case, events such as the following may be logged on the Persistent Chat servers:

Log Name:      Lync Server
Source:        LS Persistent Chat Server
Date:          10/10/2017 1:01:02 PM
Event ID:      53508
Task Category: (1098)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PCHATServer.contoso.com
Description:
Failed to release the admin lock. Administrative command processing cannot proceed.


Log Name:      Lync Server
Source:        LS Persistent Chat Server
Date:          10/10/2017 5:44:36 AM
Event ID:      53555
Task Category: (1098)
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      PCHATServer.contoso.com
Description:
An inconsistent state between the server cache and the database was detected and the server cache will be reloaded.
The Persistent Chat server will reload its cache from the database.
Cause: This can be caused by Persistent Chat servers failing to communicate with each other.


Log Name:      Lync Server
Source:        LS Persistent Chat Compliance Server
Date:          10/8/2017 1:29:31 PM
Event ID:      53106
Task Category: (1097)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PCHATServer.contoso.com
Description:
Unable to save message 10/8/2017 8:24:59 PM PART ma-chan://contoso.com/6f41dceb-69ae-434a-9699-123e8eb5f675  0 39000 to database due to exception:
CmdID: c5409a64-b11d-4d49-90f5-fa694cd4555f The server could not restore db connection within the allowed time (00:10:00) using connection string: Data Source=sql01.contoso.com\RTC;Initial Catalog=mgccomp;Integrated Security=SSPI;Failover Partner=sql02.contoso.com\RTC. at
at Microsoft.Rtc.Internal.Chat.Server.ServerCommon.Database.DbCommand.executeUntilSuccessOrTimeout[TR](Func`2 executeDelegate, RetryInfo retryInfo)
at Microsoft.Rtc.Internal.Chat.Server.ServerCommon.Database.DbCommand.executeImp[TR](Func`2 executeDelegate, Int32 retryTimeoutInMs)
at Microsoft.Rtc.Internal.Chat.Server.ServerCommon.Database.DbCommand.ExecuteNonQuery(Int32 retryTimeoutInMs)
at Microsoft.Rtc.Internal.Chat.Server.Compliance.ComplianceDataAccess.Save(RawComplianceData data)
at Microsoft.Rtc.Internal.Chat.Server.Compliance.ComplianceServer.Save(RawComplianceData data).


CAUSE:

This issue stems from the design of Persistent Chat and from how it scales. When Persistent Chat was designed, nobody expected users to be added to and removed from rooms on a continual basis. To ensure that only members of a chat room have access to it, every add or remove operation triggers a verification of permissions for every user, every category, and every chat room. This happens even when only a single user is affected. The approach works well in small environments, but it does not scale as usage grows.

To address this, we added a new flag that changes the behavior. When the flag is set, no permission checks are performed and the add or remove actions are simply applied. This is a trade-off. Under the current (default) behavior, removing a user from a chat room also removes that room from the user's client immediately. With the flag set, businesses give up that immediacy in exchange for performance and protection against SQL lock-ups. A removed user may keep access until the client signs out and signs back in, and access to the chat room is revoked only at that point.
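The following Python model is deliberately simplified and hypothetical. It makes the scaling argument concrete. It is not how the Persistent Chat server is implemented. It only assumes that one membership change re-checks every user once per category and once per chat room, and compares that cost with applying the single change. The function names and deployment sizes are illustrative.

```python
"""Hypothetical cost model: permission checks caused by ONE room membership change.

Illustration only. This is not the Persistent Chat server's real algorithm; it
assumes every user is re-checked once per category and once per chat room.
"""


def checks_full_reverify(users: int, categories: int, rooms: int) -> int:
    """Checks if every user is re-verified against every category and room (assumed)."""
    return users * (categories + rooms)


def checks_apply_only() -> int:
    """Checks if the single add/remove is simply applied, as with the new flag (assumed)."""
    return 1


if __name__ == "__main__":
    # Illustrative deployment sizes, not measurements.
    for users, categories, rooms in [(100, 5, 50), (10_000, 50, 2_000), (50_000, 200, 10_000)]:
        full = checks_full_reverify(users, categories, rooms)
        print(
            f"{users:>6,} users, {categories:>3} categories, {rooms:>6,} rooms: "
            f"{full:>12,} checks per change vs {checks_apply_only()} with the flag"
        )
```

Even in this crude model, the work for a single change scales with the number of users multiplied by the number of categories and rooms. When a pool grows in all three dimensions, the cost rises far faster than the pool itself.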

 

RESOLUTION:


Connect to the MGC database in your environment and retrieve the contents of the dbo.tblConfig table. The result should look similar to the following:

configLabel   configPoolID                            configContent
pool          9CFB3493-89B2-447C-8487-9C19C13E1694    <?xml version="1.0"....

The column of interest is configContent. Its value should look similar to this:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<configuration version="1">
  <pool>
    <db>
      <retry_ms>600000</retry_ms>
      <lossdetection_ms>120000</lossdetection_ms>
    </db>
    <channelserver>
      <ADConnect>
        <GlobalCatalog>
          <findgc>True</findgc>
          <host></host>
          <adsynchfreq>480</adsynchfreq>
        </GlobalCatalog>
      </ADConnect>
      <adupdate>
        <batchsize>5000</batchsize>
        <sleeptime_ms>10000</sleeptime_ms>
        <accesspoll_ms>604800000</accesspoll_ms>
        <accesspoll_size>50</accesspoll_size>
        <accesspoll_enabled>False</accesspoll_enabled>
      </adupdate>
      <serverbackchat>
        <cache_size_limit>2500000</cache_size_limit>
      </serverbackchat>
      <watermarks>
        <batch_message_count_max>20</batch_message_count_max>
        <async_send_max>100</async_send_max>
        <async_send_max_lo>90</async_send_max_lo>
        <outbound_queue_max>100000</outbound_queue_max>
        <outbound_queue_max_lo>90000</outbound_queue_max_lo>
        <low_priority_queue_max>500</low_priority_queue_max>
        <inbound_queue_size_max>10000</inbound_queue_size_max>
        <channelinvitemax>50</channelinvitemax>
      </watermarks>
    </channelserver>
    <webservice>
      <maxchunksizeinkb>1024</maxchunksizeinkb>
    </webservice>
  </pool>
</configuration>

Edit this value to insert the line <notify_users>0</notify_users> inside the <serverbackchat> element, immediately after <cache_size_limit>. Once this is done, we recommend restarting the Persistent Chat (PCHAT) services. The edited configContent should look like this:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<configuration version="1">
  <pool>
    <db>
      <retry_ms>600000</retry_ms>
      <lossdetection_ms>120000</lossdetection_ms>
    </db>
    <channelserver>
      <ADConnect>
        <GlobalCatalog>
          <findgc>True</findgc>
          <host></host>
          <adsynchfreq>480</adsynchfreq>
        </GlobalCatalog>
      </ADConnect>
      <adupdate>
        <batchsize>5000</batchsize>
        <sleeptime_ms>10000</sleeptime_ms>
        <accesspoll_ms>604800000</accesspoll_ms>
        <accesspoll_size>50</accesspoll_size>
        <accesspoll_enabled>False</accesspoll_enabled>
      </adupdate>
      <serverbackchat>
        <cache_size_limit>2500000</cache_size_limit>
        <notify_users>0</notify_users>
      </serverbackchat>
      <watermarks>
        <batch_message_count_max>20</batch_message_count_max>
        <async_send_max>100</async_send_max>
        <async_send_max_lo>90</async_send_max_lo>
        <outbound_queue_max>100000</outbound_queue_max>
        <outbound_queue_max_lo>90000</outbound_queue_max_lo>
        <low_priority_queue_max>500</low_priority_queue_max>
        <inbound_queue_size_max>10000</inbound_queue_size_max>
        <channelinvitemax>50</channelinvitemax>
      </watermarks>
    </channelserver>
    <webservice>
      <maxchunksizeinkb>1024</maxchunksizeinkb>
    </webservice>
  </pool>
</configuration>
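You may prefer to prepare the edited value with a script rather than by hand. The Python sketch below makes the same change using the standard library's xml.etree.ElementTree. It is not a Microsoft-provided tool. It assumes you have already copied the configContent value of the pool row from dbo.tblConfig into a string. It only transforms that text and does not connect to SQL Server, so writing the result back and restarting the Persistent Chat services remain manual steps. The function name add_notify_users is hypothetical.

```python
"""Sketch: add <notify_users>0</notify_users> under <serverbackchat> in configContent.

Assumption: config_xml holds the configContent text copied from dbo.tblConfig.
This only rewrites the XML string; it does not read from or write to the database.
"""
import xml.etree.ElementTree as ET


def add_notify_users(config_xml: str, value: str = "0") -> str:
    """Return config_xml with <notify_users> set under pool/channelserver/serverbackchat."""
    text = config_xml.strip()
    # ElementTree emits no XML declaration with encoding="unicode", so keep the original one.
    declaration = text[: text.index("?>") + 2] if text.startswith("<?xml") else ""

    root = ET.fromstring(text)  # root is <configuration>
    backchat = root.find("./pool/channelserver/serverbackchat")
    if backchat is None:
        raise ValueError("no <serverbackchat> element under pool/channelserver")

    flag = backchat.find("notify_users")
    if flag is None:
        # Appended as the last child; in the configuration shown above that is
        # directly after <cache_size_limit>.
        flag = ET.SubElement(backchat, "notify_users")
    flag.text = value  # running the function twice does not add a duplicate element
    return declaration + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Trimmed version of the configContent value shown earlier.
    sample = (
        '<?xml version="1.0" encoding="utf-8" standalone="yes"?>'
        '<configuration version="1"><pool><channelserver><serverbackchat>'
        "<cache_size_limit>2500000</cache_size_limit>"
        "</serverbackchat></channelserver></pool></configuration>"
    )
    print(add_notify_users(sample))
```

Compare the output with the original value before saving it back. ElementTree keeps the whitespace between elements, but its default parser drops XML comments, which would matter only if the stored value contained any.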

For user removals, you can also run Revoke-CsClientCertificate for the removed user. This signs the user out of all endpoints that do not use UCWA, and the user can then sign in again and continue using the service. Be aware that this cmdlet may disrupt any calls, conferences, or IM conversations the user is in at the time.

Review your business requirements and the trade-offs described above before deciding whether to change the configuration. Also note that a service restart is required.

Last updated: May 28 2019 03:17 PM