Poison Message Handling

Former Employee

Mar 23, 2019

First published on MSDN on Jun 30, 2008

Service broker provides automatic poison message detection, which disables the service queue upon five consecutive transaction rollbacks in trying to receive messages from the queue. However, an application should not rely on this feature for normal processing. Because automatic poison message detection stops the queue, this halts all processing for the application until the poison message is removed. Instead, an application should attempt to detect and remove poison messages as part of the application logic.

Typically, the message processing application can choose among the following four strategies:

A. Returns the poison message back to the service queue by rolling back the transaction in hope that the message can be successfully consumed next time when retried.

B. Preserves the message, the context and the encountered error to some permanent storage, such as a database table, commits the RECEIVE, and continues normally.

C. Ends the conversation with an error.

D. Uses SQL server event notification to ask for manual intervention when queue is disabled.

Option A above is good at handling failures caused by temporary conditions. For instance, a purchase order placed by an user can't be processed if the user profile has not existed yet because, say, user profile request processing that's happening in parallel takes more time than the purchase order; once that processing is complete, the purchase order processing is expected to be through without a problem. Another example could be that the processing transaction is chosen as the victim to rollback in the case of a deadlock; after such a deadlock is detected and removed, the message can then be successfully consumed.

However, there are situations where processing the same message will forever fail. One example may be that the message itself is not self-contained, say a purchase order can't be fulfilled because that credit card holder name and home address given don't match what's on file at the bank. A second example could be that a user profile can't be established because the account name chosen has already taken by an existing user. In these cases, approach B then sounds more suitable for the application to log the message content as well as the context in which it was happening so the situations can be further examined to see if the processing logic can be improved to handle these exception cases normally. Indeed, the two examples mentioned here have actually surfaced deficiencies in the application code because both scenarios can well be successfully processed if the message processing code is revised. For a more vivid example how inappropriate application error handling can lead to poison messages, check out Remus' blog here .

Care must be taken when continuing to the next message without the current one been processed. In a stateful session, skipping messages may cause the conversation into an illegal state and hence invalidates the rest of the dialog. If this is the case, the application then can't afford ignoring one message; thus ending the dialog with error as suggested by option C becomes the right thing to do. As an example, say a live meeting service that is unicasting an ongoing presentation, it had already received slide #2 and was waiting for the content of slide #3. But it then found it couldn't process the slide #3 message because of invalid media format. If it simply ignores them, the slide #3 worth of presentation simply becomes dumb to the receiver. So terminating the session with an error probably is the most appropriate.

Depending on what it is doing, an application can well use a combination of what is illustrated the above. For example, it can allow a failed receive to retry three seconds later. Further, if the retry is still unsuccessful, it then can choose to log the message, commit the RECEIVE transaction and continue to the next one.

As the last resort, application programmers may want not to do anything in their code but fully depend on the default mechanism built in service broker for poison message detection and handling. As formerly stated, the mechanism alone is not good enough. But combining it with SQL server event notification at least can request for administrator's manual intervention when a poison message is detected.

Once a BROKER_QUEUE_DISABLED event notification is defined, a notification message of the following format will be sent by service broker to the specified event notification queue when the service queue is disabled due to poison messages:

<EVENT_INSTANCE>

<EventType>BROKER_QUEUE_DISABLED</EventType>

<ServerName>junan02</ServerName>

</EVENT_INSTANCE>

To identify which service queue this notification is fired for, first use the database id to find out the database name:

SELECT name AS database_name

FROM sys.databases

WHERE database_id = 6

(1 row(s) affected)

SsbTestDBTarget

Then switch to that database, and get the service queue name using the object id:

USE SSBTestDBTarget

SELECT name AS service_queue_name

FROM sys.service_queues

WHERE object_id = 53575229

(1 row(s) affected)

Service1Queue

The SQL script below shows in detail how to create an event notification service to receive notification messages when a user queue being watched gets disabled. A stored procedure is defined to process the posted notification by emailing the administrator about which queue is disabled and ask him/her to jump in to identify and eliminate the problem so normal message processing on the queue can be resumed.

-- create the event notification queue to receive queue_disabled events

CREATE QUEUE [QueueDisabledNotifQueue];

-- create the event notification service for receiving queue_disabled events

CREATE SERVICE [QueueDisabledNotifService]

ON QUEUE [QueueDisabledNotifQueue]

([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);

-- create queue-disabled event notification to watch on 'Service1Queue'

CREATE EVENT NOTIFICATION [QueueDisabledNotif]

ON [Service1Queue]

FOR BROKER_QUEUE_DISABLED

TO SERVICE [QueueDisabledNotifService], 'current database'

-- define the stored proc to process queue_disabled notifications