Service Broker & AlwaysOn Availability Groups: Odd Transmission Queue Behavior

I’ve been working on a project over the past several months that will utilize Service Broker and AlwaysOn Availability Groups to meet some of the HA and DR goals of the company I work for (more info here). Just recently, I was able to implement the full solution in my development lab and point an instance of our website at it. While we were working out some kinks in our database and website to get the two working well with my Service Broker Replication project, I began noticing some odd behavior in Service Broker when it’s used with AlwaysOn Availability Groups, and I wanted to blog about it in an attempt to see if anyone else has seen this issue and might have an idea how to address it.

Update: I’ve also posted this question on DBA.StackExchange.com.

The Setup:

I have a Hyper-V host running 6 Windows Server 2008 R2 VMs (BTDevSQLVM1-BTDevSQLVM6). The VMs are grouped into 2-node WSFCs with node and file share quorum. I’ve installed standalone SQL 2012 Developer Edition instances on each of the VMs, and created an Availability Group with a listener on each cluster (SBReplDistrib, SBRepl1, & SBRepl2).

For the purpose of this blog post, I’ll be focusing on the communication between SBRepl1 and SBReplDistrib. The illustration below shows the Service Broker objects for each side of the conversation:

The Service Broker endpoints and routes are setup per this MSDN article.The SBRepl_Receive route in MSDB is for the local server’s service (//SBReplDistrib/SBRepl on SBReplDistrib, and //SBRepl1/SBRepl on SBRepl1), and points to the local instance. The SBRepl_Send route on SBRepl1 maps service //SBReplDistrib/SBRepl to TCP://SBReplDistrib:4022, and the SBRepl_Send_SBRepl1 route on SBReplDistrib is a similar mapping for the service on SBRepl1.

The Expected Behavior:

My understanding of how Service Broker handles message sending and receiving is thus (This is pretty simplified. There is a lot more detail about this process in Klaus Aschenbrenner’s (blog | twitter) book “Pro SQL Server 2008 Service Broker”):

  1. The initiator app creates a message (in this case, well formed XML)
  2. If there is an existing dialog conversation between the initiator service and the target service that is in the conversing status, the app can simply send the message on the existing conversation handle. Otherwise, the initiator app should begin a dialog conversation between the initiator service and the target service and send the message on that conversation handle.
  3. The message is placed in the sys.transmission_queue system table and Service Broker begins making attempts to deliver the message to the target service.
  4. Service Broker looks for an appropriate route and remote service binding and uses them to determine the address to connect to in order to deliver the message.
  5. Service Broker opens a connection to the target, authenticates, and delivers the message to the target service broker.
  6. The target Service Broker attempts to classify the message and determine what local service will handle the message (it uses route data in the msdb database for this).
  7. The target Service Broker delivers the message to the target service’s queue
  8. Once the message is successfully delivered to the target queue, the target Service Broker looks for route information back to the initiator and attempts to deliver an acknowledgement that the message was received.
  9. The initiator’s Service Broker receives the acknowledgement and uses routing information in MSDB to determine what local service the acknowledgement is for.
  10. Upon successful routing of the acknowledgement to the initiating service, the message is then removed from the sys.transmission_queue system table.
  11. If the initiator does not receive an acknowledgement that the message was received, it will periodically retry delivering the message to the target. If the target has already received the message, it will simply drop any additional delivery retries and send acknowledgements for them.

The Odd Behavior:

Step 11 is where I am seeing some very odd behavior with Service Broker and AlwaysOn. I see the message getting delivered to the target and processed successfully, and I also see the acknowledgement getting sent back to the initiator and received. However, the message remains in sys.transmission_queue as though no acknowledgement was received. To make things even more strange, Service Broker isn’t attempting to resend the message like I would expect it to if the acknowledgement wasn’t received. Instead, the message simply remain in the sys.transmission_queue, and as new messages are sent, they get delivered, acknowledged, and they too remain in the sys.transmission_queue. It seems to me like service broker is getting the acknowledgements and therefore stops trying to deliver the message, but doesn’t remove it from the sys.transmission_queue for some reason. The transmission_status for these messages remains blank, which should indicate that Service Broker hasn’t attempted to deliver them yet.

I checked the retention setting on the service queue, and it is set to off, but that should only impact the service queue and not the sys.transmission_queue. I have also traced both sides of the conversation using SQL Profiler, and I am able to see the message getting sent and the acknowledgement being sent back to the initiator and getting received (see XML trace data at the end of this post).

One odd thing did jump out at me in the traces though. I noticed that both sides seemed to be a bit confused about the TCP connections, because messages are sent from the IP address of the node itself while the service routes and the messages themselves point to the name/IP of the AG listener. This confusion appears to be causing each side to close the existing connection between the two services and create a new one in order to deliver a message or acknowledgement. I’m not sure if this is normal or not or if it has anything to do with why the acknowledgements aren’t being handled correctly, but it was the only thing I could see that could possibly explain the odd behavior.

The Plea for Help:

At this time, I don’t have a solution to this message retention issue other than to manually end the conversation with cleanup on both sides, and that’s not really something I want to do. If you have any ideas as to why this might be happening or what I can do about it, please leave me a comment and let me know. If there is any additional information that you would like me to provide about my setup or about the issue, please let me know in the comments as well. I will post a followup to this post if/when I find a solution to this issue.

The Trace Data:

I’ve removed the trace data from this blog post for brevity and security reasons. Please leave me a comment if you would like to review the trace files.

Service Broker Overview

No Junk Mail

No poison messages either, please.

Service Broker has been a core feature of all editions of SQL Server since SQL 2005. It allows for the asynchronous delivery and processing of messages sent within a database or between databases (either on the same instance or in separate instances). In some ways, you can think of Service Broker like the USPS (in fact, SQL Server even uses Service Broker as part of Database Mail to send email messages). This blog post will use the analogy of sending a letter to a friend (yes, a snail-mail letter…yes, I know that nobody does that anymore… :-P) to give a brief overview of the various components of Service Broker and how they work together, allowing you to send a message from one instance to another instance, where the message will be processed.

Encryption & Security

By default, Service Broker will encrypt messages sent between databases (on the same instance, or on separate instances), and the best way to allow this is to use certificates. However, setting up certificate-based encryption can be a major pain in the butt, and order of operations is extremely important. The first thing you want to do is create and backup the database Master Key on each database/instance. Then, you should create a user in each database/instance with or without a login (Ex: create UserA on InstanceA and UserB on InstanceB). Once you create the users, then create a certificate for each user and back it up. Next, create a user on each database/instance that has the same name as the user on the other instance/database (Ex: create UserB on InstanceA and UserA on InstanceB), and then restore the certificate for each of the users you just made (Ex: Restore UserACert on InstanceB and UserBCert on InstanceA). Once you have the users and certificates setup correctly, then you can proceed with setting up the Service Broker infrastructure.

It is recommended to use database users without logins to restrict the users’ access to just the database that is receiving messages, but if you use a Stored Procedure to automatically process Messages received on a Queue, and if that stored procedure needs to interact with any other database(s) than the database that holds the Queue, then you will need to create a user with a login, and map that login to the other database(s) that the Stored Procedure needs to interact with.

It helps to have a central location where you can store the certificate files so that you can get to them later if you need to add more instances to your Service Broker topology.

-- Example script to setup security on AdamServer
USE Demo;
GO;

-- Create the database master key (you should back it up too, but that's not required for this to work)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'MyReally,Really,Really,ReallyStrongPassword...Really!';

-- You can optionally create a login to use. If you don't need the login, just create the user "WITH NO LOGIN" (instead of "FOR LOGIN AdamUser").
CREATE LOGIN AdamUser WITH PASSWORD = 'SomeOtherReally,Really,ReallyStrongPassword...Really!';

CREATE USER AdamUser FOR LOGIN AdamUser;
EXEC sp_addrolemember @rolename='db_owner', @membername='AdamUser'; -- Add security as you see fit. You probably shouldn't give away the keys to the house like I did here.

-- Create a certificate and bind it to the database user, then back it up (you must do the backup here because you will use it on the remote server to recreate this user and certificate there).
CREATE CERTIFICATE AdamUser_Cert
AUTHORIZATION AdamUser
WITH SUBJECT = 'Certificate for database user AdamUser',
EXPIRY_DATE = '12/31/2099';

BACKUP CERTIFICATE AdamUser_Cert
TO FILE = 'c:\AdamUser_cert.cer' -- It's probably a good idea to use a secure network share here
WITH
PRIVATE KEY (FILE = 'c:\AdamUser_cert.pvk', -- It's probably a good idea to use a secure network share here
ENCRYPTION BY PASSWORD = 'SomeOther,OtherReally,Really,ReallyStrongPassword...Really!');

-- Recreate the remote server's user and certificate on this server (I'm assuming you've already done the steps above on the remote server. The user doesn't need a login.)

CREATE USER FriendUser WITHOUT LOGIN;
EXEC sp_addrolemember @rolename='db_owner', @membername='FriendUser';

CREATE CERTIFICATE FriendUser_Cert
AUTHORIZATION FriendUser
FROM FILE = 'c:\FriendUser_cert.cer' -- It's probably a good idea to use a secure network share here
WITH
PRIVATE KEY (FILE = 'c:\FriendUser_Cert.pvk', -- It's probably a good idea to use a secure network share here
DECRYPTION BY PASSWORD = 'SomeOther,OtherReally,Really,ReallyStrongPassword...Really!');

Message Types & Contracts

In order to hold a conversation using Service Broker, we first have to specify one or more message types. A Message Type is simply a logical way for you to specify what kind of content will be contained in a message. In this analogy, our message type will be called FriendlyLetter (All messages sent with this type will contain letters sent between friends. We could also create another message type called BusinessLetter that would be letters sent between businesses.) and specifies that all messages sent with this Message Type will be validated as WELL_FORMED_XML. Once we have our Message Type(s) created, then we can create a Contract. In this case, our Contract will be called LetterContract, and it specifies that the parties involved in the conversation are allowed to communicate using the FriendlyLetter Message Type only.

CREATE MESSAGE TYPE FriendlyLetter
Authorization dbo
Validation = WELL_FORMED_XML;

CREATE CONTRACT LetterContract
Authorization dbo
(FriendlyLetter SENT BY ANY);

Queues & Services

In order to send and receive our messages, we need a Queue (or mailbox) to send and receive the messages with. When we create our LetterQueue, we can optionally specify an automated process (stored procedure) that will handle the messages received (in this case, dbo.FriendlyMessageHandler). If you specify an automated process to receive the messages, you should also specify how many concurrent copies of that process can be running (in this case, just 1). If you don’t specify an automated process to receive the messages, then you will need to receive them manually (see the Conversations & Messages section).

Just like in real life, our messages won’t get delivered unless there is a Service to deliver them for us (In real life, this would be the USPS. Unlike in real life, the service is only for a single Queue (whereas the USPS delivers to all mailboxes in the US). In this case, we will call our Service the LetterService, and specify that it will be for our LetterQueue, and that it will be owned by the local user we created in the Encryption & Security section.

CREATE QUEUE LetterQueue
WITH
STATUS = ON, -- Is the queue enabled to receive messages?
RETENTION = OFF, -- Does the queue keep messages that have been received off the queue, or delete them?
ACTIVATION ( -- The Activation clause is optional and only needs to be specified when automatic message processing is being used
PROCEDURE_NAME = [dbo].[FriendlyMessageHandler], -- The stored procedure that is run when a message is received
MAX_QUEUE_READERS = 1, -- How many times can we run the proc above concurrently?
EXECUTE AS 'AdamUser', -- What user will the proc be executed as?
STATUS = ON -- Is automatic activation turned on or off for this queue?
);

CREATE SERVICE [//AdamServer/LetterService] -- I recommend naming your services: ///, so that it's easier to tell what server and service you're conversing with
AUTHORIZATION [AdamUser] -- The user that owns this service
ON QUEUE LetterQueue -- What queue is this service for?
([LetterContract]); -- What contract governs conversations over this service?

GRANT SEND ON SERVICE::[//AdamServer/LetterService] to FriendUser; -- Allow the other server's user to send this server messages

Endpoints, Routes, & Remote Service Bindings

Now that we have our Queue and Service created, we need to create a Route and Endpoint so that the Service knows where and how to send messages. We only need one Endpoint per instance, and in this case, our Endpoint will be called SBEndpoint. In order for our Service to know where to send the messages to, we need to specify a Route. In this case, our Route will be called SendLetterRoute, will specify that it is for the receiving server’s LetterService, and will also specify where that service can be found. We also need to create a route in MSDB for receiving messages to our local service, which I’ve called ReceiveLetterRoute.

The last piece of infrastructure we need to create is the Remote Service Binding. This tells SQL Server which user to use when sending messages to the receiving service. Just like with Routes, we need to create one Remote Service Binding for each receiving service we want to send to.

USE MSDB;
GO;

CREATE ENDPOINT SBEndpoint
AUTHORIZATION sa
STATE = STARTED
AS TCP (LISTENER_PORT = 4022)
FOR SERVICE_BROKER (AUTHENTICATION = WINDOWS);

CREATE ROUTE ReceiveLetterRoute
AUTHORIZATION dbo
WITH
SERVICE_NAME = N'//AdamServer/LetterService',
ADDRESS = N'LOCAL';

USE Demo;
GO;

CREATE ROUTE SendLetterRoute
AUTHORIZATION dbo
WITH
SERVICE_NAME = N'//FriendServer/LetterService',
ADDRESS = N'TCP://FriendServer:4022';

CREATE REMOTE SERVICE BINDING [SendLetterBinding]
AUTHORIZATION dbo
TO SERVICE N'//FriendServer/LetterService'
WITH USER = [AdamUser]; -- The user to send the message as

Conversations & Messages

Once all of the infrastructure pieces are created on both the sending and receiving sides, we can begin to hold a conversation/Dialog. When we begin our Dialog, we specify which local service is sending the message and to which remote service it is sending the message. We must also specify the Contract that governs how the two sides may communicate on that Dialog. In this case, our Dialog will be sent from //AdamServer/LetterService to //FriendServer/LetterService on Contract LetterContract. Once the Dialog has been started, we can send messages (of any type allowed by the contract), and the messages will be received in the same order in which they are sent. Since it is possible to send more than one message over a single Dialog, it is recommended not to begin new Dialogs unless absolutely necessary (beginning a new Dialog creates overhead as the two servers authenticate and encrypt a new channel of communication, whereas using an existing Dialog does not incur that overhead).

When the receiving side gets a message, the message is placed in the Queue specified by the receiving Service. If there is an automated process specified to process messages on that queue, then that Stored Procedure will be executed and will process the message. In order to process a message (either automatically or manually), you must Receive the message off of the Queue. In most cases, you will receive one message at a time to make sure that the messages are received and processed in the correct order.

-- Execute on sender:
DECLARE @ConversationHandle uniqueidentifier;
DECLARE @XMLMessage XML = 'Hi!';

BEGIN DIALOG @ConversationHandle
FROM SERVICE [//AdamServer/LetterService]
TO SERVICE N'//FriendServer/LetterService'
ON CONTRACT [LetterContract]
WITH
ENCRYPTION = ON;

SEND ON CONVERSATION @ConversationHandle
MESSAGE TYPE [FriendlyLetter]
(@XMLMessage);
-- Execute on receiver:
declare @ConversationHandle uniqueidentifier, @Service varchar(128), @Contract varchar(128), @MessageType varchar(128), @XMLMessage xml;

WAITFOR (
RECEIVE TOP(1)
@ConversationHandle = conversation_handle,
@Service = service_name,
@Contract = service_contract_name,
@MessageType = message_type_name,
@XMLMessage = CAST(message_body AS XML)
FROM LetterQueue
), TIMEOUT 5000; -- Wait 5 seconds

Bringing It All Together

Service Broker Diagram

Click the image to see it full size in a new window

References