AlwaysOn Availability Groups, Isolation Levels, & Selects Blocking the Redo Thread

This past weekend, I had the pleasure of presenting my session SQL Server 2012 AlwaysOn Readable Secondaries at SQL Saturday #126 in Indianapolis, IN. In that session, I covered the basics of AlwaysOn Availability Groups, how to set them up, failing over, and using readable secondary replicas for read-only workloads and backups. During the session, I always make sure to mention the possibility of queries running on the secondary replica blocking the redo thread that is trying to write changes from the primary database to the secondary database.

After mentioning this caveat, I was asked a very good question: Can you use transaction isolation levels (such as snapshot isolation) with AlwaysOn Availability Groups, and would they help to avoid the issue of read-only queries on secondary replicas blocking the redo thread? In order to answer this question, I’m going to break it two parts, and then we’ll work through a couple demos to illustrate the answers.

Questions:

Q: Can you use transaction isolation levels with AlwaysOn Availability Groups?
A:
Yes and no. Yes, you can set whatever isolation level you would like when running queries on the secondary replicas, and SQL Server will not return an error. However, SQL Server will automatically override the isolation level (and ignore all lock hints) when querying a read-only replica, and instead force the queries to use snapshot isolation.

Per the “Benefits” topic of the Active Secondaries: Readable Secondary Replicas (AlwaysOn Availability Groups) BOL article:

Read-only workloads use row versioning to remove blocking contention on the secondary databases. All queries that run against the secondary databases are automatically mapped to snapshot isolation transaction level, even when other transaction isolation levels are explicitly set. Also, all locking hints are ignored. This eliminates reader/writer contention.

Q: Does using snapshot isolation prevent read-only queries from blocking the redo thread on secondary replicas?
A: After reading the above quote from BOL, you would think that read-only queries won’t block the redo thread, and that is true for DML statements. However, it is not true for DDL statements. Sure, snapshot isolation prevents read-only queries on the secondary replicas from taking locks and preventing other DML statements against the database from executing, but it doesn’t prevent the read-only queries from taking schema stability locks and blocking DDL statements.

If you take a look later on in the same article I quoted above, you will come to the “Read-Only Workload Impact” section under the “Performance Considerations” topic:

Also, read-only workloads on the secondary replicas can block data definition language (DDL) changes that are applied through log records. Even though the read operations do not take shared locks because of row versioning, these operations take schema stability (Sch-S) locks, which can block redo operations that are applying DDL changes.

Demos:

In order to have a reference architecture to use for this demo, I’m going to use the Availability Group that I setup as part of the presentation that I mentioned at the beginning of this article. If you would like to play along at home, you can download the scripts and slide deck for that session here. This Availability Group is called DemoAG, and has three replicas, AlwaysOn1 (primary), AlwaysOn2 (secondary), and AlwaysOn3 (secondary). There are two databases participating in the Availability Group: AdventureWorks2012 and AdventureWorksDW2012. I’m going to be using AdventureWorks2012 for this demo.

Testing isolation levels and lock hints:

Unfortunately, I haven’t come up with a good way to demonstrate that SQL is overriding the transaction isolation level on the readable secondary because it is actually doing this internally (I’ll keep looking into this to see if there is a good way to demonstrate it). I can, however, demonstrate that SQL Server is ignoring lock hints on the read-only replicas. I can do this by beginning a transaction, executing a select query with the tablockx lock hint, and leaving the transaction open. I then open a new query window and query sys.dm_tran_locks to see if there is a table lock on the table. If I do this on the primary, you will see an open table lock on the table. If I do this on the secondary, there is no open table lock. If I then try to update data on the primary replica, I am able to see the statement completes and that the data has been written to the replica, even with the open transaction on the secondary that should be holding a table lock.

-- Execute this in a session of its own on the primary replica:
begin tran;

select * from ErrorLog with (tablockx);

-- With the above transaction open, launch a new query window on
-- the primary replica, and execute this query
-- (you should notice a table lock on the ErrorLog table):
use master;

select
t.name
, dtr.*
from sys.dm_tran_locks dtr
join sys.databases d
on d.name = 'AdventureWorks2012'
and d.database_id = dtr.resource_database_id
join AdventureWorks2012.sys.tables t
on t.object_id = dtr.resource_associated_entity_id;

-- Now try running this insert in the same query window as the sys.dm_tran_locks query
-- (you should get blocked until you commit or rollback the transaction in the other session):
insert AdventureWorks2012.dbo.ErrorLog(UserName, ErrorNumber, ErrorMessage)
values('DemoUser', 1, 'Demo Message');

-- Go ahead and close both query windows, and connect to one of the secondary replicas.
-- Now, try running through the scenario above on the secondary replica (minus the insert).
-- You should notice that there is no table lock being held, even though we explicitly requested one.

Testing read-only queries blocking DDL statements:

As far as demonstrating that read-only queries can block DDL statements, all we need to do is run a long-running query on the secondary replica and simultaneously try to alter the table that the query on the secondary is using. We should notice that the redo queue size on the secondary then increases and stays that way until we either kill or complete the read-only query. This indicates that the redo thread is not able to commit the DDL changes because the select statement is taking a schema stability lock and preventing the alteration to the table.

-- Run this query on the secondary replica
-- (it's horribly ugly and inefficient, but the idea is to hold a
-- schema stability lock on ErrorLog for as long as we can):

select
NumbersTable.Number
from AdventureWorks2012.dbo.ErrorLog el
cross apply (
select
Number
from
(
select
row_number() over (order by s1.name) as number
from AdventureWorks2012.sys.sysobjects s1
cross apply AdventureWorks2012.sys.sysobjects s2
cross apply AdventureWorks2012.sys.sysobjects s3
cross apply AdventureWorks2012.sys.sysobjects s4
cross apply AdventureWorks2012.sys.sysobjects s5
) as InnerNumbersTable
) NumbersTable
group by NumbersTable.Number
order by NumbersTable.Number desc;

-- While that beast is running on the secondary replica, run the
-- following on the primary replica:

alter table AdventureWorks2012.dbo.ErrorLog add DemoColumn varchar(10) null;

-- Now go back to the secondary replica, open a new query window, and run the
-- query below (you should notice that the redo queue is greater than 0,
-- which indicates that the redo thread is having trouble applying changes
-- to the database):
select
dhdrs.redo_queue_size
from master.sys.dm_hadr_database_replica_states dhdrs
join master.sys.databases d
on d.database_id = dhdrs.database_id
and d.name = 'AdventureWorks2012';