Introduction to SQL Server 2012 AlwaysOn Availability Groups – Q & A

Last week, I presented my session “Introduction to SQL Server 2012 AlwaysOn Availability Groups” to my largest audience ever at the PASS DBA Fundamentals Virtual Chapter. There were 191 total attendees, and I would like to take a moment to thank all of you for attending, it was truly AWESOME! Also, I would like to take a moment to apologize for the audio issues that occurred throughout the session. This was primarily my fault, as I had joined the webinar twice, once as a presenter by phone with a high-quality headset and good quality audio connection, and another time as an attendee just to keep an eye on what all of you were seeing. Unfortunately, the attendee laptop was somehow selected as the microphone to be used while I presented from my actual presenter laptop, and that is why the audio kept fading in and out and was poor quality. Mark, Mike, and I met to discuss this and how to prevent it in the future, and so this should not happen again.

Anyway, I received several questions during this session that I wanted to address via this blog post, as I think they could benefit everyone. So without further delay, here they are:

  • What would you recommend for the maximum number of (practical) databases per Availability Group?
    • This will depend on the hardware you’re running on (specifically the number of CPU threads that the primary replica can support), and the network bandwidth available between your primary and secondary replicas. Also, the amount of transactions per second occurring in each database will be a factor in this. There are no hard-and-fast rules about how many databases can be in the Availability Group. Please see http://msdn.microsoft.com/en-us/library/ff878487.aspx#RestrictionsAG for Microsoft’s recommendations in this regard.
  • How do Availability Groups work with CDC?
  • If an Availability Group is setup at the SQL Instance level, can you have multiple SQL instances per cluster node and have an Active-Active configuration?
    • First of all, an Availability Groups is not the same as a Failover Cluster Instance. An Availability Group is a group of 1 or more databases that all failover together and share a set of replicas onto which that failover may occur. Each replica is another SQL Server instance that sits on another node of the same Windows Server Failover Cluster that the primary replica does. With that said, an Availability Group can only have replicas that are nodes of the same Windows Server Failover Cluster. Therefore, active/active in an Availability Group would be more a question about which replicas are readable or not and not so much about running multiple Availability Groups. Additionally, an Availability Group is not an instance-level failover (like in a Failover Cluster Instance), so things like the master and MSDB databases, logins, etc. do not failover in an Availability Group. You can have multiple Availability Groups running at the same time, but keep in mind that they would all need to be sitting on nodes of the same Windows Server Failover Cluster, and only one instance of SQL Server per cluster node can participate in Availability Groups due to the coordination between the Availability Groups and their underlying Windows Server Failover Cluster. To clarify that a bit, you cannot install 2 SQL Server instances to the same Windows Server Failover Cluster node and have one instance host a replica for on Availability Group and the other instance host a replica for a different Availability Group. Instead, you would have a single SQL Server Instance on the Windows Server Failover Cluster node that would participate in both of the Availability Groups.
  • Is it possible to set this up with demo licenses? Is there a temp/demo/developer clustering license available from Microsoft? (for those of us on the bench who would like to test this)
    • Absolutely! Microsoft offers an evaluation version of SQL Server 2012, which can be downloaded from http://www.microsoft.com/en-us/download/details.aspx?id=29066 and used to test AlwaysOn Availability Groups. In addition, if you already have SQL Server Developer Edition licenses, you can use those licenses to test AlwaysOn Availability Groups (you just can’t use them in any production capacity).
  • Can you select which databases are in the Availability Group? Can you have two different databases from two different servers?
    • Yes, you can select which databases are part of the Availability Group. However, any database that is part of the Availability Group will need to be present on the primary replica and then synchronized to all secondary replicas. Therefore, if your primary replica has 10 databases, you could select 5 of those databases to be part of the Availability Group and those would then be synchronized to the other replicas. The 5 databases not included in the Availability Group would remain untouched and only on the server that they were on originally. The same is true of the secondary replicas. They will already contain all of the databases that are part of the Availability Group, but they can also contain a number of local databases that are not part of the Availability Group.
  • Can we use SQL Server Standard edition for only two nodes? (reference: http://msdn.microsoft.com/en-us/library/cc645993.aspx)
    • No, you cannot. What the High Availability matrix is showing is running a two node Failover Cluster Instance on Standard edition. There are only two editions of SQL Server that will support Availability Groups, and those are Enterprise and Developer editions, and both edition support up to 5 replicas in an Availability Group. Remember that AlwaysOn is just a marketing term that Microsoft uses to describe several of their High Availability features, and is not a feature in itself. Don’t let their overuse of this term confuse you.
  • Should the listener be a separate server? Does the listener need to have SQL Server installed on it?
    • The listener name is just a cluster resource name, and is not a separate physical server or cluster node, nor is it a separate SQL Server instance like a Database Mirroring Witness would be. Think of the listener name as just another service that the Windows Server Failover Cluster can host on any of its nodes. The caveat here is that the Availability Group and the Cluster are talking to one another and so the Availability Group makes sure that the listener name is always hosted by the cluster node that is the primary replica of the Availability Group. Therefore it is safe to say that the primary replica (a stand-alone SQL Server Instance installed on a Windows Server Failover Cluster node) is always the host of the listener name (if you created one).

AlwaysOn Availability Groups, Isolation Levels, & Selects Blocking the Redo Thread

This past weekend, I had the pleasure of presenting my session SQL Server 2012 AlwaysOn Readable Secondaries at SQL Saturday #126 in Indianapolis, IN. In that session, I covered the basics of AlwaysOn Availability Groups, how to set them up, failing over, and using readable secondary replicas for read-only workloads and backups. During the session, I always make sure to mention the possibility of queries running on the secondary replica blocking the redo thread that is trying to write changes from the primary database to the secondary database.

After mentioning this caveat, I was asked a very good question: Can you use transaction isolation levels (such as snapshot isolation) with AlwaysOn Availability Groups, and would they help to avoid the issue of read-only queries on secondary replicas blocking the redo thread? In order to answer this question, I’m going to break it two parts, and then we’ll work through a couple demos to illustrate the answers.

Questions:

Q: Can you use transaction isolation levels with AlwaysOn Availability Groups?
A:
Yes and no. Yes, you can set whatever isolation level you would like when running queries on the secondary replicas, and SQL Server will not return an error. However, SQL Server will automatically override the isolation level (and ignore all lock hints) when querying a read-only replica, and instead force the queries to use snapshot isolation.

Per the “Benefits” topic of the Active Secondaries: Readable Secondary Replicas (AlwaysOn Availability Groups) BOL article:

Read-only workloads use row versioning to remove blocking contention on the secondary databases. All queries that run against the secondary databases are automatically mapped to snapshot isolation transaction level, even when other transaction isolation levels are explicitly set. Also, all locking hints are ignored. This eliminates reader/writer contention.

Q: Does using snapshot isolation prevent read-only queries from blocking the redo thread on secondary replicas?
A: After reading the above quote from BOL, you would think that read-only queries won’t block the redo thread, and that is true for DML statements. However, it is not true for DDL statements. Sure, snapshot isolation prevents read-only queries on the secondary replicas from taking locks and preventing other DML statements against the database from executing, but it doesn’t prevent the read-only queries from taking schema stability locks and blocking DDL statements.

If you take a look later on in the same article I quoted above, you will come to the “Read-Only Workload Impact” section under the “Performance Considerations” topic:

Also, read-only workloads on the secondary replicas can block data definition language (DDL) changes that are applied through log records. Even though the read operations do not take shared locks because of row versioning, these operations take schema stability (Sch-S) locks, which can block redo operations that are applying DDL changes.

Demos:

In order to have a reference architecture to use for this demo, I’m going to use the Availability Group that I setup as part of the presentation that I mentioned at the beginning of this article. If you would like to play along at home, you can download the scripts and slide deck for that session here. This Availability Group is called DemoAG, and has three replicas, AlwaysOn1 (primary), AlwaysOn2 (secondary), and AlwaysOn3 (secondary). There are two databases participating in the Availability Group: AdventureWorks2012 and AdventureWorksDW2012. I’m going to be using AdventureWorks2012 for this demo.

Testing isolation levels and lock hints:

Unfortunately, I haven’t come up with a good way to demonstrate that SQL is overriding the transaction isolation level on the readable secondary because it is actually doing this internally (I’ll keep looking into this to see if there is a good way to demonstrate it). I can, however, demonstrate that SQL Server is ignoring lock hints on the read-only replicas. I can do this by beginning a transaction, executing a select query with the tablockx lock hint, and leaving the transaction open. I then open a new query window and query sys.dm_tran_locks to see if there is a table lock on the table. If I do this on the primary, you will see an open table lock on the table. If I do this on the secondary, there is no open table lock. If I then try to update data on the primary replica, I am able to see the statement completes and that the data has been written to the replica, even with the open transaction on the secondary that should be holding a table lock.

-- Execute this in a session of its own on the primary replica:
begin tran;

select * from ErrorLog with (tablockx);

-- With the above transaction open, launch a new query window on
-- the primary replica, and execute this query
-- (you should notice a table lock on the ErrorLog table):
use master;

select
t.name
, dtr.*
from sys.dm_tran_locks dtr
join sys.databases d
on d.name = 'AdventureWorks2012'
and d.database_id = dtr.resource_database_id
join AdventureWorks2012.sys.tables t
on t.object_id = dtr.resource_associated_entity_id;

-- Now try running this insert in the same query window as the sys.dm_tran_locks query
-- (you should get blocked until you commit or rollback the transaction in the other session):
insert AdventureWorks2012.dbo.ErrorLog(UserName, ErrorNumber, ErrorMessage)
values('DemoUser', 1, 'Demo Message');

-- Go ahead and close both query windows, and connect to one of the secondary replicas.
-- Now, try running through the scenario above on the secondary replica (minus the insert).
-- You should notice that there is no table lock being held, even though we explicitly requested one.

Testing read-only queries blocking DDL statements:

As far as demonstrating that read-only queries can block DDL statements, all we need to do is run a long-running query on the secondary replica and simultaneously try to alter the table that the query on the secondary is using. We should notice that the redo queue size on the secondary then increases and stays that way until we either kill or complete the read-only query. This indicates that the redo thread is not able to commit the DDL changes because the select statement is taking a schema stability lock and preventing the alteration to the table.

-- Run this query on the secondary replica
-- (it's horribly ugly and inefficient, but the idea is to hold a
-- schema stability lock on ErrorLog for as long as we can):

select
NumbersTable.Number
from AdventureWorks2012.dbo.ErrorLog el
cross apply (
select
Number
from
(
select
row_number() over (order by s1.name) as number
from AdventureWorks2012.sys.sysobjects s1
cross apply AdventureWorks2012.sys.sysobjects s2
cross apply AdventureWorks2012.sys.sysobjects s3
cross apply AdventureWorks2012.sys.sysobjects s4
cross apply AdventureWorks2012.sys.sysobjects s5
) as InnerNumbersTable
) NumbersTable
group by NumbersTable.Number
order by NumbersTable.Number desc;

-- While that beast is running on the secondary replica, run the
-- following on the primary replica:

alter table AdventureWorks2012.dbo.ErrorLog add DemoColumn varchar(10) null;

-- Now go back to the secondary replica, open a new query window, and run the
-- query below (you should notice that the redo queue is greater than 0,
-- which indicates that the redo thread is having trouble applying changes
-- to the database):
select
dhdrs.redo_queue_size
from master.sys.dm_hadr_database_replica_states dhdrs
join master.sys.databases d
on d.database_id = dhdrs.database_id
and d.name = 'AdventureWorks2012';