Answer Guruji(8920128728)

Sunday, September 29, 2019

Always On Failover Cluster Instances (SQL Server)

As part of the SQL Server Always On offering, Always On Failover Cluster Instances leverage Windows Server Failover Clustering (WSFC) functionality to provide local high availability through redundancy at the server-instance level: a failover cluster instance (FCI).

An FCI is a single instance of SQL Server that is installed across Windows Server Failover Clustering (WSFC) nodes and, possibly, across multiple subnets. On the network, an FCI appears to be an instance of SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another if the current node becomes unavailable.

Note

Windows Server 2016 Datacenter edition introduces support for Storage Spaces Direct (S2D). SQL Server Failover Cluster Instances support S2D for cluster storage resources. For more information, see Storage Spaces Direct in Windows Server 2016.

Failover Cluster Instances also support Cluster Shared Volumes (CSV).

In this Topic:

Benefits
Recommendations
Failover Cluster Instance Overview
Elements of a Failover Cluster Instance
SQL Server Failover Concepts and Tasks

Benefits of a Failover Cluster Instance

When there is hardware or software failure of a server, the applications or clients connecting to the server will experience downtime.

When a SQL Server instance is configured to be an FCI (instead of a standalone instance), the high availability of that SQL Server instance is protected by the presence of redundant nodes in the FCI.

Only one of the nodes in the FCI owns the WSFC resource group at a time. In case of a failure (hardware failures, operating system failures, application or service failures), or a planned upgrade, the resource group ownership is moved to another WSFC node.

This process is transparent to the client or application connecting to SQL Server, and this minimizes the downtime that the application or clients experience during a failure. The following list shows some key benefits that SQL Server failover cluster instances provide:

Protection at the instance level through redundancy
Automatic failover in the event of a failure (hardware failures, operating system failures, application or service failures)

Important

In an availability group, automatic failover from an FCI to other nodes within the availability group is not supported. This means that FCIs and standalone nodes should not be coupled together within an availability group if automatic failover is an important component of your high availability solution. However, this coupling can be made for your disaster recovery solution.

Support for a broad array of storage solutions, including WSFC cluster disks (iSCSI, Fibre Channel, and so on) and server message block (SMB) file shares.
Disaster recovery solution using a multi-subnet FCI or running an FCI-hosted database inside an availability group. With the multi-subnet support introduced in Microsoft SQL Server 2012 (11.x), a multi-subnet FCI no longer requires a virtual LAN, increasing the manageability and security of a multi-subnet FCI.
Zero reconfiguration of applications and clients during failovers
Flexible failover policy for granular trigger events for automatic failovers
Reliable failovers through periodic and detailed health detection using dedicated and persisted connections
Configurability and predictability in failover time through indirect background checkpoints
Throttled resource usage during failovers

Recommendations

In a production environment, we recommend that you use static IP addresses in conjunction with the virtual IP address of a Failover Cluster Instance.

We recommend against using DHCP in a production environment. In the event of downtime, if the DHCP IP lease expires, extra time is required to re-register the new DHCP IP address associated with the DNS name.

Failover Cluster Instance Overview

An FCI runs in a WSFC resource group with one or more WSFC nodes. When the FCI starts up, one of the nodes assumes ownership of the resource group and brings its SQL Server instance online. The resources owned by this node include:

Network name
IP address
Shared disks
SQL Server Database Engine service
SQL Server Agent service
SQL Server Analysis Services service, if installed
One file share resource, if the FILESTREAM feature is installed

At any time, only the resource group owner (and no other node in the FCI) is running its respective SQL Server services in the resource group. When a failover occurs, whether it is an automatic failover or a planned failover, the following sequence of events happens:

Unless a hardware or system failure occurs, all dirty pages in the buffer cache are written to disk.
All respective SQL Server services in the resource group are stopped on the active node.
The resource group ownership is transferred to another node in the FCI.
The new resource group owner starts its SQL Server services.
Client application connection requests are automatically directed to the new active node using the same virtual network name (VNN).
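
The sequence above can be sketched as a toy model. The following Python is only an illustration of the order of the steps; the names `Node`, `ResourceGroup`, and `fail_over`, as well as the node and network names, are hypothetical and do not correspond to any WSFC or SQL Server API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a WSFC node; not a real cluster API."""
    name: str
    services_running: bool = False


@dataclass
class ResourceGroup:
    """Toy resource group: exactly one owner runs the SQL Server services."""
    vnn: str
    owner: Node
    log: list = field(default_factory=list)


def fail_over(group: ResourceGroup, target: Node, clean: bool = True) -> None:
    """Walk through the failover steps in the order the text lists them."""
    if clean:
        # Dirty pages are flushed only when the old owner is still healthy.
        group.log.append(f"flush dirty pages on {group.owner.name}")
    group.owner.services_running = False
    group.log.append(f"stop services on {group.owner.name}")
    group.owner = target
    group.log.append(f"transfer ownership to {target.name}")
    target.services_running = True
    group.log.append(f"start services on {target.name}")
    # Clients keep using the same virtual network name (VNN).
    group.log.append(f"{group.vnn} now resolves to {target.name}")


node1 = Node("NODE1", services_running=True)
node2 = Node("NODE2")
fci = ResourceGroup(vnn="SQLFCI", owner=node1)
fail_over(fci, node2)
print("\n".join(fci.log))
```

Passing `clean=False` models a hardware or system failure, in which case the flush step is skipped and the new owner relies on database recovery at startup.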

The FCI is online as long as its underlying WSFC cluster is in good quorum health (the majority of the quorum WSFC nodes are available as automatic failover targets).

When the WSFC cluster loses its quorum, whether due to hardware, software, network failure, or improper quorum configuration, the entire WSFC cluster, along with the FCI, is brought offline.

Manual intervention is then required in this unplanned failover scenario to reestablish quorum in the remaining available nodes in order to bring the WSFC cluster and FCI back online.

Predictable Failover Time

Depending on when your SQL Server instance last performed a checkpoint operation, there can be a substantial number of dirty pages in the buffer cache. Consequently, a failover lasts as long as it takes to write the remaining dirty pages to disk, which can lead to long and unpredictable failover times.

Beginning with Microsoft SQL Server 2012 (11.x), the FCI can use indirect checkpoints to throttle the number of dirty pages kept in the buffer cache. While this does consume additional resources under regular workload, it makes the failover time more predictable as well as more configurable.

This is very useful when the service-level agreement in your organization specifies the recovery time objective (RTO) for your high availability solution.

Reliable Health Monitoring and Flexible Failover Policy

After the FCI starts successfully, the WSFC service monitors both the health of the underlying WSFC cluster and the health of the SQL Server instance.

Beginning with Microsoft SQL Server 2012 (11.x), the WSFC service uses a dedicated connection to poll the active SQL Server instance for detailed component diagnostics through a system stored procedure (sp_server_diagnostics). The implication of this is three-fold:

The dedicated connection to the SQL Server instance makes it possible to reliably poll for component diagnostics all the time, even when the FCI is under heavy load. This makes it possible to distinguish between a system that is under heavy load and a system that actually has failure conditions, thus preventing issues such as false failovers.
The detailed component diagnostics make it possible to configure a more flexible failover policy, whereby you can choose which failure conditions trigger failovers and which do not.
The detailed component diagnostics also enable better troubleshooting of automatic failovers retroactively. The diagnostic information is stored in log files, which are collocated with the SQL Server error logs. You can load them into the Log File Viewer to inspect the component states leading up to the failover in order to determine what caused that failover.

Elements of a Failover Cluster Instance

An FCI consists of a set of physical servers (nodes) that contain similar hardware configuration as well as identical software configuration that includes operating system version and patch level, and SQL Server version, patch level, components, and instance name. Identical software configuration is necessary to ensure that the FCI can be fully functional as it fails over between the nodes.

WSFC Resource Group
A SQL Server FCI runs in a WSFC resource group. Each node in the resource group maintains a synchronized copy of the configuration settings and check-pointed registry keys to ensure full functionality of the FCI after a failover, and only one of the nodes in the cluster owns the resource group at a time (the active node).

The WSFC service manages the server cluster, quorum configuration, failover policy, and failover operations, as well as the VNN and virtual IP addresses for the FCI.

In case of a failure (hardware failures, operating system failures, application or service failures) or a planned upgrade, the resource group ownership is moved to another node in the FCI. The number of nodes that are supported in a WSFC resource group depends on your SQL Server edition.

Also, the same WSFC cluster can run multiple FCIs (multiple resource groups), depending on your hardware capacity, such as CPUs, memory, and number of disks.

SQL Server Binaries
The product binaries are installed locally on each node of the FCI, a process similar to SQL Server stand-alone installations. However, during startup, the services are not started automatically, but managed by WSFC.

Storage
Unlike an availability group, an FCI must use shared storage between all nodes of the FCI for database and log storage. The shared storage can be in the form of WSFC cluster disks, disks on a SAN, Storage Spaces Direct (S2D), or SMB file shares.

This way, all nodes in the FCI have the same view of instance data whenever a failover occurs. This does mean, however, that the shared storage has the potential of being the single point of failure, and the FCI depends on the underlying storage solution to ensure data protection.

Network Name
The VNN for the FCI provides a unified connection point for the FCI. This allows applications to connect to the VNN without the need to know the currently active node. When a failover occurs, the VNN is registered to the new active node after it starts. This process is transparent to the client or application connecting to SQL Server, and this minimizes the downtime that the application or clients experience during a failure.

Virtual IPs
In the case of a multi-subnet FCI, a virtual IP address is assigned to each subnet in the FCI. During a failover, the VNN on the DNS server is updated to point to the virtual IP address for the respective subnet. Applications and clients can then connect to the FCI using the same VNN after a multi-subnet failover.

SQL Server Failover Concepts and Tasks

Concepts and Tasks	Topic
Describes the failure detection mechanism and the flexible failover policy.	Failover Policy for Failover Cluster Instances
Describes concepts in FCI administration and maintenance.	Failover Cluster Instance Administration and Maintenance
Describes multi-subnet configuration and concepts	SQL Server Multi-Subnet Clustering (SQL Server)

Windows Server Failover Clustering with SQL Server

A Windows Server Failover Cluster (WSFC) is a group of independent servers that work together to increase the availability of applications and services. SQL Server 2017 takes advantage of WSFC services and capabilities to support Always On availability groups and SQL Server Failover Cluster Instances.

Terms and Definitions

Windows Server Failover Cluster (WSFC)
A WSFC is a group of independent servers that work together to increase the availability of applications and services.

Node
A server that is participating in a WSFC.

Cluster resource
A physical or logical entity that can be owned by a node, brought online and taken offline, moved between nodes, and managed as a cluster object. A cluster resource can be owned by only a single node at any point in time.

Role
A collection of cluster resources managed as a single cluster object to provide specific functionality. For SQL Server, a role will be either an Always On Availability Group (AG) or Always On Failover Cluster Instance (FCI). A role contains all of the cluster resources that are required for an AG or FCI. Failover and failback always act in context of roles. For an FCI, the role will contain an IP address resource, a network name resource, and the SQL Server resources. An AG role will contain the AG resource, and if a listener is configured, a network name and an IP resource.

Network name resource
A logical server name that is managed as a cluster resource. A network name resource must be used with an IP address resource. These entries may require objects in Active Directory Domain Services and/or DNS.

Resource dependency
A resource on which another resource depends. If resource A depends on resource B, then B is a dependency of A. Resource A will not be able to start without resource B.

Preferred owner
A node on which a resource group prefers to run. Each resource group is associated with a list of preferred owners sorted in order of preference. During automatic failover, the resource group is moved to the next preferred node in the preferred owner list.

Possible owner
A secondary node on which a resource can run. Each resource group is associated with a list of possible owners. Roles can fail over only to nodes that are listed as possible owners.

Quorum mode
The quorum configuration in a failover cluster that determines the number of node failures that the cluster can sustain.

Force quorum
The process to start the cluster even though only a minority of the elements that are required for quorum are in communication.
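
The interaction between the preferred owner and possible owner definitions above can be illustrated with a small sketch. This is a simplified model, assuming the role simply moves to the next node in the preferred owner list that is also a possible owner and is currently up; the function `next_owner` and the node names are hypothetical, and the real WSFC placement logic considers more factors than this.

```python
from typing import List, Optional, Set


def next_owner(current: str,
               preferred: List[str],
               possible: Set[str],
               up: Set[str]) -> Optional[str]:
    """Pick the next preferred node after `current` that may own the role and is up."""
    if current in preferred:
        start = preferred.index(current) + 1
        # Continue after the current owner, then wrap around to the list start.
        ordered = preferred[start:] + preferred[:start - 1]
    else:
        ordered = list(preferred)
    for node in ordered:
        # A role can fail over only to a node listed as a possible owner.
        if node in possible and node in up:
            return node
    return None


target = next_owner(
    current="NODE1",
    preferred=["NODE1", "NODE2", "NODE3"],
    possible={"NODE1", "NODE3"},   # NODE2 is not a possible owner
    up={"NODE2", "NODE3"},         # NODE1 has just failed
)
print(target)
```

In this example NODE2 is skipped even though it is next in the preferred list, because it is not a possible owner for the role.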

Overview of Windows Server Failover Clustering

Windows Server Failover Clustering provides infrastructure features that support the high-availability and disaster recovery scenarios of hosted server applications such as Microsoft SQL Server and Microsoft Exchange. If a cluster node or service fails, the services that were hosted on that node can be automatically or manually transferred to another available node in a process known as failover.

The nodes in a WSFC work together to collectively provide these types of capabilities:

Distributed metadata and notifications. WSFC service and hosted application metadata is maintained on each node in the cluster. This metadata includes WSFC configuration and status in addition to hosted application settings. Changes to a node's metadata or status are automatically propagated to the other nodes in the WSFC.
Resource management. Individual nodes in the WSFC may provide physical resources such as direct-attached storage, network interfaces, and access to shared disk storage. Hosted applications register themselves as a cluster resource, and may configure startup and health dependencies upon other resources.
Health monitoring. Inter-node and primary node health detection is accomplished through a combination of heartbeat-style network communications and resource monitoring. The overall health of the WSFC is determined by the votes of a quorum of nodes in the WSFC.
Failover coordination. Each resource is configured to be hosted on a primary node, and each can be automatically or manually transferred to one or more secondary nodes. A health-based failover policy controls automatic transfer of resource ownership between nodes. Nodes and hosted applications are notified when failover occurs so that they may react appropriately.

SQL Server Always On Technologies and WSFC

SQL Server 2017 Always On is a high availability and disaster recovery solution that takes advantage of WSFC. The Always On features provide integrated, flexible solutions that increase application availability, provide better returns on hardware investments, and simplify high availability deployment and management.

Both Always On availability groups and Always On Failover Cluster Instances use WSFC as a platform technology, registering components as WSFC cluster resources. Related resources are combined into a role, which can be made dependent upon other WSFC cluster resources. The WSFC can then sense and signal the need to restart the SQL Server instance or automatically fail it over to a different server node in the WSFC.

IMPORTANT!! To take full advantage of SQL Server Always On technologies, you should apply several WSFC-related prerequisites.

Instance-level High Availability with Always On Failover Cluster Instances

An Always On Failover Cluster Instance (FCI) is a SQL Server instance that is installed across nodes in a WSFC. This type of instance depends on resources for storage and virtual network name. The storage can use Fibre Channel, iSCSI, FCoE, or SAS for shared disk storage, or use locally attached storage with Storage Spaces Direct (S2D). The virtual network name resource depends on one or more virtual IP addresses, each in a different subnet. The SQL Server service and the SQL Server Agent service are also resources, and both are dependent upon the storage and virtual network name resources.

In the event of a failover, the WSFC service transfers ownership of the instance's resources to a designated failover node. The SQL Server instance is then restarted on the failover node, and databases are recovered as usual. At any given moment, only a single node in the cluster can host the FCI and underlying resources.

NOTE: An Always On Failover Cluster Instance requires symmetrical shared disk storage such as a storage area network (SAN) or SMB file share. The shared disk storage volumes must be available to all potential failover nodes in the WSFC cluster.

Database-level High Availability with Always On availability groups

An Always On Availability Group (AG) is a set of one or more user databases that fail over together. An availability group consists of a primary availability replica and one to eight secondary replicas (up to four in SQL Server 2012) that are maintained through SQL Server log-based data movement for data protection without the need for shared storage. Each replica is hosted by an instance of SQL Server on a different node of the WSFC. The availability group and a corresponding virtual network name are registered as resources in the WSFC cluster.

An availability group listener on the primary replica's node responds to incoming client requests to connect to the virtual network name, and based on attributes in the connection string, it redirects each request to the appropriate SQL Server instance.

In the event of a failover, instead of transferring ownership of shared physical resources to another node, WSFC is leveraged to reconfigure a secondary replica on another SQL Server instance to become the availability group's primary replica. The availability group's virtual network name resource is then transferred to that instance.

At any given moment, only a single SQL Server instance may host the primary replica of an availability group's databases, all associated secondary replicas must each reside on a separate instance, and each instance must reside on separate physical nodes.

NOTE: Always On availability groups do not require deployment of a Failover Cluster Instance or use of symmetric shared storage (SAN or SMB).

A Failover Cluster Instance (FCI) may be used together with an availability group to enhance the availability of an availability replica. However, to prevent potential race conditions in the WSFC cluster, automatic failover of the availability group is not supported to or from an availability replica that is hosted on an FCI.

WSFC Health Monitoring and Failover

High availability for an Always On solution is accomplished through proactive health monitoring of physical and logical WSFC cluster resources, together with automatic failover onto and reconfiguration of redundant hardware. A system administrator can also initiate a manual failover of an availability group or SQL Server instance from one node to another.

Failover Policies for Nodes, Failover Cluster Instances, and Availability Groups

A failover policy is configured at the WSFC node, the SQL Server Failover Cluster Instance (FCI), and the availability group levels. These policies, based on the severity, duration, and frequency of unhealthy cluster resource status and node responsiveness, can trigger a service restart or an automatic failover of cluster resources from one node to another, or can trigger the move of an availability group primary replica from one SQL Server instance to another.

Failover of an availability group replica does not affect the underlying SQL Server instance. Failover of an FCI moves the hosted availability group replicas with the instance.

WSFC Resource Health Detection

Each resource in a WSFC can report its status and health, periodically or on-demand. A variety of circumstances may indicate resource failure; e.g. power failure, disk or memory errors, network communication errors, or non-responsive services.

WSFC resources such as networks, storage, or services can be made dependent upon one another. The cumulative health of a resource is determined by successively rolling up its health with the health of each of its resource dependencies.
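
The roll-up described above can be pictured as a recursive check over the dependency graph. The following is a minimal sketch, assuming an acyclic dependency graph and a simple healthy/unhealthy flag per resource; the function `cumulative_health` and the resource names are illustrative only and are not part of any WSFC API.

```python
from typing import Dict, List


def cumulative_health(resource: str,
                      own_health: Dict[str, bool],
                      depends_on: Dict[str, List[str]]) -> bool:
    """A resource is healthy only if it and all of its dependencies are healthy."""
    if not own_health[resource]:
        return False
    return all(cumulative_health(dep, own_health, depends_on)
               for dep in depends_on.get(resource, []))


# Illustrative dependency graph: each key depends on the resources in its list.
depends_on = {
    "SQL Server": ["Network Name", "Disk"],
    "SQL Server Agent": ["Network Name", "Disk"],
    "Network Name": ["IP Address"],
}
own_health = {
    "SQL Server": True,
    "SQL Server Agent": True,
    "Network Name": True,
    "IP Address": False,   # the only resource reporting a failure
    "Disk": True,
}

for name in ("Disk", "Network Name", "SQL Server"):
    print(name, cumulative_health(name, own_health, depends_on))
```

Here a single failed IP address resource makes the network name, and in turn the SQL Server resource that depends on it, report as unhealthy, while the disk remains healthy.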

WSFC Inter-node Health Detection and Quorum Voting

Each node in a WSFC participates in periodic heartbeat communication to share the node's health status with the other nodes. Unresponsive nodes are considered to be in a failed state.

Quorum is a mechanism that helps ensure that the WSFC is up and running by ensuring that enough resources are online in the WSFC. If the WSFC has enough votes, it is healthy and able to provide node-level fault tolerance.

A quorum mode is configured in the WSFC that dictates the methodology used for quorum voting and when to perform an automatic failover or take the cluster offline.

TIP!! It is best practice to always have an odd number of quorum votes in a WSFC. For the purposes of quorum voting, SQL Server does not have to be installed on all nodes in the cluster. An additional server can act as a quorum member, or the WSFC quorum model can be configured to use a remote file share as a tie-breaker.
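
The reason for preferring an odd number of votes follows from simple majority arithmetic. The sketch below assumes a plain static majority rule; actual WSFC quorum modes, witnesses, and dynamic quorum can adjust the vote count, and the function names are hypothetical.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def tolerated_failures(total_votes: int) -> int:
    """How many votes can be lost while a majority still remains."""
    return total_votes - votes_needed(total_votes)


for total in (2, 3, 4, 5):
    print(total, votes_needed(total), tolerated_failures(total))
```

Under this rule, four votes tolerate no more lost votes than three do, so adding a single tie-breaker vote (for example, a file share witness) to an even-numbered cluster is what raises the number of failures it can sustain.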

Disaster Recovery Through Forcing Quorum

Depending upon operational practices and WSFC configuration, you can incur both automatic and manual failovers, and still maintain a robust, fault-tolerant SQL Server Always On solution. However, if a quorum of the eligible voting nodes in the WSFC cannot communicate with one another, or if the WSFC cluster otherwise fails health validation, then the WSFC may go offline.

If the WSFC goes offline because of an unplanned disaster, or due to a persistent hardware or communications failure, then manual administrative intervention is required to force quorum and bring the surviving cluster nodes back online in a non-fault-tolerant configuration.

Afterwards, a series of steps must also be taken to reconfigure the WSFC, recover the affected database replicas, and to re-establish a new quorum.

Relationship of SQL Server Always On Components to WSFC

Several layers of relationships exist between SQL Server Always On and WSFC features and components.

Always On availability groups are hosted on SQL Server instances.
A client request that specifies a logical availability group listener network name to connect to a primary or secondary database is redirected to the appropriate instance network name of the underlying SQL Server instance or SQL Server FCI.

SQL Server instances are actively hosted on a single node.
If present, a stand-alone SQL Server Instance always resides on a single Node with a static instance network name. If present, a SQL Server FCI is active on one of two or more possible failover nodes with a single virtual Instance Network Name.

Nodes are members of a WSFC cluster.
WSFC configuration metadata and status for all nodes is stored on each node. Each server may provide asymmetric storage or shared storage (SAN) volumes for user or system databases. Each server has at least one physical network interface on one or more IP subnets.

The WSFC monitors health and manages configuration for a group of servers.
The WSFC mechanisms propagate changes to WSFC configuration metadata and status to all nodes in the WSFC. If a disk witness is used, the metadata is also stored there. By default, each node of the WSFC gets a vote towards quorum, and a witness is used if one is configured and needed.

Always On availability groups registry keys are subkeys of the WSFC cluster.
If you delete and re-create a WSFC, you must disable and re-enable the Always On availability groups feature on each server instance that was enabled for Always On availability groups on the original WSFC.

SQL Server Always On Component Context Diagram

Prerequisites, Restrictions, and Recommendations for Always On availability groups

This article describes considerations for deploying Always On availability groups, including prerequisites, restrictions, and recommendations for host computers, Windows Server failover clusters (WSFC), server instances, and availability groups. For each of these components, security considerations and required permissions, if any, are indicated.

Important

Before you deploy Always On availability groups, we strongly recommend that you read every section of this topic.

.NET Hotfixes that Support Availability Groups

Depending on the SQL Server 2017 components and features you will use with Always On availability groups, you may need to install additional .NET hotfixes identified in the following table. The hotfixes can be installed in any order.

	Dependent Feature	Hotfix	Link
	Reporting Services	Hotfix for .NET 3.5 SP1 adds support to SQL Client for the Always On features of read-intent, readonly, and multisubnetfailover. The hotfix needs to be installed on each Reporting Services report server.	KB 2654347: Hotfix for .Net 3.5 SP1 to add support for Always On features

Checklist: Requirements (Windows System)

To support the Always On availability groups feature, ensure that every computer that is to participate in one or more availability groups meets the following fundamental requirements:

	Requirement	Link
	Ensure that the system is not a domain controller.	Availability groups are not supported on domain controllers.
	Ensure that each computer is running Windows Server 2012 or later versions.	Hardware and Software Requirements for Installing SQL Server 2016
	Ensure that each computer is a node in a WSFC.	Windows Server Failover Clustering (WSFC) with SQL Server
	Ensure that the WSFC contains sufficient nodes to support your availability group configurations.	A cluster node can host one replica for an availability group. The same node cannot host two replicas from the same availability group. The cluster node can participate in multiple availability groups, with one replica from each group. Ask your database administrators how many cluster nodes are required to support the availability replicas of the planned availability groups. Overview of Always On Availability Groups (SQL Server).
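
The placement rule in the last row (at most one replica of a given availability group per node) can be checked mechanically when planning a deployment. The following is a small sketch under that single assumption; the function `placement_conflicts` and the group and node names are hypothetical and only illustrate the rule.

```python
from collections import Counter
from typing import List, Tuple


def placement_conflicts(placements: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (availability group, node) pairs that are planned more than once."""
    counts = Counter(placements)
    return sorted(pair for pair, n in counts.items() if n > 1)


planned = [
    ("AG1", "NODE1"),
    ("AG1", "NODE2"),
    ("AG2", "NODE1"),   # allowed: NODE1 hosts one replica from each group
    ("AG1", "NODE1"),   # not allowed: second AG1 replica on NODE1
]
print(placement_conflicts(planned))
```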

Important

Also ensure that your environment is correctly configured for connecting to an availability group. For more information, see Always On Client Connectivity (SQL Server).

Recommendations for Computers That Host Availability Replicas (Windows System)

Comparable systems: For a given availability group, all the availability replicas should run on comparable systems that can handle identical workloads.
Dedicated network adapters: For best performance, use a dedicated network adapter (network interface card) for Always On availability groups.
Sufficient disk space: Every computer on which a server instance hosts an availability replica must possess sufficient disk space for all the databases in the availability group. Keep in mind that as primary databases grow, their corresponding secondary databases grow the same amount.

Permissions (Windows System)

To administer a WSFC, the user must be a system administrator on every cluster node.

For more information about the account for administering the cluster, see Appendix A: Failover Cluster Requirements.

Related Tasks (Windows System)

Task	Link
Set the HostRecordTTL value.	Change the HostRecordTTL (Using Windows PowerShell)

Change the HostRecordTTL (Using Windows PowerShell)

Open a PowerShell window using Run as Administrator.
Import the FailoverClusters module.
Use the Get-ClusterResource cmdlet to find the Network Name resource, then use the Set-ClusterParameter cmdlet to set the HostRecordTTL value, as follows:

Get-ClusterResource "<NetworkResourceName>" | Set-ClusterParameter HostRecordTTL <TimeInSeconds>

The following PowerShell example sets the HostRecordTTL to 300 seconds for a Network Name resource named SQL Network Name (SQL35).
```
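# Load the failover clustering cmdlets.
Import-Module FailoverClusters

# Name of the Network Name resource whose DNS record TTL is being changed.
$nameResource = "SQL Network Name (SQL35)"
# Set HostRecordTTL to 300 seconds (parameter name, then value).
Get-ClusterResource $nameResource | Set-ClusterParameter HostRecordTTL 300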
```
Tip

Every time you open a new PowerShell window, you need to import the FailoverClusters module.

Clustering and High-Availability (Failover Clustering and Network Load Balancing Team Blog)
Getting Started with Windows PowerShell on a Failover Cluster
Cluster resource commands and equivalent Windows PowerShell cmdlets

SQL Server Instance Prerequisites and Restrictions

Each availability group requires a set of failover partners, known as availability replicas, which are hosted by instances of SQL Server. A given server instance can be a stand-alone instance or a SQL Server failover cluster instance (FCI).

In This Section:

Checklist: Prerequisites (Server Instance)

	Prerequisite	Links
	The host computer must be a WSFC node. The instances of SQL Server that host availability replicas for a given availability group reside on separate nodes of the cluster. An availability group can temporarily straddle two clusters while being migrated to a different cluster. SQL Server 2016 introduces distributed availability groups. In a distributed availability group, two availability groups reside on different clusters.	Windows Server Failover Clustering (WSFC) with SQL Server; Failover Clustering and Always On Availability Groups (SQL Server); Distributed Availability Groups (Always On Availability Groups)
	If you want an availability group to work with Kerberos: All server instances that host an availability replica for the availability group must use the same SQL Server service account. The domain administrator needs to manually register a Service Principal Name (SPN) with Active Directory on the SQL Server service account for the virtual network name (VNN) of the availability group listener. If the SPN is registered on an account other than the SQL Server service account, authentication will fail. Important If you change the SQL Server service account, the domain administrator will need to manually re-register the SPN.	Register a Service Principal Name for Kerberos Connections Brief explanation: Kerberos and SPNs enforce mutual authentication. The SPN maps to the Windows account that starts the SQL Server services. If the SPN is not registered correctly or if it fails, the Windows security layer cannot determine the account associated with the SPN, and Kerberos authentication cannot be used. Note: NTLM does not have this requirement.
	If you plan to use a SQL Server failover cluster instance (FCI) to host an availability replica, ensure that you understand the FCI restrictions and that the FCI requirements are met.	Prerequisites and Requirements on Using a SQL Server Failover Cluster Instance (FCI) to Host an Availability Replica (later in this article)
	Each server instance must be running the same version of SQL Server to participate in an Always On Availability Group.	Editions and supported features for SQL 2014, SQL 2016, SQL 2017.
	All the server instances that host availability replicas for an availability group must use the same SQL Server collation.	Set or Change the Server Collation
	Enable the Always On availability groups feature on each server instance that will host an availability replica for any availability group. On a given computer, you can enable as many server instances for Always On availability groups as your SQL Server installation supports.	Enable and Disable Always On Availability Groups (SQL Server). Important: If you destroy and re-create a WSFC, you must disable and re-enable the Always On availability groups feature on each server instance that was enabled for Always On availability groups on the original cluster.
	Each server instance requires a database mirroring endpoint. Note that this endpoint is shared by all the availability replicas and database mirroring partners and witnesses on the server instance. If a server instance that you select to host an availability replica is running under a domain user account and does not yet have a database mirroring endpoint, the New Availability Group Wizard (or Add Replica to Availability Group Wizard) can create the endpoint and grant CONNECT permission to the server instance service account. However, if the SQL Server service is running as a built-in account, such as Local System, Local Service, or Network Service, or a nondomain account, you must use certificates for endpoint authentication, and the wizard will be unable to create a database mirroring endpoint on the server instance. In this case, we recommend that you create the database mirroring endpoints manually before you launch the wizard. Security Note: Transport security for Always On availability groups is the same as for database mirroring.	The Database Mirroring Endpoint (SQL Server); Transport Security for Database Mirroring and Always On Availability Groups (SQL Server)
	If any databases that use FILESTREAM will be added to an availability group, ensure that FILESTREAM is enabled on every server instance that will host an availability replica for the availability group.	Enable and Configure FILESTREAM
	If any contained databases will be added to an availability group, ensure that the contained database authentication server option is set to 1 on every server instance that will host an availability replica for the availability group.	contained database authentication Server Configuration Option; Server Configuration Options (SQL Server)
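
Several of the checklist items above come down to "every instance must agree on X" or "every instance must have Y". A minimal Python sketch of such a pre-flight check follows. The `ServerInstance` descriptor and its field names are hypothetical and only for illustration; the sketch does not query SQL Server, and it covers only a subset of the checklist.

```python
from dataclasses import dataclass


@dataclass
class ServerInstance:
    """Hypothetical descriptor of one server instance (illustrative fields only)."""
    name: str
    version: str
    collation: str
    service_account: str
    always_on_enabled: bool
    has_mirroring_endpoint: bool


def check_instances(instances, use_kerberos=False):
    """Return a list of checklist violations found in the given descriptors."""
    problems = []
    # Every instance must run the same SQL Server version.
    if len({i.version for i in instances}) > 1:
        problems.append("instances run different SQL Server versions")
    # Every instance must use the same server collation.
    if len({i.collation for i in instances}) > 1:
        problems.append("instances use different collations")
    # Kerberos additionally requires one shared service account.
    if use_kerberos and len({i.service_account for i in instances}) > 1:
        problems.append("Kerberos requires the same service account everywhere")
    for i in instances:
        if not i.always_on_enabled:
            problems.append(f"{i.name}: Always On availability groups not enabled")
        if not i.has_mirroring_endpoint:
            problems.append(f"{i.name}: no database mirroring endpoint")
    return problems
```

An empty result means none of the modeled rules were violated; it does not replace reviewing the full checklist.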

Thread Usage by Availability Groups

Always On availability groups has the following requirements for worker threads:

- On an idle instance of SQL Server, Always On availability groups uses 0 threads.
- The maximum number of threads used by availability groups is the configured setting for the maximum number of server threads ('max worker threads') minus 40.
- The availability replicas hosted on a given server instance share a single thread pool.

Threads are shared on an on-demand basis, as follows:
- Typically, there are 3-10 shared threads, but this number can increase depending on the primary replica workload.
- If a given thread is idle for a while, it is released back into the general SQL Server thread pool. Normally, an inactive thread is released after ~15 seconds of inactivity. However, depending on the last activity, an idle thread might be retained longer.
- A SQL Server instance uses up to 100 threads for parallel redo for secondary replicas. Each database uses up to one-half of the total number of CPU cores, but not more than 16 threads per database. If the total number of required threads for a single instance exceeds 100, SQL Server uses a single redo thread for every remaining database. Serial Redo threads are released after ~15 seconds of inactivity.
Note

Databases are chosen to go single-threaded based on their ascending database ID. As such, the database creation order should be considered for SQL Server instances that host more availability group databases than available worker threads. For example, on a system with 32 or more CPU cores, the first six databases (ordered by database ID) in an availability group or groups will use parallel redo mode, and all subsequent databases will use single redo mode.
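
The parallel redo rules above can be modeled in a few lines of Python. This is a simplified sketch of the description in this section, not SQL Server's actual scheduler; the function name and parameters are hypothetical, and it assumes that single redo threads are not counted against the 100-thread budget.

```python
def parallel_redo_plan(cpu_cores, database_ids, pool_limit=100, per_db_cap=16):
    """Map each database ID to the number of redo threads it would get.

    Simplified model: databases are served in ascending database ID order,
    and each takes min(cores / 2, 16) threads while the budget allows it.
    """
    per_db = max(1, min(cpu_cores // 2, per_db_cap))
    plan = {}
    used = 0
    for db_id in sorted(database_ids):
        if used + per_db <= pool_limit:
            plan[db_id] = per_db  # parallel redo
            used += per_db
        else:
            plan[db_id] = 1       # falls back to a single redo thread
    return plan


plan = parallel_redo_plan(cpu_cores=32, database_ids=range(5, 13))
parallel = [db_id for db_id, threads in plan.items() if threads > 1]
print(parallel)  # [5, 6, 7, 8, 9, 10]
```

With 32 cores each database would use 16 threads, so six databases consume 96 of the 100 threads and a seventh would not fit, which matches the example in the note.
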
In addition, availability groups use unshared threads, as follows:
- Each primary replica uses 1 Log Capture thread for each primary database. In addition, it uses 1 Log Send thread for each secondary database. Log send threads are released after ~15 seconds of inactivity.
- A backup on a secondary replica holds a thread on the primary replica for the duration of the backup operation.
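
As a rough sizing aid, the ceiling and the unshared thread counts described in this section can be expressed as simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, `max_worker_threads` is assumed to be the effective value in use on the instance, and "1 Log Send thread for each secondary database" is interpreted as one thread per primary database per secondary replica.

```python
RESERVED_THREADS = 40  # from the text: 'max worker threads' minus 40


def availability_group_thread_ceiling(max_worker_threads):
    """Upper bound on threads that availability groups can use."""
    return max_worker_threads - RESERVED_THREADS


def unshared_primary_threads(primary_databases, secondary_replicas):
    """Log Capture plus Log Send threads held by a primary replica."""
    log_capture = primary_databases                    # one per primary database
    log_send = primary_databases * secondary_replicas  # one per secondary database
    return log_capture + log_send


print(availability_group_thread_ceiling(512))  # 472
print(unshared_primary_threads(10, 2))         # 30
```

The second figure is a peak value, because Log Send threads are released after a period of inactivity.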

For more information, see Always On - HADRON Learning Series: Worker Pool Usage for HADRON Enabled Databases (a CSS SQL Server Engineers Blog).

Permissions (Server Instance)

Task	Required Permissions
Creating the database mirroring endpoint	Requires CREATE ENDPOINT permission, or membership in the sysadmin fixed server role. Also requires CONTROL ON ENDPOINT permission. For more information, see GRANT Endpoint Permissions (Transact-SQL).
Enabling Always On availability groups	Requires membership in the Administrator group on the local computer and full control on the WSFC.

Related Tasks (Server Instance)

Task	Article
Determining whether database mirroring endpoint exists	sys.database_mirroring_endpoints (Transact-SQL)
Creating the database mirroring endpoint (if it does not yet exist)	Create a Database Mirroring Endpoint for Windows Authentication (Transact-SQL); Use Certificates for a Database Mirroring Endpoint (Transact-SQL); Create a Database Mirroring Endpoint for Always On Availability Groups (SQL Server PowerShell)
Enabling Availability Groups	Enable and Disable Always On Availability Groups (SQL Server)

Network Connectivity Recommendations

We strongly recommend that you use the same network links for communications between WSFC nodes and communications between availability replicas. Using separate network links can cause unexpected behaviors if some of the links fail (even intermittently).

For example, for an availability group to support automatic failover, the secondary replica that is the automatic-failover partner must be in the SYNCHRONIZED state. If the network link to this secondary replica fails (even intermittently), the replica enters the UNSYNCHRONIZED state and cannot begin to resynchronize until the link is restored. If the WSFC requests an automatic failover while the secondary replica is unsynchronized, automatic failover will not occur.

Client Connectivity Support

For information about Always On availability groups support for client connectivity, see Always On Client Connectivity (SQL Server).

Prerequisites and Restrictions for Using a SQL Server Failover Cluster Instance (FCI) to Host an Availability Replica

Restrictions (FCIs)

Note

Failover Cluster Instances support Cluster Shared Volumes (CSV). For more information on CSV, see Understanding Cluster Shared Volumes in a Failover Cluster.

The cluster nodes of an FCI can host only one replica for a given availability group: If you add an availability replica on an FCI, the WSFC nodes that are possible FCI owners cannot host another replica for the same availability group. To avoid possible conflicts, it is recommended to configure possible owners for the failover cluster instance. This prevents a single WSFC node from attempting to host two availability replicas for the same availability group.

Furthermore, every other replica must be hosted by an instance of SQL Server 2016 that resides on a different cluster node in the same Windows Server failover cluster. The only exception is that while being migrated to another cluster, an availability group can temporarily straddle two clusters.

Warning

Using the Failover Cluster Manager to move a failover cluster instance hosting an availability group to a node that is already hosting a replica of the same availability group may result in the loss of the availability group replica, preventing it from being brought online on the target node. A single node of a failover cluster cannot host more than one replica for the same availability group. For more information on how this occurs, and how to recover, see the blog Replica unexpectedly dropped in availability group.

FCIs do not support automatic failover by availability groups: FCIs do not support automatic failover by availability groups, so any availability replica that is hosted by an FCI can be configured for manual failover only.
Changing FCI network name: If you need to change the network name of an FCI that hosts an availability replica, you will need to remove the replica from its availability group and then add the replica back into the availability group. You cannot remove the primary replica, so if you are renaming an FCI that is hosting the primary replica, you should fail over to a secondary replica and then remove the former primary replica and add it back. Note that renaming an FCI might alter the URL of its database mirroring endpoint. When you add the replica, ensure that you specify the current endpoint URL.
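
The rule that a single WSFC node cannot host more than one replica of the same availability group can be checked on paper before moving an FCI. The following Python sketch is illustrative only: the function name and the input shape (replica name mapped to the set of nodes that could host it, such as an FCI's possible owners) are assumptions, and nothing here reads the real cluster configuration.

```python
def find_placement_conflicts(replica_hosts):
    """Return pairs of replicas that could end up on the same WSFC node.

    replica_hosts maps a replica name to the set of nodes that can host it.
    For an FCI that set is its possible owners; for a standalone instance
    it is a single node.
    """
    conflicts = {}
    names = sorted(replica_hosts)
    for idx, first in enumerate(names):
        for second in names[idx + 1:]:
            shared = replica_hosts[first] & replica_hosts[second]
            if shared:
                conflicts[(first, second)] = shared
    return conflicts


hosts = {
    "FCI1": {"NODE1", "NODE2"},  # FCI with two possible owners
    "SQL2": {"NODE2"},           # standalone instance on NODE2
    "SQL3": {"NODE3"},
}
print(find_placement_conflicts(hosts))  # {('FCI1', 'SQL2'): {'NODE2'}}
```

In this example, removing NODE2 from the possible owners of FCI1 would clear the conflict.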

Checklist: Prerequisites (FCIs)

	Prerequisite	Link
	Ensure that each SQL Server failover cluster instance (FCI) possesses the required shared storage as per standard SQL Server failover cluster instance installation.

Related Tasks (FCIs)

Task	Article
Installing a SQL Server Failover Cluster	Create a New SQL Server Failover Cluster (Setup)
In-place upgrade of your existing SQL Server Failover Cluster	Upgrade a SQL Server Failover Cluster Instance (Setup)
Maintaining your existing SQL Server Failover Cluster	Add or Remove Nodes in a SQL Server Failover Cluster (Setup)

Lab 09: Publish and subscribe to Event Grid events