A checkpoint creates a known good point from which the SQL Server Database Engine can start applying changes contained in the log during recovery after an unexpected shutdown or crash.
Overview
For performance reasons, the Database Engine performs modifications to database pages in memory-in the buffer cache-and does not write these pages to disk after every change. Rather, the Database Engine periodically issues a checkpoint on each database.
A checkpoint writes the current in-memory modified pages (known as dirty pages) and transaction log information from memory to disk and, also records the information in the transaction log.
The Database Engine supports several types of checkpoints: automatic, indirect, manual, and internal. The following table summarizes the types of checkpoints:
Note
The -k SQL Server advanced setup option enables a database administrator to throttle checkpoint I/O behavior based on the throughput of the I/O subsystem for some types of checkpoints. The -k setup option applies to automatic checkpoints and any otherwise unthrottled manual and internal checkpoints.
For automatic, manual, and internal checkpoints, only modifications made after the latest checkpoint need to be rolled forward during database recovery. This reduces the time required to recover a database.
Important
Long-running, uncommitted transactions increase recovery time for all checkpoint types.
Interaction of the TARGET_RECOVERY_TIME and 'recovery interval' Options
The following table summarizes the interaction between the server-wide sp_configure'recovery interval' setting and the database-specific ALTER DATABASE ... TARGET_RECOVERY_TIME setting.
Automatic checkpoints
An automatic checkpoint occurs each time the number of log records reaches the number the Database Engine estimates it can process during the time specified in the recovery interval server configuration option.
In every database without a user-defined target recovery time, the Database Engine generates automatic checkpoints. The frequency depends on the recovery interval advanced server configuration option, which specifies the maximum time that a given server instance should use to recover a database during a system restart.
The Database Engine estimates the maximum number of log records it can process within the recovery interval. When a database using automatic checkpoints reaches this maximum number of log records, the Database Engine issues an checkpoint on the database.
The time interval between automatic checkpoints can be highly variable. A database with a substantial transaction workload will have more frequent checkpoints than a database used primarily for read-only operations. Under the simple recovery model, an automatic checkpoint is also queued if the log becomes 70 percent full.
Under the simple recovery model, unless some factor is delaying log truncation, an automatic checkpoint truncates the unused section of the transaction log. By contrast, under the full and bulk-logged recovery models, once a log backup chain has been established, automatic checkpoints do not cause log truncation.
After a system crash, the length of time required to recover a given database depends largely on the amount of random I/O needed to redo pages that were dirty at the time of the crash. This means that the recovery interval setting is unreliable. It cannot determine an accurate recovery duration. Furthermore, when an automatic checkpoint is in progress, the general I/O activity for data increases significantly and quite unpredictably.
Impact of recovery interval on recovery performance
For an online transaction processing (OLTP) system using short transactions, recovery interval is the primary factor determining recovery time. However, the recovery interval option does not affect the time required to undo a long-running transaction. Recovery of a database with a long-running transaction can take much longer than the time specified in the recovery interval setting.
For example, if a long-running transaction took two hours to perform updates before the server instance became disabled, the actual recovery takes considerably longer than the recovery interval value to recover the long transaction.
Typically, the default values provides optimal recovery performance. However, changing the recovery interval might improve performance in the following circumstances:
- If recovery routinely takes significantly longer than 1 minute when long-running transactions are not being rolled back.
- If you notice that frequent checkpoints are impairing performance on a database.
If you decide to increase the recovery interval setting, we recommend increasing it gradually by small increments and evaluating the effect of each incremental increase on recovery performance.
This approach is important because as the recovery interval setting increases, database recovery takes that many times longer to complete. For example, if you change recovery interval to 10 minutes, recovery takes approximately 10 times longer to complete than when recovery interval is set to 1 minute.
Indirect checkpoints
Indirect checkpoints, introduced in SQL Server 2012 (11.x), provide a configurable database-level alternative to automatic checkpoints. This can be configured by specifying the target recovery time database configuration option.
In the event of a system crash, indirect checkpoints provide potentially faster, more predictable recovery time than automatic checkpoints. Indirect checkpoints offer the following advantages:
- An online transactional workload on a database configured for indirect checkpoints can experience performance degradation. Indirect checkpoints ensure that the number of dirty pages are below a certain threshold so the database recovery completes within the target recovery time.
The recovery interval configuration option uses the number of transactions to determine the recovery time, as opposed to indirect checkpoints which makes use of the number of dirty pages.
When indirect checkpoints are enabled on a database receiving a large number of DML operations, the background writer can start aggressively flushing dirty buffers to disk to ensure that the time required to perform recovery is within the target recovery time set of the database.
This can cause additional I/O activity on certain systems which can contribute to a performance bottleneck if the disk subsystem is operating above or nearing the I/O threshold.
- Indirect checkpoints enable you to reliably control database recovery time by factoring in the cost of random I/O during REDO. This enables a server instance to stay within an upper-bound limit on recovery times for a given database (except when a long-running transaction causes excessive UNDO times).
- Indirect checkpoints reduce checkpoint-related I/O spiking by continually writing dirty pages to disk in the background.
However, an online transactional workload on a database configured for indirect checkpoints can experience performance degradation. This is because the background writer used by indirect checkpoint sometimes increases the total write load for a server instance.
Important
Indirect checkpoint is the default behavior for new databases created in SQL Server 2016 (13.x), including the Model and TempDB databases.
Databases that were upgraded in-place, or restored from a previous version of SQL Server, will use the previous automatic checkpoint behavior unless explicitly altered to use indirect checkpoint.
Databases that were upgraded in-place, or restored from a previous version of SQL Server, will use the previous automatic checkpoint behavior unless explicitly altered to use indirect checkpoint.
Improved indirect checkpoint scalability
Prior to SQL Server 2019, you may experience non-yielding scheduler errors when there is a database that generates a large number of dirty pages, such as
tempdb
. SQL Server 2019 introduces improved scalability for indirect checkpoint, which should help avoid these errors on databases that have a heavy UPDATE
/INSERT
workload.Internal checkpoints
Internal Checkpoints are generated by various server components to guarantee that disk images match the current state of the log. Internal checkpoint are generated in response to the following events:
- Database files have been added or removed by using ALTER DATABASE.
- A database backup is taken.
- A database snapshot is created, whether explicitly or internally for DBCC CHECKDB.
- An activity requiring a database shutdown is performed. For example, AUTO_CLOSE is ON and the last user connection to the database is closed, or a database option change is made that requires a restart of the database.
- An instance of SQL Server is stopped by stopping the SQL Server (MSSQLSERVER) service . Either action causes a checkpoint in each database in the instance of SQL Server.
- Bringing a SQL Server failover cluster instance (FCI) offline.
No comments:
Post a Comment