Using custom SQL performance counters to monitor SQL Server

DBAs often find themselves tasked with creating custom SQL Server performance counters to meet the increasing complexity of their environments and monitoring requirements.

ApexSQL Monitor is a SQL Server performance monitoring and diagnostic application suited for monitoring system, SQL Server, and database performance in real time using 40+ counters.

While ApexSQL Monitor has effectively curated a wide range of common SQL Server and operating system performance metrics, it also allows the creation of custom SQL performance counters for system-specific on-demand monitoring, historical trending, and alerting.

Custom metrics are T-SQL based queries that can be added to ApexSQL Monitor, and the data collected by those queries can be analyzed and alerted on just like any other built-in metric. A custom metric T-SQL script added to ApexSQL Monitor must return a single numerical value.

Custom metrics basically allow users to greatly extend the capability of ApexSQL Monitor beyond its out-of-the-box configuration and set of standard metrics.
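
For illustration, a minimal sketch of such a query (using the built-in User Connections counter purely as an example) that satisfies the single-value requirement could look like this:

-- Minimal sketch of a custom metric query: it returns exactly one numerical value
SELECT cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'User Connections';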

The following are examples of custom SQL performance counters that may be useful.

Signal waits percent

The signal wait time can be obtained from the signal_wait_time_ms column of the sys.dm_os_wait_stats DMV. This metric focuses on CPU wait time. When a query must wait until another resource becomes available, or has to perform I/O, it is pushed to the wait list, and the time spent waiting is stored in the wait_time_ms column. Signal waits are always present in OLTP systems, since such systems are based on a large number of short transactions.
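
For context, the following ad hoc sketch (illustrative only, not part of the custom metric) breaks total wait time down into resource and signal components per wait type:

-- Illustrative sketch: resource vs. signal (CPU) wait time per wait type
SELECT TOP (10)
	wait_type,
	wait_time_ms - signal_wait_time_ms AS resource_wait_ms, -- time spent waiting for the resource itself
	signal_wait_time_ms AS signal_wait_ms -- time spent waiting for CPU after the resource became available
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;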

To track the CPU pressure that can be caused by such transactions, the following common query calculates signal waits as a percentage of total wait time.

SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms) 
AS NUMERIC(20, 2))
FROM sys.dm_os_wait_stats;

However, this query does not reflect the true signal wait percentage, as it includes wait types for which a long wait time is normal and/or which do not commonly affect SQL Server performance. Excluding these wait types from the calculation, as shown below, returns much more realistic measurements.

Name – Signal waits percentage
Performance category – SQL Server performance
SQL Server version – SQL Server 2005 to SQL Server 2016
Unit – Percent
Use percent scale for chart (0-100) – Yes
Period – 60 (can be set according to user preferences and aimed precision of reading)
Counter type – Execution value
Query text:

SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms) 
AS NUMERIC(20, 2))
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
  'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE',
  'FT_IFTS_SCHEDULER_IDLE_WAIT', 'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'BROKER_EVENTHANDLER',
  'SLEEP_BPOOL_FLUSH', 'SQLTRACE_LOCK', 'SP_SERVER_DIAGNOSTICS_SLEEP',
  'DIRTY_PAGE_POLL', 'HADR_FILESTREAM_IOMGR_IOCOMPLETION', 'XE_LIVE_TARGET_TVF',
  'TRACEWRITE', 'FT_IFTSHC_MUTEX', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
  'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK',
  'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'LOGMGR_QUEUE',
  'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BROKER_TO_FLUSH',
  'BROKER_RECEIVE_WAITFOR', 'ONDEMAND_TASK_QUEUE', 'DBMIRROR_EVENTS_QUEUE',
  'DBMIRRORING_CMD', 'BROKER_TRANSMITTER', 'SQLTRACE_WAIT_ENTRIES'
  )
  AND wait_time_ms <> 0
OPTION (RECOMPILE);

SQL compilations per Batch requests

A high number of SQL compilations/sec does not by itself mean that something is wrong with SQL Server. For example, if SQL compilations/sec returns a value of 200, it is important to know how that number relates to Batch requests/sec. If there are 500 Batch requests/sec, then 200 SQL compilations/sec looks quite high, and some drilling down to find the root cause is required (usually an overuse of ad hoc queries and single-use plans). On the other hand, if Batch requests/sec returns, for example, 6,000, then it is obvious that 200 SQL compilations/sec will not cause any pressure and the number may be disregarded.

A general rule is that SQL compilations per Batch requests should not be higher than 10%, but this shouldn’t always be taken at face value, as what counts as a “bad” value can depend significantly on the system configuration, e.g. whether the batches consist of multiple statements, whether an ORM is in use, or whether a lot of highly variable dynamic SQL queries are issued. In such cases it is quite possible that even a much higher percentage could be considered normal.
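
Before adding the custom metric, the raw cumulative values of both counters since the last restart can be checked side by side with a simple ad hoc sketch such as:

-- Illustrative sketch: raw cumulative values of both counters since the last restart
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Batch Requests/sec', 'SQL Compilations/sec');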

Name – SQL compilations per Batch requests
Performance category – SQL Server performance
SQL Server version – SQL Server 2005 or higher
Unit – Percent
Use percent scale for chart (0-100) – Yes
Period – 60 (can be set according to user preferences and aimed precision of reading)
Counter type – Execution value
Query text:

SELECT 100 * cntr_value / (
		SELECT 1.0 * cntr_value
		FROM sys.dm_os_performance_counters
		WHERE counter_name = 'Batch Requests/sec'
		)
FROM sys.dm_os_performance_counters
WHERE counter_name = 'SQL Compilations/sec';

Total number of deadlocks since last restart

Sometimes, on a system where deadlocks are causing a lot of trouble, there is a need for global insight into the number of deadlocks and how they are spread across different times of the day or week. The following custom SQL performance counter allows tracking the total number of deadlocks since the last SQL Server restart.

To set the Total number of deadlocks custom SQL performance counter using ApexSQL Monitor, set the parameters as follows:

Name – Total number of deadlocks since last restart
Performance category – SQL Server performance
SQL Server version – SQL Server 2008 and higher
Unit – deadlocks
Use percent scale for chart (0-100) – No
Period – 3600 (can be set according to user preferences and aimed precision of reading)
Counter type – Execution value
Query text:

SELECT ISNULL(SUM(cntr_value), 0) AS res
FROM sys.dm_os_performance_counters
WHERE LTRIM(RTRIM(OBJECT_NAME)) LIKE '%:Locks'
	AND counter_name = 'Number of Deadlocks/sec'
	AND instance_name = '_Total'
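
Note that, despite the “/sec” suffix, the stored value is cumulative since the last restart; the counter type can be verified with an ad hoc check such as this sketch:

-- Illustrative sketch: cntr_type 272696576 (PERF_COUNTER_COUNTER) indicates a
-- cumulative per-second counter, so the value only ever grows between readings
SELECT counter_name, cntr_type, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Number of Deadlocks/sec'
	AND instance_name = '_Total';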

Buffer manager page reads per second

This metric indicates the amount of work imposed on the buffer manager when pages that are not in the buffer cache must be read from storage. In essence, it returns the number of physical database page reads from disk per second. The larger the number of reads from disk, the bigger the impact on performance, even when disk latencies are very low, as a high read rate can cause large waits and usually points to indexing or memory constraints.

The threshold for this custom SQL performance counter is around 80-90 reads/sec, as a rule of thumb, but it should be adjusted according to the specific system.

Name – Buffer manager page reads per second
Performance category – SQL Server performance
SQL Server version – SQL Server 2008 and higher
Unit – reads/second
Use percent scale for chart (0-100) – No
Period – 60 (can be set according to user preferences and aimed precision of reading)
Counter type – Execution value
Query text:

SELECT cntr_value
FROM sys.dm_os_performance_counters
WHERE (RTRIM(LTRIM(counter_name))) LIKE 'Page reads/sec';
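
Note that ‘Page reads/sec’, like other per-second counters in sys.dm_os_performance_counters, is stored as a cumulative value, so a true per-second rate requires two samples and a delta. The following ad hoc sketch (with an arbitrary 10-second sampling window) illustrates the calculation outside the monitoring tool:

-- Illustrative sketch: compute a true per-second read rate from two samples
DECLARE @v1 BIGINT, @v2 BIGINT;

SELECT @v1 = cntr_value
FROM sys.dm_os_performance_counters
WHERE (RTRIM(LTRIM(counter_name))) LIKE 'Page reads/sec';

WAITFOR DELAY '00:00:10'; -- arbitrary 10-second sampling window

SELECT @v2 = cntr_value
FROM sys.dm_os_performance_counters
WHERE (RTRIM(LTRIM(counter_name))) LIKE 'Page reads/sec';

SELECT (@v2 - @v1) / 10.0 AS page_reads_per_sec;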

Ad Hoc queries executed once

This custom SQL performance counter returns the number of ad hoc queries executed only once that are present in the execution plan cache at the moment of reading (together with stub and non-stub compiled ones). A large number of ad hoc queries executed only one time each could indicate execution plan cache inflation, a situation where execution plan cache memory is wasted on storing execution plans for queries that will be executed only once and never again.

This is not a strict metric, as it is possible that many one-time executed ad hoc queries wind up being executed on subsequent occasions. Due to the variance in values, it is advisable to calculate baselines for this custom SQL performance counter, as they will more closely reflect the typical values for each monitored SQL Server instance.

If a large number is returned over time (over 8,000, for example), it is likely that the execution plan cache is inflated. In such cases, turning on “optimize for ad hoc workloads” should prevent this from happening.


Quick tip:

Before making a decision to turn on “optimize for ad hoc workloads”, take into consideration the Memory usage of ad hoc queries executed once metric. When both values are high, it is almost certain that the execution plan cache is suffering from inflation.
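
If the decision is made to enable the option, it can be turned on at the instance level as shown in the following sketch (test the change on a non-production instance first):

-- Illustrative sketch: enable "optimize for ad hoc workloads" at the instance level
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;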

Name – Ad Hoc queries executed once
Performance category – SQL Server performance
SQL Server version – SQL Server 2008 and higher
Unit – Queries
Use percent scale for chart (0-100) – No
Period – 60 (can be set according to user preferences and aimed precision of reading)
Counter type – Execution value
Query text:

SELECT SUM(CASE 
			WHEN usecounts = 1
				THEN 1
			ELSE 0
			END)
FROM sys.dm_exec_cached_plans
WHERE objtype = 'Adhoc'
GROUP BY objtype;

Memory usage for ad hoc queries executed once

This custom SQL performance counter returns the amount of memory used by all ad hoc queries that were executed once and are stored in the execution plan cache. The returned value is valid only at the moment of reading, and there can be significant differences between two consecutive metric readings. The larger the returned number, the more memory is wasted in the execution plan cache on queries that will be executed only once, which can put pressure on RAM. It is always better to have that RAM dedicated to the data cache, as this improves SQL Server performance.

As with the Ad Hoc queries executed once metric, this is not a steady metric, and calculating a baseline is also recommended for this custom SQL metric.

If a large value is returned over time (over 400 MB, for example), it is likely that the execution plan cache is bloated, and in that case turning on “optimize for ad hoc workloads” should prevent the plan cache from inflating, thus improving SQL Server performance.

Name – Memory usage of ad hoc queries executed once
Performance category – SQL Server performance
SQL Server version – SQL Server 2008 and higher
Unit – MB
Use percent scale for chart (0-100) – No
Period – 60 (can be set according to user preferences and aimed precision of reading)
Counter type – Execution value
Query text:

SELECT SUM(CAST((
				CASE 
					WHEN usecounts = 1
						THEN size_in_bytes
					ELSE 0
					END
				)AS DECIMAL(18, 2) ) ) / 1024 / 1024
FROM sys.dm_exec_cached_plans
WHERE objtype = 'Adhoc'
GROUP BY objtype;
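
As a drill-down (illustrative only, not part of the custom metric), the single-use ad hoc plans consuming the most cache memory, together with their query text, can be listed like this:

-- Illustrative drill-down sketch: largest single-use ad hoc plans and their text
SELECT TOP (20)
	cp.size_in_bytes / 1024 AS size_kb,
	st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype = 'Adhoc'
	AND cp.usecounts = 1
ORDER BY cp.size_in_bytes DESC;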

Oldest active transaction (per SQL Server)

This custom SQL performance counter returns the current execution time of the oldest active transaction on the monitored SQL Server instance. An uncommitted transaction can cause excessive transaction log growth even if transaction log backups are performed regularly. A transaction log backup can truncate only the part of the transaction log that is not active, which means that an open transaction can cause the transaction log to grow up to its physical limits unless the transaction is committed, rolled back, or killed by the user. In addition, an open transaction can block other processes on SQL Server from accessing and changing the data it has locked.

The alert threshold cannot be clearly defined here, as it depends on the environment, so it should be set by the DBA to trigger at the maximum transaction time expected on the system.

Name – Oldest active transaction (per SQL Server)
Performance category – SQL Server performance
SQL Server version – SQL Server 2005 and higher
Unit – Seconds
Use percent scale for chart (0-100) – No
Period – 120-240 (can be set differently according to expected max transaction time)
Counter type – Execution value
Query text:

SELECT TOP 1 DATEDIFF(ss, CAST([database_transaction_begin_time] AS DATETIME), GETDATE())
FROM sys.dm_tran_database_transactions AS [s_tdt]
WHERE [s_tdt].[database_transaction_begin_time] IS NOT NULL
ORDER BY [s_tdt].[database_transaction_begin_time];
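
When this metric raises an alert, the session holding the oldest open transaction can be identified with an ad hoc drill-down sketch such as the following (illustrative only):

-- Illustrative drill-down sketch: session behind the oldest open transaction
SELECT TOP (1)
	s.session_id,
	s.login_name,
	s.host_name,
	tdt.database_transaction_begin_time
FROM sys.dm_tran_database_transactions AS tdt
INNER JOIN sys.dm_tran_session_transactions AS tst
	ON tst.transaction_id = tdt.transaction_id
INNER JOIN sys.dm_exec_sessions AS s
	ON s.session_id = tst.session_id
WHERE tdt.database_transaction_begin_time IS NOT NULL
ORDER BY tdt.database_transaction_begin_time;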

Oldest active transaction (per database)

This is a custom SQL performance counter similar to the previous one, but designed for those who have a system with higher activity and want to monitor the oldest active transaction for each database on a monitored SQL Server instance. This metric utilizes ApexSQL Monitor’s ability to execute the custom query for every database automatically, using the %Database% variable, which is set as part of the query and replaced with the appropriate database name during execution. The charts will be displayed in the Database performance tab.

Name – Oldest active transaction (per database)
Performance category – Database performance
SQL Server version – SQL Server 2005 and higher
Unit – Seconds
Use percent scale for chart (0-100) – No
Period – 120-240 (can be set differently according to expected max transaction time)
Counter type – Execution value
Query text:

SELECT TOP 1 DATEDIFF(ss, CAST(database_transaction_begin_time AS datetime), GETDATE())
FROM sys.dm_tran_database_transactions
WHERE database_transaction_begin_time IS NOT NULL
	AND DB_NAME(database_id) = '%Database%'
ORDER BY database_transaction_begin_time;


May 19, 2016