Even a superficial reading of books and articles on SQL Server performance monitoring will tell you that SQL Server memory pressure is the first thing you should monitor. Not the only thing, far from it, and some CPU, I/O and internal SQL Server counters are critically important to get the complete picture but it’s the memory counters that often show us the first indication of suboptimal performance
Memory and the way SQL Server uses it, is right on a sort of tripoint where SQL Server, CPU and I/O meet and memory pressure may indicate missing indexes, badly designed tables, an actual lack of memory on the server for the workload it’s been tasked with, other applications running on the system, users logging onto the server and so on. Conversely, allowing SQL Server to use more memory can help cover up lots of these and other issues [10]
Most important memory measurements deal with SQL Server’s buffer pool, an internal cache of database pages, SQL Server’s memory and the system’s memory. These measurements are:
- Page Life Expectancy
- Buffer Cache hit ratio
- Page reads / sec
- Page writes / sec
- Lazy writes / sec
- Memory Grants Pending
- Total Server Memory
- Target Server Memory
- Available Mbytes
- Pages / sec
- Paging File % Usage
SQL Server’s Buffer Manager Measurements
Page Life Expectancy
By far the most cited performance counter is Page Life Expectancy (aka PLE). This counter is part of SQL Server’s Buffer Manager Object and MSDN defines [1] it as “the number of seconds a page will stay in the buffer pool without references.” Even though it doesn’t say so in its definition nor in its name, this counter actually shows the average, not some total or min/max, of the lifetime of pages in the buffer pool. So this counter indicates just how long, in seconds and on average, the pages are staying in the buffer pool and the longer an average page is living in the buffer pool, the higher the chances are that SQL Server will find it there when eventually it needs it again instead of having to read it from the disk
The relevance of monitoring this counter cannot be overstated, as a low value for it indicates that a SQL Server memory pressure, as explained above, must necessarily lead to more I/O which is, of course, the slowest component in the stack by several orders of magnitude. As an example a missing index on a large table (table for which pages cannot fit into RAM available for SQL Server’s buffer pool) will eventually lead to a full scan which will of course lead to increased reading (problem) and flushing of potentially useful pages of other tables (aggravating problem). Such an event would be easily detectable by following the page life expectancy counter as its value would necessarily drop
Several authorities on the subject, including SQL Server’s Customer Advisory Team (CAT) [3], indicated that the value for this counter should be at least 300 for OLTP applications. SQL Server’s CAT also indicated that values for this counter should never quickly drop by 50% or more
Today the value 300 is too small to be used as a threshold. This value was determined when servers generally had 4GB of RAM installed. Today, it is not uncommon for a SQL Servers to have more than 48GB of RAM. [12] Instead the old 300 threshold, an adaptive formula can be used to get a threshold estimate:
MAXBP(MB)/1024/4*300
MAXBP(MB) represents the maximum amount of Buffer Pool memory. SQL Server:Buffer Manager:Database Pages counter can be used to get number of pages which should be converted to MB: (pages*8)/1024 and used as MAXBP(MB) value. The above formula considers each 4GB of Buffer Pool (per NUMA node or otherwise) to have its own 300 second Page Life Expectancy. [13]
Besides the formula, the baseline value can be determined to react when Page Life Expectancy measured value is below that baseline.
To query the value of this counter run the following:
SELECT [cntr_value] FROM sys.dm_os_performance_counters WHERE [object_name] LIKE '%Buffer Manager%' AND [counter_name] = 'Page life expectancy'
Buffer Cache hit ratio
Tightly related to Page Life Expectancy is Buffer Cache hit ratio performance counter. This counter is also part of SQL Server’s Buffer Manager object and MSDN defines it as:
…the percentage of pages found in the buffer cache without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses.
As MSDN continues to note, you want this ratio as high as possible as, again, reading from memory is several orders of magnitude faster than reading from disks (even with SSDs in game). SQL Server’s CAT doesn’t mention it, nor gives guidelines on the actual value that this counter should have, but other sources [4] indicate that for OLTP applications it should equal or exceed 98%
To query the value of this counter run the following:
SELECT [cntr_value] FROM sys.dm_os_performance_counters WHERE [object_name] LIKE '%Buffer Manager%' AND [counter_name] = 'Buffer cache hit ratio'
Together with PLE, the Buffer Cache hit ratio can give us a critical insight into SQL Server’s buffer pool, the subsystem of SQL Server that uses most of its memory and that is fundamental in obtaining optimal performance
Page reads / sec, Page writes / sec and Lazy writes / sec Measurements
Continuing with Buffer Cache counters, we have the number of read and written database pages per second together with Lazy writes / sec. Taken together these three measurements may give good indication on memory contention (pressure) and possible indexing issues (specific SQL Server counters that may reflect on database design issues will be detailed in a separate article)
MSDN [1] defines Page reads / sec as:
the number of physical database page reads that are issued per second. This statistic displays the total number of physical page reads across all databases. Because physical I/O is expensive, you may be able to minimize the cost, either by using a larger data
Page writes / sec is defined as:
Indicates the number of physical database page writes that are issued per second
Of course, Page writes / sec is also a server-level measurement showing page writes across all databases. Lazy writes / sec is defined as:
the number of buffers written per second by the buffer manager’s lazy writer. The lazy writer is a system process that flushes out batches of dirty, aged buffers (buffers that contain changes that must be written back to disk before the buffer can be reused
Regarding guidelines, the number of lazy writes shouldn’t exceed 20 on average systems [7] [9] and should ideally be close to zero which would indicate a sufficiently large buffer pool. There are no general guidelines on reads and writes figures so we should establish a baseline values for them prior to any troubleshooting. The Accidental DBA [8] mentions this rule of thumb:
If Lazy writes/sec is consistently experiencing non-zero values with a low PLE and elevated values for Page reads/sec and Page writes/sec the server is experiencing buffer pool contention and you will need to go about troubleshooting this problem further
To query the values of these counters run the following:
SELECT [counter_name], [cntr_value] FROM sys.dm_os_performance_counters WHERE [object_name] LIKE '%Buffer Manager%' AND [counter_name] IN ('Page reads/sec', 'Page writes/sec', 'Lazy writes/sec')
SQL Server’s Memory Manager Measurements
Memory Grants Pending
This SQL Server performance counter is, obviously, available in its Memory Manager object and is defined by MSDN [2] as “the total number of processes waiting for a workspace memory grant” where by processes is referred to SQL Server’s internal system processes (like sessions and the likes of the aforementioned lazy writer)
SQL Server Customer Advisory Team [3] recommends that the value of this measurement should always be less or equal to 1. Anything above that indicates that there are processes waiting for memory and that you should probably increase the memory allocated to SQL Server
To query the value of this counter run the following:
SELECT [cntr_value] FROM sys.dm_os_performance_counters WHERE [object_name] LIKE '%Memory Manager%' AND [counter_name] = 'Memory Grants Pending'
To find out which queries are currently waiting on a memory grant, run the following query:
SELECT * FROM sys.dm_exec_query_memory_grants WHERE grant_time IS NULL
Total Server Memory and Target Server Memory
This pair of performance counters is also part of SQL Server’s Memory Manager Object. Total Server Memory is defined by MSDN as “the amount of memory the server has committed using the memory manager” while Target Server Memory is defined as “the ideal amount of memory the server can consume”
Upon staring SQL Server its total memory will be low and it will grow throughout the warm-up period while SQL Server is bringing pages into its buffer pool and until it reaches a steady state. When this steady state is reached the total server memory will ideally be around the value of the target memory. Once the steady state is reached though, the Total Server Memory measurement should not decrease importantly as that would indicate that SQL Server is being forced to dynamically deallocate its memory due to system-level memory pressure
Since Target Server Memory is actually equal to the current value of Max Server Memory setting, we can follow suggestions available in for example “Suggested Max Memory Settings for SQL Server 2005/2008” [11]
To query these two counters run the following:
SELECT [counter_name], [cntr_value] FROM sys.dm_os_performance_counters WHERE [object_name] LIKE '%Memory Manager%' AND [counter_name] IN ('Total Server Memory (KB)', 'Target Server Memory (KB)')
From here you can calculate the ratio in percentages of these two counters by running the following query:
SELECT ROUND(100.0 * ( SELECT CAST([cntr_value] AS FLOAT) FROM sys.dm_os_performance_counters WHERE [object_name] LIKE '%Memory Manager%' AND [counter_name] = 'Total Server Memory (KB)' ) / ( SELECT CAST([cntr_value] AS FLOAT) FROM sys.dm_os_performance_counters WHERE [object_name] LIKE '%Memory Manager%' AND [counter_name] = 'Target Server Memory (KB)') , 2)AS [Ratio]
As indicated above this measurement should be at around 100% once the steady state is reached
Operating System’s Memory Measurements
Available Mbytes
Available Mbytes (and its more granular kin “Available Bytes” and “Available Kbytes”) counter is a system-level counter that indicates how much free memory is available in the system. It is part of Memory performance counter category
There are two things to pay attention to with this measurement:
- The values for it should never get too low and while what “too low” means will depend on your system/server/applications/databases there are some recommendations [11] that it shouldn’t drop below 2 to 4GB for today’s average production servers
- It should never importantly fluctuate as that could indicate the SQL Server is dynamically adjusting its memory use and/or that there are other applications/subsystems running on the database server [5]
The value of this counter, as with other operating system counters, cannot be queried through SQL Server but has to be monitored through Windows’ Performance Monitor
Pages / sec
This operating system counter indicates the number of pages read from or written to a disk due to hard page faults. Hard page faults happen whenever the operating system cannot satisfy a memory page request from the pages available in RAM and has to dip into the I/O subsystem and read the page or pages from the disk. Of course, this measurement shows all hard page faults per second, not just those from SQL Server, but it’s still an important measurement of general memory and I/O pressure on the system
Regarding the guidelines for the values of this measurement, there is a wide range of acceptable values [6] [7] and of course ideally this number would tend toward zero. The best recommendation here is to measure a baseline value when the system is functioning optimally and then compare all later measurements with that baseline value. Whenever the later measurements are significantly greater than the baseline value, it may indicate a memory pressure problem
Paging File % Usage
Last but not least is the Paging File % Usage counter which, rather obviously, indicates the percentage of the paging file that is currently being used by the operating system. Paging file is used whenever the operating system is running out of available RAM and cannot satisfy the current memory needs of all the processes running on it. Whenever that occurs, the system “pages” some of its memory pages into the paging file. This is a performance problem as writing a memory page to drive and reading it back once it’s needed is, of course, orders of magnitude slower than just being able to keep that page in memory the whole time
Guideline on the values for this measurements, especially for SQL Server dedicated machines, is 0 [5] and anything above it, without wishing to again restate the obvious, is a sure sign of a memory pressure on the system level
References
[1] MSDN article “SQL Server, Buffer Manager Object” (http://technet.microsoft.com/en-us/library/ms189628.aspx)
[2] MSDN article “SQL Server, Memory Manager Object” (http://technet.microsoft.com/en-us/library/ms190924.aspx)
[3] “OLTP Blueprint – A Performance Profile of OLTP applications” by SQL Server Customer Advisory Team (http://blogs.msdn.com/b/sqlcat/archive/2006/06/23/tom-davidson-sqlcat-best-practices.aspx)
[4] “SQL Server DMVs in Action” by Ian W. Stirk (http://www.manning.com/stirk/)
[5] “SQL Server Perfmon (Performance Monitor) Best Practices” by Brent Ozar (http://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/)
[6] “The Accidental DBA (Day 21 of 30): Essential PerfMon counters” by Jonathan Kehayias (http://www.sqlskills.com/blogs/jonathan/the-accidental-dba-day-21-of-30-essential-perfmon-counters/)
[7] “SQL Server – Performance Counter Guidance” by Grumpy Old DBA (http://grumpyolddba.co.uk/monitoring/Performance%20Counter%20Guidance%20-%20SQL%20Server.htm)
[8] “A Sysadmin’s Guide to Microsoft SQL Server Memory” by Brent Ozar (http://www.brentozar.com/archive/2011/09/sysadmins-guide-microsoft-sql-server-memory/)
[9] “Suggested Max Memory Settings for SQL Server 2005/2008” by Glenn Berry (https://sqlserverperformance.wordpress.com/2009/10/29/suggested-max-memory-settings-for-sql-server-20052008/)
[10] “Finding what queries in the plan cache use a specific index” by Jonathan Kehayias (https://www.sqlskills.com/blogs/jonathan/finding-what-queries-in-the-plan-cache-use-a-specific-index/)
[11] “SQL Server Page Life Expectancy” by David Williams (http://blogs.msdn.com/b/mcsukbi/archive/2013/04/12/sql-server-page-life-expectancy.aspx)
March 3, 2015