How to Detect SQL Server Performance Issues Caused by Incorrect Clustered Indexes


A clustered index is critical to performance on any table. Not only does it dictate the logical storage of data for the entire table, but it is also the index that all other indexes reference when additional data is required. A poorly chosen clustered index can greatly hamper performance, causing latency, contention, and excessive IO. In this article, we will explore clustered indexes and use ApexSQL tools to help identify those that could be improved.


Choosing a clustered index is an important decision for every table that we design and implement. Performance will be adversely affected if we do not implement a clustered index, or if we make a poor choice.

When initially designing a table, we need to understand its usage and how data will be inserted, updated, deleted, and read. As part of this process, we choose indexes that best support the important reads against the table. Non-clustered indexes can be added over time to support important or frequent queries, and removed again once they are no longer needed.

The clustered index is not so easily changed, and therefore we want to choose it correctly the first time if at all possible. In the event that a table’s use changes greatly over time, or that poor decisions were made in its design, we need to figure out that a problem exists and respond to it as quickly as possible.

Choosing the Right Clustered Index

The clustered index should point to a unique, static key: that is, something that does not change over time and for which each row is unique. It should be the key that an application refers to most often when searching for, joining, or returning data. This will often make the clustered index a good candidate for the primary key as well. Under the covers, SQL Server needs to logically organize data into pages based on how we implement our table. Duplicate key values force SQL Server to create unique pointers in order to ensure that a query returns the correct value(s). It also means that we may have no good way to identify a single row using the clustered index alone.

In the event that a clustered index is not the primary key, I’d recommend that it at least be a unique clustered index, ensuring that there are no duplicated values. Typically, a well-designed table will have some sort of unique identifier built into it, either:

  1. A natural key that is always unique, such as an automatically generated customer number, an order number, or a ticket number.
  2. An artificial/surrogate key generated by SQL Server, typically an IDENTITY column. This is the usual choice when a unique natural key isn’t available, and the generated values can double as meaningful identifiers for customer-facing needs later on.

A clustered index should be static: once a key is generated and inserted, it should not be updated over time. Deletes are acceptable as needed, but updates to the clustered key will result in additional latency. When a clustered key value is updated, all non-clustered indexes must also be updated, which takes additional time, CPU, network, and IO resources. Deletes are also expensive, so if we have any important archiving or mass-delete processes, they are best run during staggered/off-hours windows, in order to ensure that there isn’t too much churn from the deletes at any one time.

A performant clustered index will often be ever-increasing over time. This means that, as new values are inserted, they are always greater than the previous value. In theory, an ever-decreasing key would also be valid (for a clustered index that is sorted in descending order), though this would likely confuse developers and other database professionals. A clustered index that increases over time will stay less fragmented and experience far fewer page splits during regular usage. Imagine a new clustered index that has been built and is 100% in order with no gaps. When a value is inserted into the middle of the index, one of two things happens:

  1. The data is inserted into empty space on the page, if any exists.
  2. Not enough room exists: the new data is inserted, a new page is allocated, and data is shuffled between pages in an effort to even them out. In summary: we need to update two pages instead of one.

An index that increases over time will generally experience far fewer page splits (scenario #2 above) as a page will continue to be filled up until either it is full or reaches the limit that is set by an index’s fill factor.
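We can observe this effect directly. The following sketch (table names and row counts are invented for illustration, not part of the demo environment) inserts into an ever-increasing IDENTITY key and into a random NEWID() key, then compares fragmentation; the random key forces constant page splits and fragments heavily:

```sql
-- Illustration only: ever-increasing key vs. random key fragmentation.
CREATE TABLE dbo.Sequential_Key (
  ID INT IDENTITY(1, 1) NOT NULL CONSTRAINT PK_Sequential_Key PRIMARY KEY CLUSTERED,
  Padding CHAR(200) NOT NULL);

CREATE TABLE dbo.Random_Key (
  ID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() CONSTRAINT PK_Random_Key PRIMARY KEY CLUSTERED,
  Padding CHAR(200) NOT NULL);

-- Insert 100,000 rows into each table.
INSERT INTO dbo.Sequential_Key (Padding)
SELECT TOP 100000 'X'
FROM sys.objects a CROSS JOIN sys.objects b CROSS JOIN sys.objects c;

INSERT INTO dbo.Random_Key (Padding)
SELECT TOP 100000 'X'
FROM sys.objects a CROSS JOIN sys.objects b CROSS JOIN sys.objects c;

-- The random key's clustered index will show far higher fragmentation.
SELECT OBJECT_NAME(object_id) AS Table_Name
  ,avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED')
WHERE object_id IN (OBJECT_ID('dbo.Sequential_Key'), OBJECT_ID('dbo.Random_Key'))
  AND index_id = 1;
```

On a typical system, the NEWID() table will report fragmentation well above 90%, while the IDENTITY table stays in the low single digits.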

Lastly, a good clustered index should be as narrow as possible. For a small metadata table that may never have more than a few hundred rows, a SMALLINT is likely perfect for a numeric key. If a clustered index happens to be a composite key (made up of multiple columns), then each column should be as small as possible. We should not create extra work for the future, of course: if we know that a table will likely one day grow very large, make the key as large as necessary, whether SMALLINT, INT, or BIGINT. DATE, SMALLDATETIME, DATETIME, DATETIMEOFFSET, and other date/time-based columns can also make good clustered keys, assuming that values will not be duplicated. Date tables, numbers tables, warehouse time tables, and other reporting structures can make great use of date/time clustered indexes.

To summarize, a well-chosen clustered index will be:

  • Unique
  • Increasing over time
  • As narrow as possible
  • Static (No updates to the clustered key)
  • Effectively used by applications/queries

A clustered index may also be the primary key, but it does not have to be. Some tables have multiple meaningful keys, and choosing a clustered index that is not the primary key may make sense in these scenarios. Unique non-clustered indexes are also good choices in the event that a good clustered index has already been chosen.
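As a quick sketch of these guidelines (table and column names are invented for illustration), a surrogate IDENTITY column can serve as the clustered primary key while a unique non-clustered index enforces the natural key:

```sql
CREATE TABLE dbo.Orders (
  -- Unique, narrow (4 bytes), static, and ever-increasing: a good clustered key.
  OrderID INT IDENTITY(1, 1) NOT NULL CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,
  OrderNumber VARCHAR(20) NOT NULL, -- natural key, enforced as unique below
  CustomerID INT NOT NULL,
  OrderDate DATETIME NOT NULL);

CREATE UNIQUE NONCLUSTERED INDEX UX_Orders_OrderNumber ON dbo.Orders (OrderNumber);
```

This keeps the clustered key cheap for every non-clustered index to carry, while still guaranteeing uniqueness of the natural key that applications search on.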

Demonstrating Clustered Index Mistakes

We can demonstrate the effects of a good vs. bad clustered index using almost any table that already has a reasonably well-chosen one. For this example, we will use the Sales.SalesOrderHeader table in the AdventureWorks demo database. The copy that I have has been inflated from about 31,500 rows to 1,290,065 rows. Similarly, Sales.SalesOrderDetail, which is often joined with it, has been increased from 121,317 rows to 4,973,997 rows. This will magnify the problems associated with poorly chosen clustered indexes.

For our first example, we’ll illustrate the effects of a non-unique clustered index on the SalesOrderHeader table. Consider the following query:

SELECT SalesOrderHeader.OrderDate
FROM Sales.SalesOrderHeader
INNER JOIN Sales.SalesOrderDetail ON SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID
WHERE OrderDate = '12/10/2015';

This pulls all header and detail data for any orders placed on 12/10/2015. Since there is no index on OrderDate, the resulting execution plan will require a clustered index scan to find and return the data it needs. The current clustered index on the table is on SalesOrderID, a unique IDENTITY column. We will run the above query and then change the clustered index to a less optimal column, such as DueDate:

ALTER TABLE Sales.SalesOrderHeader
DROP CONSTRAINT PK_SalesOrderHeader_SalesOrderID;

CREATE CLUSTERED INDEX CI_SalesOrderHeader ON Sales.SalesOrderHeader (DueDate ASC);

Once this change is made, we’ll run the query again and compare the before vs. after execution plan and IO statistics incurred by each execution:

The execution plans are similar, with the greatest cost being the scan against SalesOrderHeader, which is required to satisfy our query regardless of which clustered index we choose. The statistics IO output looks like this (SalesOrderID clustered index first, DueDate second):

Table ‘SalesOrderHeader’. Scan count 5, logical reads 30010, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘SalesOrderDetail’. Scan count 977, logical reads 3243, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘SalesOrderHeader’. Scan count 5, logical reads 31631, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘SalesOrderDetail’. Scan count 977, logical reads 3243, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Note that when we ran this query against the table with DueDate as the clustered index, an additional 1,621 reads were needed. A non-unique clustered index is less efficient to store and traverse, and that inefficiency is inherited by every non-clustered index on the table; the cost we pay is in IO, and consequently in performance. On modern cloud-hosted systems, we pay directly for resource usage, so the more efficiently we design our tables, the more money we save and the better our queries perform.

Let’s reconsider the above scenario, but this time with a non-clustered covering index on SalesOrderHeader.OrderDate, including a handful of other columns. This more realistic scenario simulates an important, frequently run query that warrants its own index, and will let us judge the effects of a poorly chosen clustered index on index seeks against the table. Here is the new index we will add:

CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_OrderDate ON Sales.SalesOrderHeader (OrderDate)
INCLUDE (SalesOrderID); -- the INCLUDE column list was truncated in this copy; SalesOrderID is shown as a minimal stand-in

Now we’ll run our query from earlier:

SELECT SalesOrderHeader.OrderDate
FROM Sales.SalesOrderHeader
INNER JOIN Sales.SalesOrderDetail ON SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID
WHERE OrderDate = '12/10/2015';

When executed, IO statistics and the execution plan look like this:

Table ‘SalesOrderDetail’. Scan count 977, logical reads 3165, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘SalesOrderHeader’. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now, let’s adjust the clustered indexes on each table to be other columns that could, conceivably, be used instead:

ALTER TABLE Sales.SalesOrderHeader
DROP CONSTRAINT PK_SalesOrderHeader_SalesOrderID;

CREATE CLUSTERED INDEX CI_SalesOrderHeader ON Sales.SalesOrderHeader (RevisionNumber ASC);

ALTER TABLE Sales.SalesOrderDetail
DROP CONSTRAINT PK_SalesOrderDetail_XL_SalesOrderID_SalesOrderDetailID;

CREATE CLUSTERED INDEX CI_SalesOrderDetail ON Sales.SalesOrderDetail (CarrierTrackingNumber ASC);

CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_SalesOrderId ON Sales.SalesOrderDetail (SalesOrderId);

The additional index on SalesOrderId ensures that the INNER JOIN between the tables is not wrecked by our changes. We run our test query again and the results are as follows:

Table ‘SalesOrderDetail’. Scan count 977, logical reads 12954, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘SalesOrderHeader’. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

We can clearly see that, even though the execution plans and the basic operators are very similar, the reads are significantly higher against the table with a poorly chosen clustered index.

We could perform similar demonstrations with INSERT, UPDATE, and DELETE statements, as well as maintenance operations, and show that all of them suffer as a result of our bad decision-making. I think we get the idea, though, and can now move on to solving problems!

Detecting Duplicate Data Using ApexSQL Monitor

Mocking up scenarios in which poorly chosen clustered indexes have been implemented is not too difficult, but our true goal here is to have a tool that can help us find these problems for us, before they become serious. We want to be able to detect design mistakes quickly, before performance problems impede an application’s ability to function.

A suboptimal clustered index will often manifest itself as a set of symptoms, rather than a distinct and obvious alert. Some possible symptoms may be:

  • A higher-than-expected number of clustered index scans.
  • Operations taking far more IO than expected.
  • Unexpected execution plans.
  • Queries taking longer than expected (due to higher IO/suboptimal plans).
  • Clustered indexes that become fragmented very quickly.
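While ApexSQL Monitor can track these symptoms for us, a rough manual starting point (a sketch using a standard DMV, not the tool's own implementation) is to look for clustered indexes that are scanned far more often than they are sought:

```sql
-- Clustered indexes with more scans than seeks since the last service restart.
SELECT OBJECT_SCHEMA_NAME(us.object_id) AS [Schema_Name]
  ,OBJECT_NAME(us.object_id) AS Table_Name
  ,i.name AS Index_Name
  ,us.user_seeks
  ,us.user_scans
FROM sys.dm_db_index_usage_stats us
INNER JOIN sys.indexes i ON i.object_id = us.object_id
  AND i.index_id = us.index_id
WHERE us.database_id = DB_ID()
  AND i.type_desc = 'CLUSTERED'
  AND us.user_scans > us.user_seeks
ORDER BY us.user_scans DESC;
```

Note that these counters reset when the SQL Server service restarts, so they are best read as trends rather than absolutes.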

ApexSQL Monitor allows for the creation of custom reports and alerts that can be used in order to measure metrics that can tell us about the issues above, and then let us know when a desired threshold has been exceeded. Once we know that an issue has been identified, we can use waits, query, and server metrics to justify any decisions we make based on the alert. As with all custom alerts that we create, there will be a need for tweaking to ensure that we are only alerted when an action truly is needed. Alert too often and we ignore it, while alerting too infrequently means that we respond late to legitimate problems.

The custom metrics we create can encompass ANY query, so long as that query returns a single numeric value. As with any opportunity to enter TSQL without bounds, I will choose to stretch that opportunity as far as humanly possible 🙂

The first query we can run as a custom alert validates the uniqueness of a clustered index. It returns a 1 if any non-unique clustered indexes exist that contain duplicate values:


SELECT indexes.name AS Index_Name
  ,schemas.name AS [Schema_Name]
  ,tables.name AS Table_Name
  ,columns.name AS Column_Name
  ,CAST(0 AS BIT) AS Has_Non_Unique_Clustered_Index_Values
  ,CAST(0 AS BIT) AS Is_Processed
INTO #Non_Unique_Clustered_Indexes
FROM sys.indexes
INNER JOIN sys.tables ON indexes.object_id = tables.object_id
INNER JOIN sys.schemas ON schemas.schema_id = tables.schema_id
INNER JOIN sys.index_columns ON index_columns.index_id = indexes.index_id
  AND index_columns.object_id = tables.object_id
INNER JOIN sys.columns ON columns.object_id = index_columns.object_id
  AND columns.column_id = index_columns.column_id
WHERE indexes.type_desc = 'CLUSTERED'
  AND indexes.is_unique = 0;

DECLARE @Results TABLE (Value_Count BIGINT);
DECLARE @Has_Non_Unique_Clustered_Index_Values BIT;
-- These declarations were missing from this copy of the script:
DECLARE @Clustered_Index SYSNAME;
DECLARE @Schema_Name SYSNAME;
DECLARE @Table_Name SYSNAME;
DECLARE @Sql_Command NVARCHAR(MAX);

SELECT TOP 1 @Clustered_Index = Index_Name
  ,@Schema_Name = [Schema_Name]
  ,@Table_Name = [Table_Name]
FROM #Non_Unique_Clustered_Indexes;

WHILE EXISTS (
    SELECT *
    FROM #Non_Unique_Clustered_Indexes
    WHERE Is_Processed = 0
    )
BEGIN
  -- Build a GROUP BY over the clustered key columns; any group with more
  -- than one row indicates duplicate clustered index values.
  SELECT @Sql_Command = '
SELECT COUNT(*)';

  SELECT @Sql_Command = @Sql_Command + '
FROM [' + @Schema_Name + '].[' + @Table_Name + ']
GROUP BY';

  SELECT @Sql_Command = @Sql_Command + '
      [' + Column_Name + '],'
  FROM #Non_Unique_Clustered_Indexes
  WHERE Index_Name = @Clustered_Index;

  SELECT @Sql_Command = LEFT(@Sql_Command, LEN(@Sql_Command) - 1); -- trim trailing comma

  SELECT @Sql_Command = @Sql_Command + '
HAVING COUNT(*) > 1;';

  INSERT INTO @Results (Value_Count)
  EXEC sp_executesql @Sql_Command;

  IF EXISTS (
      SELECT *
      FROM @Results
      WHERE Value_Count IS NOT NULL
        AND Value_Count > 0
      )
    SELECT @Has_Non_Unique_Clustered_Index_Values = 1;
  ELSE
    SELECT @Has_Non_Unique_Clustered_Index_Values = 0;

  UPDATE #Non_Unique_Clustered_Indexes
  SET Has_Non_Unique_Clustered_Index_Values = @Has_Non_Unique_Clustered_Index_Values
    ,Is_Processed = 1
  WHERE Index_Name = @Clustered_Index;

  DELETE
  FROM @Results;

  SELECT TOP 1 @Clustered_Index = Index_Name
    ,@Schema_Name = [Schema_Name]
    ,@Table_Name = [Table_Name]
  FROM #Non_Unique_Clustered_Indexes
  WHERE Is_Processed = 0;
END

-- Return 1 if any non-unique clustered index contains duplicate values, else 0.
SELECT CASE 
    WHEN EXISTS (
        SELECT *
        FROM #Non_Unique_Clustered_Indexes
        WHERE Has_Non_Unique_Clustered_Index_Values = 1
        )
      THEN 1
    ELSE 0
    END AS Has_Non_Unique_Clustered_Index_Values;

DROP TABLE #Non_Unique_Clustered_Indexes;

The TSQL above checks the system views for any clustered indexes that are not unique, and then groups the data in each corresponding table to determine whether any duplicate clustered index values actually exist. If your database contains very large tables, you could simply check for non-unique clustered indexes and skip the additional step of grinding through each table searching for duplicates.
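For reference, that lightweight variant, which reads only the metadata and never touches table data, can be as simple as:

```sql
-- Returns the number of non-unique clustered indexes in the current database.
-- 0 is the desired value; anything higher warrants a closer look.
SELECT COUNT(*) AS Non_Unique_Clustered_Index_Count
FROM sys.indexes
WHERE type_desc = 'CLUSTERED'
  AND is_unique = 0;
```

Because it returns a single numeric value, it also plugs directly into a custom metric as described below.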

The result boils down to either a 0 or a 1:

Not terribly exciting. We could adjust the query to return details, but our goal is an infrequent check that validates that no clustered index contains duplicate values. It will usually return 0, and we’ll know everything is OK. In ApexSQL Monitor, we can create a custom metric as follows:

From Configuration, choose Custom Metrics, and we can begin filling out the details for our query:

“%Database%” signifies that this is a database-specific query, though it is not necessary if the “Database Performance” category is selected. I include it so that it is there if needed, and if not, there is no harm done. Otherwise, the rest of the TSQL is the same as the query we introduced previously. We will use this same basic method of adding custom metrics later in this article. The collection interval of 86,400 seconds is a full day; it could run less (or more) frequently if needed. Schema typically changes infrequently, but how often to check is up to you. Once created, we could alert on this, or just check it every so often to confirm it has the acceptable value (0).

In the example above, we still have the index changes from earlier, in which SalesOrderHeader and SalesOrderDetail were given new clustered indexes. Those keys were not unique, contained NULLs, were not ever-increasing, and were not static, making them lousy candidates for a clustered index.

Monitoring and Alerting for Index Fragmentation Using ApexSQL Monitor

Knowing when indexes become fragmented, whether by manual checks or by alerts, is a useful way to determine the best method of index maintenance. Our goal here is to validate the fragmentation of clustered indexes and use that information to make smart decisions about them (if needed).
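For comparison with the tool-based approach below, a manual fragmentation check (a sketch using the standard DMV, not ApexSQL Monitor's internals) might look like this:

```sql
-- Clustered index fragmentation for the current database.
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS [Schema_Name]
  ,OBJECT_NAME(ips.object_id) AS Table_Name
  ,ips.avg_fragmentation_in_percent
  ,ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
WHERE ips.index_id = 1 -- index_id 1 is always the clustered index
  AND ips.page_count > 100 -- ignore tiny indexes; this threshold is arbitrary
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

The page_count filter matters: fragmentation percentages on very small indexes are noise and rarely worth acting on.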

ApexSQL Monitor can track index fragmentation, set alert thresholds, and display metrics on indexes based on these settings. To configure this, go into the Configuration menu and scroll down to Index Metrics:

A variety of options are available here. We’ll check off Average Fragmentation, and fill in some alerting thresholds. The numbers are arbitrary, and should be based on your database environment, fragmentation levels, and maintenance needs. Once complete, we can return to the main dashboard:

Click on the Indexes details link in the lower-right, and you’ll be presented with a long list of all indexes that are currently being monitored:

While only showing the first six indexes here, the list will stretch to the bottom of the screen, and in my case span 16 pages! To help with organization, we can drag & drop column headers into the sort area at the top:

By sorting by type, all clustered indexes were conveniently placed at the top of the list. We can also sort by uniqueness and primary key status:

By doing this, we have put any indexes that are clustered, but NOT primary keys or unique at the top of our list. While a clustered index is not required to be unique or a primary key, these are often attributes that are closely related. Since uniqueness is an important attribute of clustered indexes, investigating the two at the top of the list is worthwhile. Similarly, the six that are unique clustered indexes, but not primary keys are also worth some brief investigation. While potentially valid, it’s possible that these tables lack a primary key and/or would benefit from the primary key also being the clustered index.

Since the list of indexes is long, we can filter by size if we wish. Under Configuration, Indexes, we can adjust monitoring to only include indexes larger than a certain size. My list includes everything, but you may exclude smaller indexes to remove noise from your data set:

Clicking on the name of any index brings us to a screen with more details:

While this index is suspicious, fragmentation is low (at least), as we can see from the details in the top-right corner. Alternatively, this clustered index is quite fragmented, and will likely be sending me an alert shortly to let me know of that fact:

At this point, the only task remaining is customization. We can drill in further and restrict monitoring/alerting to specific databases, tables, or types of indexes. Our goal here is to identify unusual behavior or schema that may not be optimal, which is not a task we need to conduct on a regular basis.

Using ApexSQL Monitor to Find Wide Clustered Indexes

Another criterion of a good clustered index is that it is narrow: the column(s) in it should not take up excessive storage space. Since the clustered index columns become part of the logical storage structure of the table, as well as of each non-clustered index, wider columns cause many queries against the table (reads and writes alike) to consume extra resources and take longer than expected.

To accomplish our goal, we need to calculate the width of the columns in each clustered index and decide what is wide or narrow. To provide some perspective, an INT is 4 bytes, a BIGINT 8 bytes, and a DATETIME 8 bytes. A VARCHAR column can require up to its declared length in bytes, and an NVARCHAR column up to twice its declared length. A UNIQUEIDENTIFIER is 16 bytes.
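These sizes are easy to confirm with DATALENGTH; a tiny sanity-check script:

```sql
-- Storage sizes, in bytes, of a few common key types.
SELECT DATALENGTH(CAST(1 AS INT)) AS Int_Bytes -- 4
  ,DATALENGTH(CAST(1 AS BIGINT)) AS BigInt_Bytes -- 8
  ,DATALENGTH(CAST(GETDATE() AS DATETIME)) AS DateTime_Bytes -- 8
  ,DATALENGTH(NEWID()) AS Guid_Bytes; -- 16
```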

Typically, I try to keep a clustered index to 8 bytes or less. Ideally, it is as narrow as the table's row count allows. A small metadata table that contains ten rows of static data could easily use a TINYINT or SMALLINT as the clustered index (assuming no unusual future growth). A table guaranteed to stay under 2^31 rows can manage with 4 bytes (an INT), and under 2^63 with a BIGINT. Sizing the key is part of table design, but these figures allow us to gauge whether an existing clustered index key is unusually large.

We can write a query that will pull all clustered index columns and calculate the size of the columns in the index. This will give us a good idea of how wide the index is and allow us to determine if any are poorly chosen due to size:


-- Note: the type lists and some keywords below were garbled in this copy of the
-- script; they have been reconstructed from SQL Server's documented storage sizes.
SELECT indexes.name AS Index_Name
  ,CASE 
    WHEN types.name = 'uniqueidentifier'
      THEN 16
    WHEN types.name IN ('bigint', 'datetime', 'money', 'float')
      THEN 8
    WHEN types.name = 'time'
      THEN 5 -- TIME is 3-5 bytes depending on precision; use the maximum
    WHEN types.name IN ('int', 'smalldatetime', 'smallmoney', 'real')
      THEN 4
    WHEN types.name = 'date'
      THEN 3
    WHEN types.name = 'smallint'
      THEN 2
    WHEN types.name = 'tinyint'
      THEN 1
    WHEN types.name IN ('char', 'varchar', 'nchar', 'nvarchar', 'binary', 'varbinary')
      THEN columns.max_length -- already expressed in bytes
    WHEN types.name = 'datetime2'
      THEN CASE 
          WHEN columns.scale < 3 -- fractional-seconds precision lives in sys.columns.scale
            THEN 6
          WHEN columns.scale BETWEEN 3 AND 4
            THEN 7
          ELSE 8
          END
    WHEN types.name = 'datetimeoffset'
      THEN 10
    WHEN types.is_user_defined = 1
      THEN columns.max_length
    ELSE columns.max_length
    END AS Column_Size
INTO #Clustered_Index_Columns
FROM sys.indexes
INNER JOIN sys.tables ON indexes.object_id = tables.object_id
INNER JOIN sys.schemas ON schemas.schema_id = tables.schema_id
INNER JOIN sys.index_columns ON index_columns.index_id = indexes.index_id
  AND index_columns.object_id = tables.object_id
INNER JOIN sys.columns ON columns.object_id = index_columns.object_id
  AND columns.column_id = index_columns.column_id
INNER JOIN sys.types ON types.user_type_id = columns.user_type_id
WHERE indexes.type_desc = 'CLUSTERED';

SELECT Index_Name
  ,SUM(Column_Size) AS Index_Size
INTO #Clustered_Index_Size
FROM #Clustered_Index_Columns
GROUP BY Index_Name;

SELECT Index_Name
  ,Index_Size
FROM #Clustered_Index_Size
ORDER BY Index_Size DESC;

DROP TABLE #Clustered_Index_Columns;

DROP TABLE #Clustered_Index_Size;

When we run this, the results show a simple list of clustered indexes and their sizes, ordered with the largest first:

The results show a handful of indexes that I would be a little concerned about and would want to investigate further. For example, PK_PersonPhone_BusinessEntityID_PhoneNumber_PhoneNumberTypeID is large because it includes PhoneNumber, which is an NVARCHAR(25). That’s 50 bytes alone, quite wide for a column in a clustered index.

Similarly, the clustered index CI_SalesOrderDetail is on CarrierTrackingNumber, which is an NVARCHAR(25). That index is my doing from earlier; the natural (and correct) clustered index on this table would be on SalesOrderID and SalesOrderDetailID, both integers.
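Once testing is complete, the original keys can be restored. This sketch assumes the standard AdventureWorks constraint definitions; names may differ in an inflated copy:

```sql
-- Remove the demo indexes and restore the original clustered primary keys.
DROP INDEX CI_SalesOrderHeader ON Sales.SalesOrderHeader;
ALTER TABLE Sales.SalesOrderHeader
ADD CONSTRAINT PK_SalesOrderHeader_SalesOrderID PRIMARY KEY CLUSTERED (SalesOrderID);

DROP INDEX CI_SalesOrderDetail ON Sales.SalesOrderDetail;
DROP INDEX IX_SalesOrderDetail_SalesOrderId ON Sales.SalesOrderDetail;
ALTER TABLE Sales.SalesOrderDetail
ADD CONSTRAINT PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID
  PRIMARY KEY CLUSTERED (SalesOrderID, SalesOrderDetailID);
```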

We can add this custom metric into ApexSQL Monitor, similar to earlier, in order to return the largest clustered index width. From this, we could alert on anything larger than a specific value, or to send us a report with the full list. This can be accomplished by replacing the final SELECT above with:

SELECT MAX(Index_Size) AS Index_Size
FROM #Clustered_Index_Size;

The result is a single number that indicates the width of the largest clustered index. From here, we can research as needed, find the culprit, fix it, or create an exception if it’s a special case that we consider important and/or necessary.

The following is the set-up of this metric within ApexSQL Monitor:

If we view the metric, we can see the results and decide if it’s worth investigating further:

Alternatively, we can alert on this as well. It’s a fairly stable metric, but it’s a nice way to have a quick & simple check on index sizes and know when any exist that exceed a threshold of your choosing. If it changes, it’d likely be through a software release, schema change, or a mistake, all things that don’t happen on a minute-by-minute basis.


Choosing a good clustered index is an important step in designing a table. The wrong choice can lead to poor performance, index maintenance problems, and (worst of all) – unhappy users!

A good clustered index is unique, increasing, narrow, static, and used effectively by the applications that rely upon it. With a good understanding of how clustered indexes work, we can write queries, alerts, and metrics that let us know when we have a poorly chosen clustered index. We can extend this knowledge to also monitor non-clustered indexes, allowing us to better understand their size, usage, and when they are not as effective as we would hope.

As with any metrics, customization and a bit of imagination are key to making them work best in your database environment. Adjust the limits, tolerances, and filters on metrics so they only validate the databases, tables, and objects that you actually care about. If you alert on this data, ensure that those alerts happen infrequently enough to be meaningful. Alerts that cry wolf many times per day will eventually be ignored, and therefore be useless.

ApexSQL Monitor and ApexSQL Defrag are two great tools that help organize metrics and alerting on our most important indexes. They can be used as suggested here, or to collect additional index or key metrics as needed. Their ability to take custom TSQL and convert it into a metric and alert on it means that an immense suite of options becomes available to you. Those options are limited only by the TSQL you can dream up!



November 9, 2017