Determining just the right primary key for tables is one of the most important parts of a robust, high-quality database design. The key candidates and the keys themselves need to be picked with caution, as suboptimal choices can snowball out of control and leave the bloated, slow databases which require heavy maintenance and require massive amounts of work to meet changes in the business requirements. Therefore, due to the importance of the primary keys for the future behavior of the database, their impact on the database performance needs to be weighted as well. So, from a performance standpoint, should replacing complex natural keys with a surrogate key be considered?
In SQL Server, primary keys are created as clustered indexes by default. The existence of a clustered index on a table affects all non-clustered indexes on it as well since they are associated with it to take advantage of the sorted data pages. Hence, queries that search for data based on the primary key, which is a clustered index, are very fast. However, the downside to this approach lies in the physical database implementation. Whenever the clustered index key is changed, all indexes must be rebuilt because non-clustered indexes contain the full key of the clustered index. Since natural keys are inevitably tied to the business requirements and the rest of the data in the table, as soon as a change occurs in the requirements or the data, the natural keys themselves will change as well. Therefore, every time the natural key changes – all indexes must be rebuilt, and this is not including changing the actual data type or its size, just the natural key value. On the other hand, as surrogate keys have no business meaning or relationship with the data whatsoever they aren’t affected by requirement changes; hence, no index rebuild is required.
An additional consideration is the size of the key itself. For example, let’s assume that the natural key consists of three columns defined as varchar(25), varchar(10), varchar(25), containing the first, middle and last name of customers respectively. Therefore, in this scenario around 130 rows per page can be stored. However, if this natural key is replaced by a varchar surrogate key, around 4,000 rows per page can be stored. Therefore, any performance impact due to the surrogate key widening the table with an additional column is well compensated by having to read up to 30 times less pages for certain queries.
However, replacing the natural key with a surrogate key after the database has been populated, depending on the environment, might not be as straightforward as adding a new column to serve as the surrogate key and dropping a recreating the primary key constraint on the newly created column. One of the caveats of changing a table’s key in a database which is fully deployed is having to update all the table’s dependencies accordingly. Searching for dependencies manually, for instance using the SQL Server Management Studio’s View Dependencies feature isn’t just labor-intensive; it can be dangerous because SQL Server is unable to recognize all of the relationships between database objects due to its own design and missing to update a dependency might cause real trouble down the road, especially if the dependency-related errors surface sometime after the table has been refactored. However, this is where ApexSQL Refactor can help.
ApexSQL Refactor is a SQL Server Management Studio and Visual Studio add-in which formats and refactors SQL code using 11 code refactors and over 160 formatting options. It expands wildcards, fully qualifies object names, renames SQL database objects and parameters without breaking dependencies and much more.
To safely split a table using our SQL formatter:
- Select the table in either SQL Server Management Studio or Visual Studio
- In the ApexSQL menu, click ApexSQL Refactor
From the Other refactors sub-menu, select the Add surrogate key command:
- Provide the name of the column which will be created on the table as a surrogate key
- Click the Preview button to preview the change SQL script which will be executed
Select the Dependencies tab to preview the dependent objects which will be updated automatically to reflect the change
- Click the Create script button to open script in the Query Editor
- Execute the opened script to refactor the table
At the end, replacing complex natural keys with a surrogate key can substantially increase the performance of the database, and ApexSQL Refactor can make that process very quick and completely painless.
This article is part of a series
April 4, 2013