SQL database refactoring techniques – Replacing a natural key with a surrogate key

Replacing a natural key with a surrogate key is structural refactoring method of replacing an existing natural key with a surrogate key.

Why replace a natural key with a surrogate key?

A natural key and a surrogate key are two types of primary key. A natural key is a single column or a combination of columns that has a business value and occurs naturally in the real world (e.g. Social security number, International Standard Book Number…). A surrogate key in SQL Server is created a by assigning an identity property to a column that has a number data type. A surrogate key is a value generated right before the record is inserted into a table.

There are several reasons to replace a natural key with a surrogate key. A surrogate key can be used to reduce coupling since it doesn’t have a business value and it’s not coupled with any external application connected to a database. For example, if the business logic changes at some point, that would require to update a natural key across all foreign key relationships. In some cases, even using Social Security Number or International Standard Book Number as a natural key doesn’t guarantee that a primary key will be unique. A surrogate key value is unique and since it doesn’t have a business value it won’t be updated over time.

Also, using a surrogate key may increase performance because a large natural key can degrade database performance, and due to a fact that surrogate keys are usually integer values a smaller index on a primary key will have better performance on JOIN operations. In case of combined natural primary keys JOIN operations can became very complex:

Introduce a Surrogate Key

In this example, the Account table with a combined primary key will be used:

CREATE TABLE Account (
   AccountNumber int,
   AccountType varchar (6),
   AccountDescription varchar(20),
            PRIMARY KEY (AccountNumber, AccountType));

As it can be seen from the DDL script the table is referenced by a foreign key in the Client table:

CREATE TABLE Client (
   AccountNumber int,
   AccountType varchar (6),
   FirstName varchar (10),
   LastName varchar (10),
            FOREIGN KEY (AccountNumber, AccountType) REFERENCES Account 
(AccountNumber,AccountType));

Writing JOIN statement for tables that include composite primary keys is getting more complex, depending on the number of columns combined in a primary key. Every child table has to include all primary key columns in the JOIN statement:

SELECT
  a.AccountNumber,a.AccountDescription,
  c.LastName, c.FirstName
FROM dbo.Account a
INNER JOIN
dbo.Client c ON c.AccountNumber = a.AccountNumber
AND c.AccountType = a.AccountType

If the composite primary key is replaced on the Account table with one key the JOIN statement will be more understandable:

SELECT
  a.AccountNumber,a.AccountDescription,
  c.LastName, c.FirstName
FROM dbo.Account a
INNER JOIN
dbo.Client c ON c.AccountID = a.AccountID

To apply this refactoring method, the following needs to be done:

  1. Drop the foreign key relationship with the current primary key from the Client table:

    ALTER TABLE dbo.Client
    DROP CONSTRAINT FK__Client__73DA2C14
          GO

  2. Drop the primary key constraint from the Account table:

    ALTER TABLE dbo.Account
       DROP CONSTRAINT PK__Account__BA555B37D93B2923
    GO

  3. Introduce the new key column as IDENTITY and PRIMARY KEY:

    ALTER TABLE Account
    ADD AccountID int IDENTITY(1,1) PRIMARY KEY NOT NULL
    GO

  4. Create the foreign key relationship with the new primary key by adding the AccountID column to the Client table:

    ALTER TABLE Client
    ADD AccountID int FOREIGN KEY REFERENCES Account (AccountID)  
    GO

  5. Since AccountNumber and AccountType columns are no longer needed in the Client table they can be dropped:

    ALTER TABLE Client
    DROP COLUMN AccountNumber
    GO
    ALTER TABLE Client
    DROP COLUMN AccountType
    GO

DLLs for Account and Client tables are now:

CREATE TABLE dbo.Account (
        AccountNumber      int NOT NULL,
        AccountType        varchar(6) NOT NULL,
        AccountDescription varchar(20) NULL,
        AccountID          int IDENTITY(1, 1) PRIMARY KEY NOT NULL
) 
CREATE TABLE dbo.Client(
	AccountID int FOREIGN KEY REFERENCES dbo.Account (AccountID)
	FirstName varchar (10) NULL,
	LastName varchar (10) NULL
)

New relationship diagram:

Add surrogate key using ApexSQL Refactor

To replace a natural key with a surrogate key using ApexSQL Refactor, SSMS and VS SQL database refactoring tool, choose the table in the Object Explorer panel and from the Apex SQL Refactor menu choose the Add surrogate key command, under the Other refactors sub-menu:

Click the Preview button to see the generated script in the Add surrogate key window:

In the Dependencies and the Sequence tabs it can be seen the list of all dependent objects and the order in which changes will be made respectively. In the Warnings tab, it can be seen all referenced objects that require a user’s analysis before running the script.

Useful resources
Refactoring Databases – Evolutionary Database Design
Natural versus Surrogate Keys. Performance and Usability
Surrogate Key vs. Natural Key

 

February 20, 2014