SQL database refactoring techniques

In this article we’ll make an introduction in a series of articles on SQL database refactoring and solutions using ApexSQL Refactor a SSMS and VS add-in with 11 SQL database refactors

What is SQL database refactoring?

Database refactoring is a structural or a functional change in a database schema in order to improve SQL database design while retaining its semantics without taking away or adding any functionality

When you perform structural refactoring on a database schema you’ll also need to modify any external applications that are coupled to your SQL Server database. The more connections you have more complex will be to perform refactoring and the bigger the chance is that a change in one thing will require a change in another

SQL database refactoring becomes more complex depending on the amount of coupling as a result of the architecture of your database, e.g. single-application database vs. multi-application database

In a single-application scenario you have a complete control over both your database schema and the application that access your database

In a multiple application database you have little or no control over external applications that access your database

When to perform SQL refactoring

There are several reasons to consider SQL database refactoring such as performance issues, improving maintainability, improving security etc

Before applying refactoring to a database you’ll need to do a sanity check – does the refactoring make sense, is the change really needed and is it worth the effort. The concept of “code smell”, introduced by Martin Fowler in 1977 is an indication that you code may need refactoring. Database smells introduced by Scot W. Ambler in 2003, similarly, are common indications that you need to refactor your database

These smells include the following:

Multipurpose column – If a column is being used for several purposes, it is likely that extra code exists to ensure that the source data is being used the “right way”, often by checking the values of one or more other columns

Multipurpose table – Similarly, when a table is being used to store several types of entities, there is likely a design flaw

Redundant data – Redundant data is a serious problem in operational databases because when data is stored in several places, the opportunity for inconsistency occurs

Tables with too many columns-When a table has many columns, it is indicative that the table lacks cohesion – that is trying to store data from several entities

Tables with too many rows– Large tables are indicative of performance problem. For example, it is time consuming to search a table with millions of rows

“Smart” columns– A smart column is one in which different positions within the data represent different concepts. For example, if the first four digits of the client ID indicate the client’s home branch, then client ID is a smart column because you can parse it to discover more granular information (for example, home branch ID) [1]

Choose the most appropriate SQL database refactoring technique

Once you analyzed the problem you’re facing and determined that you need to refactor your database, you’ll need to choose the appropriate database refactoring technique. For example, if the issue you’re having is a table with too many rows you will choose the Split table refactoring method. If you’re ignoring the often repeated rule “never use SELECT *” you’ll need to refactor your code with naming columns to avoid for example, performance issues, duplicate column names returned in the result set, which causes client applications to throw an exception when attempting to read columns by name. To protect your database from SQL injection you’ll need to refactor your stored procedures and encapsulate your code, and so on

Categories of database refactoring

In Refactoring Databases Scot W. Ambler and Pramod J. Sadalage introduced six categories of database refactors:

Structural – A change in the definition of tables, views, and columns. Some of these refactorings include the replace one- to- many relationship with associative table, split table, rename table, and more

Data Quality – This category of refactorings covers changes that improve the quality of the data stored in a database. Refactorings in this category are add lookup table, drop standard type, move data, and more

Referential Integrity – Changes that ensures that any data that is referenced by one table exists, and also that unused data is removed

Architectural – Changes with a goal of improving the overall methodology in which external applications interact with a database

Method – Code changes like adding/removing parameters to a stored procedure, with a goal to improve overall quality

Transformations – Changes in a database schema. For example introducing a new table or a column, and inserting and updating data

The process of database refactoring

After analyzing your database and choosing the appropriate refactoring method you’re starting the refactoring process

The best case scenario is performing refactoring in a single-application database because you can refactor database schema and application source code at the same time. You will implement the refactoring method in you project, test it and deploy to production

In the multiple-application database environments the process is more complex and lasts longer, because of the different times of deployments of the individual applications

The first step is to implement the refactoring to your project. During the transition period both the original and refactored schema will exist

Once the scaffolding code from the original application is removed from production, and your database is tested again refactoring is completed

The complete process of database refactoring is shown in a picture below:

Before automated database refactoring tools were available everything was need to be done thru code. In the following series of articles we’ll introduce ApexSQL Refactor database refactors, method by method, and show you both “by hand” and the automated way to do your database refactoring

References:
[1] Refactoring Databases – Scot W. Ambler, Pramod J. Sadalage

Useful resources:
Refactoring SQL Applications – Stephane Faroult, Pascal L’Hermite
Catalog of Refactorings
Introduction to Database Refactoring

February 14, 2014