What is SQL Server database continuous integration (CI)? – a high level conceptual overview and walk thru of the steps in a SQL database CI pipeline

If you’ve never automated builds in SQL Server before and but are still CI-curious, then this quick high-level overview, should introduce to you the basics and maybe even lead to you wading into the water a little deeper

First, continuous integration means that you will rapidly iterate changes, so that each time a change is formally made, usually by a source control commit, then the target system, a database in our case is automatically destroyed and recreated with the change. Furthermore, the entire test environment is recreated with a combination of static and test data, any automated tests and/or reviews are conducted, and the build is fully documented.

Getting our database into source control, and updating our repository with each new change

To begin with we need to get our database into source control. That is easy. Using a variety of means, you can script out your objects and put them into your source control repository.

This is the easy part, you can do this easily using PowerShell and SMO. You can set up a job to do this automatically.

You can also use ApexSQL Script, a tool for scripting database objects and data to automatically script your objects and static data directly into source control. See more below:

The problem is that, by doing this you will include every version of every object, whether it has changed or not, so that 99% of your version change history might not show any changes at all. This means that even though you have gotten your objects into version control, you haven’t done it a way that will allow you to really leverage version control to see version history, compare changes, report etc.

For this reason, using a change management tool to compare the objects in the repository with the latest version of the objects in development, and only commit a new version if it has changed, will keep your repository streamlined and clean, with only changed versions of objects. You’ll see the benefit, when you review change history and can be assured that each version actually has a change from the previous. Something like this can be scripted, turned into a job etc and run on a schedule, like every night. This makes sure that all changes from that day are versioned in source control that night. Even though this is better, it still isn’t as good as integrating changes as soon as they are committed to the repository (see next)

ApexSQL Diff is a SQL database change management tool that can automatically synchronize schema changes from your development database, directly into source control. See below to learn more

ApexSQL Data Diff is a SQL database data change management tool that can automatically synchronize static data between a development database and a source control repository. See below to learn more

ApexSQL Source control is a SQL source control integration tool which allows you to iteratively develop in SSMS or Visual Studio and commit changes to source control. It also allows you to directly commit databases to a source control repository

To achieve true continuous integration, we want to integrate the changes from the latest commit into our test environment as soon as they happen. To do that, we need to front source control integration onto the desktops and IDEs of each SQL developer. That way they can check out, update and check in objects back into the repository. And as soon as they commit a new change, trigger a new pipeline that will integrate the change

ApexSQL Source Control will allow you to integrate SQL development directly with source control via integration into SSMS or Visual Studio

  1. Build step: Building our new database test environment from source control

    The next challenge, once you have gotten your (changed only) objects into version control, is getting them out. Simply reversing the process won’t work because order matters when creating our database, so that if the objects aren’t created in the proper sequence, your build will fail.

    Fortunately, tools exist that can pull objects out of source control, order them in the correct sequence and add them all to our database

    ApexSQL Diff can be used to compare the newly created, but empty, test database to your repository, and synchronize it, by adding all of the objects from the repository

    But we prefer ApexSQL Build, a tool specifically designed to build new databases, including directly from source control.

    Destroying and re-building our environment

    Before we add objects to our new test (aka temporary) database, we have to destroy the previous one first. Once it has been eliminated, we can build another, empty one, from scratch. To do that you generally want to run a CREATE DATABASE script, that should be stored in source control as well.

    Once you have successfully destroyed your old test database and re-created another, the objects from the previous step can be added

    What about static data?

    Many developers, as they should, maintain static aka code table, list data in source control. These are tables like names of cities, that rarely, if every change (thus the name “static”) that usually don’t contain lots of data, at least compared to transactional tables. One the other hand, when static data ever does change, we want to make sure the changes are versioned in source control. Finally, if we store static data in source control, we’ll get realistic data … because it is real. We won’t have to synthetically generate the data, which will make it less realistic. All of this, and more, makes storing static data under source control a “no brainer”

    Better yet, the same tools we use to pull our database objects out of source control, usually can also easily do the same with static data as well.

    ApexSQL Source Control can version control static data. Both ApexSQL Build and ApexSQL Data Diff can populate static tables with data directly from source control. This functionality is described in the articles below

    Pause for intermission

    Taking a pause at the point where we’ve re-created everything from source control, we can see that we’ve

    1. Destroyed our test environment. Hopefully no QAs who were actively testing at the time were hit by falling debris
    2. Rebuilt our test environment with a CREATE DATABASE script pulled from our source control repository
    3. Added in all of the database objects e.g. tables, procedures, in the correct sequence to avoid show-stopping dependency errors which may turn our first CI pipeline into a pipe-bomb …
    4. Optionally, added any static data resident in our repository as well

    So what we have done is fully recreated our test environment, from scratch, automatically.

  2. Populate step: For the rest, we’ll use test – filling our new temporary database with synthetic test data

    Now that we’ve built our new temporary database and populated all static tables with data versioned in source control, we’ll want to populate the rest of our tables. We must do this to create a realistic environment for testing, where we can run unit tests and test client software, scripts and the database itself against realistic data.

    We want to avoid using production data for testing as we don’t want to run afoul of data governance rules like GDPR that expressly forbid re-provisioning data for non-lawful purposes, that the user would never have intended.

    Fortunately, many tools exist to quickly and easily populate our transactional tables with realistic test and synthetic data. If they can be automated, they can be integrated into our continuous integration pipeline

    ApexSQL Generate is a synthetic, test data generation tool for SQL Server that can rapidly create realistic test data to populate our temporary database in our continuous integration pipeline. Learn more below

  3. Test step: Automatic SQL Server unit testing

    Once the database has been created and populated with data, both test and static, automatic tests can be run against the new temporary database. Just like client code, databases can and should be unit tested. The tSQLt framework provides a mechanism to do this and 3rd party tools exist, that work with tSQLt to allow you to automated these tests and integrate them as a critical part of your continuous integration pipeline

    As a bonus, the SQLCop library of tests can be run as part of more database specific unit tests to provide an automated review of the build vs a set of best practices (more on this next)

    ApexSQL Unit Test is a tool that uses the tSQLt framework to create libraries of and execute SQL unit tests. See below for some more information

    SQL Server database continuous integration workflow TEST step – Running SQL unit tests against the changes

  4. Review step: Reviewing changes for conformity to SQL coding best practices

    Even though changes in the pipeline might not have broken the database or client application, or even your automated unit tests, they might not meet your coding, naming conventions or other standards. They may also violate commonly accepted best practices

    Code review tools, integrated into your pipeline, can review all of your changes and even fail the build if enough problems are detected.

    We want a green build, but we also want a clean build. And such reviews can make sure that the production system isn’t degraded with violations to coding standards and other poor practices

    ApexSQL Enforce is a tool to review your SQL Server database against a set of SQL coding best practices and detect any unsavory SQL coding smells. Via a sophisticated CLI, it is easy to integrate this tool into any SQL continuous integration process. See below to learn more

  5. Document step: Documenting your changes

    Changes committed to source control that will participate in each continuous integration pipeline run, should be fully documented in the format of your choice e.g. CHM. HTML, PDF. This allows you to look at any pipeline run and, within a single, user friendly report to see what has changed

    ApexSQL Doc is a tool to document SQL Server databases, and much more e.g. SSIS, SSAS, Tableau, MySQL etc. It is particularly well suited to CI pipelines because it has a console application and rich command line interface. See below to learn more

    ApexSQL Diff can also be used to create schema change reports and exports, as described below

  6. Package step: Wrapping everything up into a NuGet package

    Once your build has been created, reviewed, tested and documented, you are ready to package it up for archiving or distribution, as part of a continuous delivery process. Automatically creating a NuGet package with all of the artifacts from the pipeline including documentation and synchronization scripts and putting it into a NuGet feed, allows you to seamlessly segue to an automated delivery process that consumes NuGet packages/feeds

    Build triggers

    But what triggered this build? New builds are normally triggered in three ways

    1. Manually – by some manual process of clicking a button, hitting run on a script or otherwise to initiate the build.
    2. On a schedule – this would be set up to run automatically on some pre-defined interval or schedule. Commonly, this would be at night, where changes from the day are applied and a new build is ready for QA the following morning
    3. On a new commit – this is the most iterative. With this approach, every time a new commit is made, the build process is triggered, so that the test database always reflects the latest changes. This means that the time from commit to finding problems has been minimized and that issues can, in theory, be caught as soon as the mistakes are made, or shortly thereafter

Summary

I hope you have enjoyed this brief walk-through of conceptually building a SQL Server continuous integration pipeline. Along with this, we’ve demonstrated, at every step, tools that can be deployed to implement and automate each step

But it gets easier. With the ApexSQL CICD toolkit, you can get an out of the box solution to build CI pipelines quickly and easily. The toolkit includes

  1. Open source PowerShell cmdlets to automate every step in a CI (and CD) pipeline including configuration option and data sources
  2. A web dashboard to allow for point and click construction and execution of CI (and CD) pipelines
  3. Plug-ins for your favorite Build servers e.g. TeamCity

For more information on this toolkit see the ApexSQL CICD toolkit product page

July 26, 2018