How to build a “smart” SQL Server Data dictionary

In the article, “What is a data dictionary and why would I want to build one?” a data dictionary was described and compared to other alternatives to documenting, auditing and versioning a database. The article also described the difference between a Dumb and a Smart data dictionary.

Now that we’ve whetted your appetite, this article will explain how to create a smart data dictionary using XML schema change exports from ApexSQL Diff.

The created Data dictionary can be later used for various things like querying to see the full history of object changes or to create aggregate exports showing database change statistics that will be covered in a separate article.

Requirements

  • SQL Server 2005 and above
  • ApexSQL Diff Professional edition (as this task requires the CLI). ApexSQL Diff is a 3rd party tool for comparing SQL Server schemas. Version 2016.02.0383 or higher is required
  • Integrated security for database connectivity. For SQL Server authentication the script can be modified to use encrypted credentials stored in the project file itself. Learn more about different options for handling database credentials here
  • A configuration file (config.xml), that provides the name of the SQL Server and the Database where the Data dictionary is located and also provides SQL Server and the Database name which changes we are tracking. The config file must be located in the same directory as the PowerShell script

Configuration

Setting up the baseline

The first step in the process of creating a Data dictionary is setting up the data sources. To create a “Smart” data dictionary, our dictionary much be “change aware” so that it only writes changes (new, updated and deleted objects). To do this we’ll need to establish (and periodically reset) the baseline and compare it to our actual database

To accomplish this, we’ll use ApexSQL schema snapshots, a proprietary and lightweight file that contains an entire database schema. Snapshots can be created directly from ApexSQL Diff and can be used by ApexSQL Diff to compare to a database to produce a difference export, which we’ll be using later in the article

We’ll create an initial snapshot and save it to the Root path (as specified in the Config file) using ApexSQL Diff. This will all be done by the PowerShell script, which will manage the baseline snapshots for us

This snapshot will serve as our initial baseline. Each time changes are discovered between the baseline snapshot and the database we are working on, we’ll replace our baseline snapshot with a newer version, so it can be used as a baseline on the next comparison to check for any changes

  • Day 1: Baseline snapshot = Database (no changes)
  • Day 2: Baseline snapshot = Database (no changes)
  • Day 3: Baseline snapshot < Database (changes)

    • Write a version of the database (changes only to the data dictionary)
    • Create a new baseline using the newer version of the database

  • Day 4: Baseline snapshot version 2 = Newer version of database (no changes)
  • etc

Creating the data dictionary repository

The next part is the creation of the repository, which will exist as a database table in SQL Server itself. This repository filled with the data from the XML export, will become our data dictionary

So, let’s create a database which will be used as a repository. We’ll decided to call it “DataDictionary”. Then, we’ll create the table, of the same name, for storing the data in the Data dictionary information. (The script for this table can be found in Appendix A)

The next step is creating a stored procedure which will be used to fill the data dictionary. This procedure will extract the data from the XML schema difference export in XML, create a temporary XML table with the data from the XML export, then that data will be extracted from the XML table to the DataDictionary table we created earlier.

The stored procedure FillDataDictionary is created where it’s defined that a temporary XML table will to be created, with all of the data from the XML export. Then all of the data from the temporary XML table is parsed and placed in the DataDictionary table. Once that is done, the temporary table is dropped. (The script for this procedure can be found in Appendix B)

Configuration settings

Now that our data dictionary infrastructure has been created, we’ll need to set the configuration settings to determine

  • The SQL Server and database name who’s changes we’re going to track
  • The SQL Server and database name of the Data Dictionary repository

We’ll create this as an XML file called config.xml and put it in the same directory as the PowerShell script. The PowerShell script will open that config file and parse it, to gather the necessary configuration information to successful run the Data dictionary upload job, each time it runs

Execution

Once the database repository, the DataDictionary table and FillDataDictionary is created, you have edited the configuration file as needed, you are ready to go. The PowerShell script we created, can simply be executed to create the first batch of records in our data dictionary.

The script can be scheduled to unattended, every night at 12:00 AM for example, to ensure your data dictionary is continuously updated

  1. Download the files required for creating the data dictionary
  2. Edit the config.xml file to (See example in Appendix C – Configuration file)

    1. Specify the name and the Server name of the Database that you will create a data dictionary for
    2. Specify the name and the Server name of the DataDictionary repository
  3. Run the enclosed SQL script [create_data_dictionary_schema.sql] to create the repository database, table and load stored procedure in the SQL Server you specify
  4. Run the PowerShell script to capture initial state of your database

Test

Next, make a change to an object in the database, add a new object etc.

Now run the PowerShell script again. You should see a record for each change when you query your Data dictionary database.

To query your data dictionary, use this SQL

SELECT *
FROM [DataDictionary].[dbo].[DataDictionary]

Scheduling for unattended execution

Once everything is running well, schedule a job to run this job unattended. See this link for information on how to do this: How to automate and schedule CLI execution with SQL Server Job

In the next article in this series, we’ll demonstrate how to leverage your newly created data dictionary for auditing, version control and aggregate reporting on transactional data

Downloads

Please download the script(s) associated with this article on our GitHub repository

Please contact us for any problems or questions with the scripts.

Appendix A – the repository creation script

-- Create database [DataDictionary]
IF (NOT EXISTS(SELECT * FROM sys.databases WHERE name='DataDictionary'))
BEGIN
	CREATE DATABASE [DataDictionary]
END
GO

USE [DataDictionary]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- Create table [dbo].[DataDictionary]
IF NOT EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID (N'[dbo].[DataDictionary]'))
BEGIN
	CREATE TABLE DataDictionary
	(
		ID int primary key identity(1,1),
		ChangeDate [datetime],
		[Server] [varchar](128),
		[Database] [varchar](128), 
		ObjectType [varchar](50) ,
		Name [varchar](128) ,
		Owner [varchar](128) ,
		ObjectID [int],
		DiffCode [bit] , 
		DiffType [char](2) ,
		DiffANSI [varchar](5) ,
		DiffAssembly [varchar](5) ,
		DiffAssemblyClass [varchar](5) ,
		DiffAssemblyMethod [varchar](5) ,
		DiffBaseType [varchar](5) ,
		DiffBody [varchar](5) ,
		DiffBoundDefault [varchar](5) ,
		DiffBoundDefaults [varchar](5) ,
		DiffBoundRule [varchar](5) ,
		DiffBoundRules [varchar](5) ,
		DiffChangeTracking [varchar](5) ,
		DiffCheckConstraints [varchar](5) ,
		DiffCLRName [varchar](5) ,
		DiffColumnOrder [varchar](5) ,
		DiffColumns [varchar](5) ,
		DiffDataspace [varchar](5) ,
		DiffDefaultConstraints [varchar](5) ,
		DiffDefaultSchema [varchar](5) ,
		DiffDurability [varchar](5) ,
		DiffExtendedProperties [varchar](5) ,
		DiffFiles [varchar](5) ,
		DiffForeignKeys [varchar](5) ,
		DiffFulltextIndex [varchar](5) ,
		DiffIdentities [varchar](5) ,
		DiffIndexes [varchar](5) ,
		DiffLockEscalation [varchar](5) ,
		DiffManifestFile [varchar](5) ,
		DiffMemoryOptimized [varchar](5) ,
		DiffNullable [varchar](5) ,
		DiffOwner [varchar](5) ,
		DiffParameters [varchar](5) ,
		DiffPermissions [varchar](5) ,
		DiffPermissionSet [varchar](5) ,
		DiffPrimaryKey [varchar](5) ,
		DiffReturnType [varchar](5) ,
		DiffScale [varchar](5) ,
		DiffSize [varchar](5) ,
		DiffStatistics [varchar](5) ,
		DiffUnique [varchar](5) ,
		DiffUserLogin [varchar](5) ,
		DiffXMLColumnSet [varchar](5) ,
		DiffXMLIndexes [varchar](5) ,
		DDL [nvarchar] (max)
	)
END
GO

Appendix B – the data-loader stored procedure script

-- Create stored procedure [dbo].[FillDataDictionary]
CREATE PROCEDURE [dbo].[FillDataDictionary] @xmlLocation VARCHAR(150)
AS
BEGIN
	DECLARE @COMMAND NVARCHAR(MAX)

	SET @COMMAND = N'SELECT CONVERT(XML, BulkColumn) AS XMLData
	INTO ##XMLwithOpenXML
	FROM OPENROWSET(BULK ''' + @xmlLocation + ''', SINGLE_BLOB) AS x';

	EXEC sp_executesql @COMMAND

	DECLARE @XML AS XML
		,@hDoc AS INT
		,@SQL NVARCHAR(MAX)

	SELECT @XML = XMLData
	FROM ##XMLwithOpenXML

	EXEC sp_xml_preparedocument @hDoc OUTPUT
		,@XML

	DROP TABLE ##XMLwithOpenXML

	INSERT INTO DataDictionary
	SELECT GETDATE() AS ChangeDate
		,[Server]
		,[Database]
		,[ObjectType]
		,[Name]
		,[Owner]
		,[ObjectID]
		,[DiffCode]
		,[DiffType]
		,[DiffANSI]
		,[DiffAssembly]
		,[DiffAssemblyClass]
		,[DiffAssemblyMethod]
		,[DiffBaseType]
		,[DiffBody]
		,[DiffBoundDefault]
		,[DiffBoundDefaults]
		,[DiffBoundRule]
		,[DiffBoundRules]
		,[DiffChangeTracking]
		,[DiffCheckConstraints]
		,[DiffCLRName]
		,[DiffColumnOrder]
		,[DiffColumns]
		,[DiffDataspace]
		,[DiffDefaultConstraints]
		,[DiffDefaultSchema]
		,[DiffDurability]
		,[DiffExtendedProperties]
		,[DiffFiles]
		,[DiffForeignKeys]
		,[DiffFulltextIndex]
		,[DiffIdentities]
		,[DiffIndexes]
		,[DiffLockEscalation]
		,[DiffManifestFile]
		,[DiffMemoryOptimized]
		,[DiffNullable]
		,[DiffOwner]
		,[DiffParameters]
		,[DiffPermissions]
		,[DiffPermissionSet]
		,[DiffPrimaryKey]
		,[DiffReturnType]
		,[DiffScale]
		,[DiffSize]
		,[DiffStatistics]
		,[DiffUnique]
		,[DiffUserLogin]
		,[DiffXMLColumnSet]
		,[DiffXMLIndexes]
		,[DDL]
	FROM OPENXML(@hDoc, 'root/*/*')
WITH(
	ObjectType [varchar](50) '@mp:localname'
	,[Server] [varchar](50) '../../Server1'
	,[Database] [varchar](50) '../../Database1'
	,NAME [varchar](50) 'Name'
	,OWNER [varchar](50) 'Owner1'
	,ObjectID [int] 'ObjectID1'
	,DiffCode [bit] 'Diff_Code'
	,DiffType [char](2) 'DiffType'
	,DiffANSI [varchar](5) 'DiffANSI'
	,DiffAssembly [varchar](5) 'DiffAssembly'
	,DiffAssemblyClass [varchar](5) 'DiffAssemblyclass'
	,DiffAssemblyMethod [varchar](5) 'DiffAssemblymethod'
	,DiffBaseType [varchar](5) 'DiffBasetype'
	,DiffBody [varchar](5) 'DiffBody'
	,DiffBoundDefault [varchar](5) 'DiffBounddefault'
	,DiffBoundDefaults [varchar](5) 'DiffBounddefaults'
	,DiffBoundRule [varchar](5) 'DiffBoundrule'
	,DiffBoundRules [varchar](5) 'DiffBoundrules'
	,DiffChangeTracking [varchar](5) 'DiffChangetracking'
	,DiffCheckConstraints [varchar](5) 'DiffCheckconstraints'
	,DiffCLRName [varchar](5) 'DiffCLRname'
	,DiffColumnOrder [varchar](5) 'DiffColumnorder'
	,DiffColumns [varchar](5) 'DiffColumns'
	,DiffDataspace [varchar](5) 'DiffDataspace'
	,DiffDefaultConstraints [varchar](5) 'DiffDefaultconstraints'
	,DiffDefaultSchema [varchar](5) 'DiffDefaultschema'
	,DiffDurability [varchar](5) 'DiffDurability'
	,DiffExtendedProperties [varchar](5) 'DiffExtendedproperties'
	,DiffFiles [varchar](5) 'DiffFiles'
	,DiffForeignKeys [varchar](5) 'DiffForeignkeys'
	,DiffFulltextIndex [varchar](5) 'DiffFulltextindex'
	,DiffIdentities [varchar](5) 'DiffIdentities'
	,DiffIndexes [varchar](5) 'DiffIndexes'
	,DiffLockEscalation [varchar](5) 'DiffLockescalation'
	,DiffManifestFile [varchar](5) 'DiffManifestfile'
	,DiffMemoryOptimized [varchar](5) 'DiffMemoryoptimized'
	,DiffNullable [varchar](5) 'DiffNullable'
	,DiffOwner [varchar](5) 'DiffOwner'
	,DiffParameters [varchar](5) 'DiffParameters'
	,DiffPermissions [varchar](5) 'DiffPermissions'
	,DiffPermissionSet [varchar](5) 'DiffPermissionset'
	,DiffPrimaryKey [varchar](5) 'DiffPrimarykey'
	,DiffReturnType [varchar](5) 'DiffReturntype'
	,DiffScale [varchar](5) 'DiffScale'
	,DiffSize [varchar](5) 'DiffSize'
	,DiffStatistics [varchar](5) 'DiffStatistics'
	,DiffUnique [varchar](5) 'DiffUnique'
	,DiffUserLogin [varchar](5) 'DiffUserlogin'
	,DiffXMLColumnSet [varchar](5) 'DiffXMLcolumnset'
	,DiffXMLIndexes [varchar](5) 'DiffXMLindexes'
	,DDL [nvarchar](max) 'SourceDDL'
	) EXEC sp_xml_removedocument @hDoc END

Appendix C – Configuration file

<config>
        <! —- Server name where the target database is placed -->
        <Server>(local)</Server>
        <! —-Target database name which changes will be tracked --> 
        <Database>AdventureWorks2014</Database>
	<! —-Server name where the data dictionary repository will be placed -->
	<DataDictionaryServer>(local)</DataDictionaryServer>
        <!—-Name of the data dictionary repository (database) -->
	<DataDictionaryDatabaseName>DataDictionary</DataDictionaryDatabaseName>
</config>

Appendix D – the PowerShell script

#find the Snapshot file which has the highest value for the "created date" parameter
function FindSnapshotByDate($folder)
{
    #find all files whose name ends with .axsnp
    $Files = Get-ChildItem -Path $folder -Filter "*.axsnp"
    if ($Files.Length -eq 0)
    {
        #if no such file is found, then that means that there isn't any snapshot previously created
        return $null
    }
    
    $mostRecentFile = $Files | Sort-Object -Property "CreationTime" -Descending | Select-Object -First 1
    return $mostRecentFile.FullName
}

#check the existance of Exports, Logs or Snapshot folders, creates it if it is not created and returns the path
function CheckAndCreateFolder($rootFolder, [switch]$Exports, [switch]$Baselines, [switch]$Logs)
{
    $location = $rootFolder

    #set the location based on the used switch
    if($Exports -eq $true)
    {
        $location += "\Exports"
    }
    if($Baselines -eq $true)
    {
        $location += "\Snapshots"
    }
    if($Logs -eq $true)
    {
        $location += "\Logs"
    }
    
    #create the folder if it doesn't exist and return its path
    if(-not (Test-Path $location))
    { mkdir $location -Force:$true -Confirm:$false | Out-Null }

    return $location
}

#insert schema difference records into the datadictionary table
function InsertRecordsToDataDictionaryDatabase($dataDictionaryServer, $dataDictionaryDbName, $xmlExportFullPath)
{
    $SqlConnection = New-Object System.Data.SqlClient.SqlConnection
   
    $SqlConnection.ConnectionString = "Server=$dataDictionaryServer;Initial catalog=$dataDictionaryDbName;Trusted_Connection=True;"

    try
    {
       $SqlConnection.Open()
       $SqlCommand = $SqlConnection.CreateCommand()
       $SqlCommand.CommandText = "EXEC [dbo].[FillDataDictionary] '$xmlExportFullPath'"

       $SqlCommand.ExecuteNonQuery() | out-null
    }
    catch
    {
        Write-Host "FillDataDictionary could not be executed`r`nException: $_"
    }
}

#################################################################################                                        

#read and parse config.xml file from root folder
[xml]$configFile = Get-Content config.xml

$server               = $configFile.config.Server
$database             = $configFile.config.Database
$dataDictionaryServer = $configFile.config.DataDictionaryServer
$dataDictionaryDbName = $configFile.config.DataDictionaryDatabaseName
$rootFolder           = $PSScriptRoot

#check if ApexSQLDiff is installed
$DiffInstallPath = Get-ItemPropertyValue -Path "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\ApexSQL Diff*" -Name "InstallLocation"
if(-not $DiffInstallPath)
{
    #ApexSQL Diff installation not found. Please install ApexSQL Diff to continue
}

$diffLocation      = $DiffInstallPath + "ApexSQLDiff.com"
  
$snapshotsLocation = CheckAndCreateFolder $rootFolder -Baselines
$exportLocation    = CheckAndCreateFolder $rootFolder -Exports
$logLocation       = CheckAndCreateFolder $rootFolder -Logs

$today          = (Get-Date -Format "MMddyyyy")
$latestSnapshot = FindSnapshotByDate $snapshotsLocation

$snapshotName = "SchemaSnapshot_$today.axsnp"
$logName      = "SnapshotLog_$today.txt"
$xml          = "SchemaDifferenceExport_$today.xml"

$initialCompare          = "/s1:""$server"" /d1:""$database"" /s2:""$server"" /d2:""$database"" /ot:x /xeo:e is /on:""$exportLocation\$xml"" /f /v"
$compareSettingsSnapshot = "/s1:""$server"" /d1:""$database"" /sn2:""$latestSnapshot"" /out:""$logLocation\$logName"" /rece /f /v"
$exportSettingsSnapshot  = "/s1:""$server"" /d1:""$database"" /sn2:""$snapshotsLocation\$snapshotName"" /export /f /v"
$diffExportXMLparams     = "/s1:""$server"" /d1:""$database"" /sn2:""$latestSnapshot"" /ot:x /xeo:d s t is /on:""$exportLocation\$xml"" /f /v" 

#if no previous snapshot found, create snapshot for current state and skip the rest
if($latestSnapshot -eq $null)  
{
     #put initial state of current database in datadictionary
     (Invoke-Expression ("& `"" + $diffLocation +"`" " +$initialCompare))
     PutRecordsToDataDictionaryDatabase $dataDictionaryServer $dataDictionaryDbName $exportLocation\$xml

     #create snapshot of current database state
     (Invoke-Expression ("& `"" + $diffLocation +"`" " +$exportSettingsSnapshot))

     Write-Host "Snapshot is not found in the '$snapshotsLocation' folder.`r`n`r`nInitial snapshot has been automatically created and named '$snapshotName'"
     
     #here, add the comparison against empty datasource
     continue
}

 #compare the database with latest snapshot
 (Invoke-Expression ("& ""$diffLocation"" $compareSettingsSnapshot"))
 $returnCode = $LASTEXITCODE

 #differences detected
 if($returnCode -eq 0)
 {
    #Export differences into XML file
    (Invoke-Expression ("& ""$diffLocation"" $diffExportXMLparams"))
    #Add timestamp on each line of log file
    $tsOutput | ForEach-Object { ((Get-Date -format "MM/dd/yyyy hh:mm:ss") + ":  $_") >> $file }
 
    PutRecordsToDataDictionaryDatabase $dataDictionaryServer $dataDictionaryDbName $exportLocation\$xml 
    
    #create snapshot of current database state
    (Invoke-Expression ("& `"" + $diffLocation +"`" " +$exportSettingsSnapshot))
 }
 #there are no differences or an error occurred
 else
 {
     #an error occurred
     if($returnCode -ne 102)
     {
         Write-Host "An error occurred during the application execution.`r`nError code: $returnCode"
         continue
     }
 }

References:

January 10, 2017