How to automatically compare and synchronize SQL Server data

There are numerous scenarios when data synchronization between the two databases is needed, such as distributing data to different locations, integration of data from different locations into a central repository, or just a simple synchronization between your test and production database.

The recommended solution is to use database replication.

However, even with database replication there’s a need to check whether all the data is synchronized and to synchronize it if needed.

Another situation where data synchronization is needed can be when data gets accidentally deleted from a test database, and you want to recover it by inserting the data from the production database.

Here are several methods you can use to compare and synchronize your databases:

  1. Use SQL with the UNION ALL statement to find the records that don’t match in 2 identical tables:
    SELECT
           MIN(TableName) AS TableName,
           ID,
           Name,
           lastname,
           Address,
           City
      FROM(
        SELECT
               'Table A' AS TableName,
               Customers.id,
               Customers.Name,
               Customers.lastname,
               Customers.Address,
               Customers.City
          FROM Customers
        UNION ALL
        SELECT
               'Table B' AS TableName,
               CustomersOld.id,
               CustomersOld.Name,
               CustomersOld.lastname,
               CustomersOld.Address,
               CustomersOld.City
          FROM CustomersOld)tmp
    GROUP BY
             ID,
             Name,
             lastname,
             Address,
             City
    HAVING COUNT(*) = 1
    ORDER BY
             id;

    For the Customers table:

    And CustomersOld:

    The script returns the following results:

    When the Name for ID=1 in the CustomersOld table is changed to ‘John’, so that the records become equal, SQL returns the following:

    When comparing data in a table, this method provides enough information to start with. However, it doesn’t provide any help with data synchronization – it has to be manual, record by record. Another disadvantage of this method is that you have to modify SQL for every table, according to its structure. It also cannot be used on text, ntext, and image data types. To summarize – if you want to compare several tables and synchronize them – this method requires a lot of time and, due to the manual synchronization it requires, is very prone to errors.

  2. Use Data Compare in Visual Studio

    It compares data in the source database and the target database and creates a synchronization script for the tables you’ve selected for synchronization.

    In addition to the fact that you actually need to have Visual Studio Ultimate or Premium edition, programming knowledge is needed too. Also, this feature cannot compare and synchronize data in the following scenarios:

    • Table column names in source and destination are not identical
    • Case-sensitive table and view names
    • Tables don’t have identical primary key, unique index, or unique constraint
    • Views don’t have identical unique, clustered index
  3. Use SQL Server’s TableDiff

    TableDiff.exe is already in the COM folder of your SQL Server installation. It compares differences in the source and the destination tables and views, generates a script that can be executed to synchronize the data

    Use the following command to compare the Person.Address tables in the source and the destination database and to generate a synchronization script named differencesPersonAddress.sql:

    tablediff.exe -sourceserver Fuji\SQL2012 -sourcedatabase AdventureWorks2012 
    -sourceschema Person -sourcetable Address  -destinationserver Fuji\SQL2012 
    -destinationdatabase CasaBianca -destinationschema Person 
    -destinationtable Address -f e:\table1_differences.sql

     

    In case the structure of the tables is different, e.g. one has the column PostalCode, and the other PostalCd, comparison and synchronization won’t be possible and you’ll get the following message:

    The disadvantages of TableDiff.exe are that there’s no GUI – you can use it only from the Command prompt or in a batch file, there are not many comparison and synchronization options, the tables need to have at least one primary key, identity, or ROWGUID column. One of the challenges you will encounter using tablediff.exe is that you cannot automatically compare all the tables in a database. Do to so, you would have to write additional code, read system tables to get the list of tables in a database and iterate through the list

ApexSQL Data Diff is a SQL Server data comparison and synchronization tool which detects data differences and resolves them without errors. It can compare and synchronize live databases and native or natively compressed database backups and generate comprehensive reports on the detected differences. In case the table or column names you want to compare have different names, you can easily map them. For tables without primary keys and unique indexes, you can set any column to be the key used for comparison.

To set up the process, please follow these steps: 

  1. Start ApexSQL Data Diff

  2. Click the New button in the Project management dialog:

  3. In the Source pane:

    • Select Database from the Type drop-down box
    • From the Server drop-down box, specify the SQL instance with the database containing the correct data
    • Specify the authentication method for that SQL instance (and a valid set of credentials if you chose SQL Server authentication)
    • Specify the source database name in the Database drop-down box

  4. In the Destination pane:

    • Select Database from the Type drop-down box
    • From the Server drop-down box, specify the SQL instance with the destination database
    • Specify the authentication method for that SQL instance (and a valid set of credentials if you chose SQL Server authentication)
    • Specify the destination database name in the Database drop-down box

  5. Click the Compare button in the bottom-right corner of the New project window

  6. After the comparison process is finished, results will be shown in the Main grid:

  7. Select the objects/rows that you want to include in the synchronization

  8. In the Home tab, click the Save button to save the project that will contain information about selected objects and options:

    The saved project file will be used later for the automation of this process.

  9. In the Home tab, click the Synchronize button to start the Synchronization wizard

  10. Under the Output options step, choose the Synchronize now action, in order to perform direct synchronization to the destination database:

  11. In the last step of the Synchronization wizard, click the Synchronize button in the bottom-right corner

  12. Once the synchronization process is over, information about synchronization process will be shown in the Results window:

Automating data sync process

After the initial synchronization was executed successfully, we can move on with automating the process by using the created project file in the PowerShell script.

SQL Server authentication was used in this example, but its login password is encrypted with the saved project file. Find out more about ways of handling login credentials.

In the first part of the PowerShell script, ApexSQL Data Diff location and its parameters will be set:

#ApexSQL Data Diff location and its parameters 
$datadiffLoc   = "ApexSQLDataDiff"
$datadiffParams = "/pr:""MyProject.axdd"" /sync /v /f"

The next part will call ApexSQL Data Diff and process out its parameters:

#initiate the data compare and sync process
(Invoke-Expression ("& `"" + $datadiffLoc +"`" " +$datadiffParams))

As an addition to this PowerShell script, you can utilize “no differences detected” return code, so that on each data comparison it will check if there are any data changes to synchronize and if not, the further process is aborted.

Scheduling data sync process

After the PowerShell script is created and everything is automated as needed, the process can be scheduled in one of the several ways. Find out more about ways of scheduling ApexSQL tools.

There’s no need to manually compare and synchronize data anymore. Use SQL data comparison tool ApexSQL Data Diff to set all synchronization options, map the tables and columns with different names, create your own keys for comparison in the GUI. Schedule it to run unattended and all you have to do is check SQL Server job history in the morning

April 4, 2013