How to analyze and read Change Data Capture (CDC) records

In the previous article, How to enable and use SQL Server Change Data Capture, we described the main features of SQL Server Change Data Capture and showed how to set it up. Now, we will analyze the records stored in change tables and describe the methods to read them

The system tables created by the feature

The following tables are automatically created in the tracked database when Change Data Capture is enabled:

cdc.captured_columns – contains a row for each column tracked in the tracked (source) tables

cdc.change_tables – contains a row for each change table in the tracked database

cdc.ddl_history – contains a row for each structure (Data Definition Language) change of source tables

cdc.index_columns – contains a row for each index column associated with a change table. The index columns are used to uniquely identify rows in the source tables

cdc.lsn_time_mapping – contains a row for each transaction in the source tables. It maps Log Sequence Number values to the time the transaction was committed

The table dbo.cdc_jobs that stores configuration parameters for capture and cleanup jobs is the only system table created in the msdb database

When the feature is enabled on a table, the change table named cdc.<captured_instance>_CT is automatically created in the tracked database. The table contains a row for each insert and delete on the source table, and two rows for each update. The first one is identical to the row before the update, and the second one to the row after the update. To query the table, use the cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes functions

cdc.captured_instance_CT is automatically created in the tracked database

The first five columns contain the metadata necessary for the feature, the rest are the exact replica of the source table
__$start_lsn – the Log Sequence Number of the commited transaction. Every change committed in the same transaction has its own row in the change table, but the same __$start_lsn
__$end_lsn – the column is always NULL in SQL Server 2012, future compatibility is not guarantee
__$seqval – the sequence value used to order the row changes within a transaction
__$operation – indicates the change type made on the row

  1. Delete
  2. Insert
  3. Updated row before the change
  4. Updated row after the change

__$update_mask – similar to the update mask available in Change Tracking, a bit mask used to identify the ordinals of the modified columns

The system table valued functions

The same as with the SQL Server Change Tracking feature, the change information in SQL Server Change Data Capture is available through table valued functions. We will describe and show examples for the ones most frequently used [1]

cdc.fn_cdc_get_all_changes_<capture_instance> – returns a row for each change in the source table that belongs to the Log Sequence Number in the range specified by the input parameters

cdc.fn_cdc_get_all_changes_capture_instance(from_lsn, to_lsn, 
'<row_filter_option>')

The <row_filter_option> parameter affects only the UPDATEs. It can have the following values:

  • All – every change is represented with a single row

    Get all changes function: Row filter option parameter - each change is represented with a single row

  • All update old – UPDATEs are represented by 2 rows showing before and update values of the row

    Get all changes function: Row filter option parameter - UPDATEs are represented by 2 rows

cdc.fn_cdc_get_net_changes_<capture_instance> – returns one row that represents multiple changes on a single row aggregated as a single one

cdc.fn_cdc_get_net_changes_capture_instance( from_lsn , to_lsn , 
'<row_filter_option>' )

The rows with a single change are represented the same way as with the cdc.fn_cdc_get_all_changes function. For example, if a column was first updated from ’1970 Napa Street’ to ‘123 Street’ and then to ‘99 Daisy Street’, the cdc.fn_cdc_get_all_changes function returns all 3 transactions

Get all changes function returns all 3 transactions

While the cdc.fn_cdc_get_net_changes function returns only one

Get net changes function returns one transaction

The <row_filter_option> parameter can have the following values:

  • All – returns the LSN of the final change, the __$update_mask column is always NULL

    Get net changes function: Row filter option parameter, value = All

  • All with mask – returns the LSN of the final change, the __$update_mask column shows the IDs of the modified columns

    Get net changes function: Row filter option parameter, value = All with mask

  • All with merge – returns the LSN of the final change. The __$operation value is 1 for a delete, 5 when the net operation is an insert or an update. The __$update_mask column is always NULL

    Get net changes function: Row filter option parameter, value = All with merge

Both cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes functions require two parameters – the maximal and minimal Log Sequence Number (LSN) for the queried set of records

To determine the Log Sequence Number that can be used in functions, two functions are available:

sys.fn_cdc_get_min_lsn – “Returns the start_lsn column value for the specified capture instance from the cdc.change_tables system table. This value represents the low endpoint of the validity interval for the capture instance….Requires membership in the sysadmin fixed server role or db_owner fixed database role. For all other users, requires SELECT permission on all captured columns in the source table and, if a gating role for the capture instance was defined, membership in that database role.” [2]

SELECT sys.fn_cdc_get_min_lsn('Person_Address') AS min_lsn

sys.fn_cdc_get_max_lsn – similar to the sys.fn_cdc_get_min_lsn function, returns the maximum Log Sequence Number, the high endpoint of the validity interval for all source tables. No parameters are required

SELECT sys.fn_cdc_get_min_lsn('Person_Address') AS min_lsn
SELECT sys.fn_cdc_get_min_lsn() AS min_lsn

Results of a function that returns min and max Log Sequence Number

sys.fn_cdc_get_column_ordinal – returns the ordinal of the column in a source table. These ordinals are used in change tables to reference a specific column

SELECT sys.fn_cdc_get_min_lsn( 'Person_Address', 'AddressLine1') 

returns 2, as it’s the second column in the Person.Address table

Change Data Capture also provides a range of system stored procedures to configure, maintain, and manage the feature on the database and tables, and to acquire captured information [3]

Reading the records

In the following example, we inserted three rows into the Person.Address table, update one, and deleted one

INSERT INTO [Person].[Address] ([AddressID], [AddressLine1], [AddressLine2], 
[City],
[StateProvinceID]) VALUES (32522, N'1234 Rodeo Drive', NULL, N'New York', 79)

INSERT INTO [Person].[Address] ([AddressID], [AddressLine1], [AddressLine2], 
[City],
[StateProvinceID]) VALUES (32523, N'2345 Red Hills Way', NULL, N'Bellevue', 79)

INSERT INTO [Person].[Address] ([AddressID], [AddressLine1], [AddressLine2], 
[City],
[StateProvinceID]) VALUES (32524, N'3456 Big City Street', NULL, N'Edmonds', 79)

UPDATE [Person].[Address] SET [AddressLine1] = N'5415 La Valetta Blv.'	,
[City] = 
N'Seattle' WHERE [AddressID] = 16

DELETE FROM [Person].[Address] WHERE [AddressID] = 32524

To read the change tables, MSDN doesn’t recommend direct querying of the table, but using the system functions instead

To read all captured information for the Person.Address table, execute:

DECLARE @from_lsn binary (10), @to_lsn binary (10)

SET @from_lsn = sys.fn_cdc_get_min_lsn('Person_Address')
SET @to_lsn = sys.fn_cdc_get_max_lsn()

SELECT *
FROM cdc.fn_cdc_get_all_changes_Person_Address(@from_lsn, @to_lsn, 'all')
ORDER BY __$seqval

Results for all captured information for the Person.Address table

  • the first three rows with __$operation = 2 show the inserted rows
  • the fourth row with __$operation = 3 is the row that was updated before the update
  • the fifth row with __$operation= 4 is the updated row after the update
  • the last row with __$operation = 1 is the deleted row

Here are the same operations tracked by SQL Server Change Tracking

Results for operations tracked by SQL Server Change Tracking

As shown, Change Data Capture shows the exact values of all tracked columns in the modified rows, even if the column itself was not updated

Following code is an example of how to use the fn_cdc_get_column_ordinal and __$update_mask to check whether a column has been changed or not

DECLARE @from_lsn binary (10) ,@to_lsn binary (10)
DECLARE @AddressIDPosition INT
DECLARE @AddressLine1Position INT
DECLARE @AddressLine2Position INT
DECLARE @CityPosition INT
DECLARE @StProvIDPos INT
DECLARE @PostalCode INT

SET @from_lsn = sys.fn_cdc_get_min_lsn('Person_Address')
SET @to_lsn = sys.fn_cdc_get_max_lsn()
SET @AddressIDPosition = sys.fn_cdc_get_column_ordinal('Person_Address', 
'AddressID')
SET @AddressLine1Position = sys.fn_cdc_get_column_ordinal('Person_Address', 
'AddressLine1')
SET @AddressLine2Position = sys.fn_cdc_get_column_ordinal('Person_Address', 
'AddressLine2')
SET @CityPosition = sys.fn_cdc_get_column_ordinal('Person_Address', 'City')
SET @StProvIDPos = sys.fn_cdc_get_column_ordinal('Person_Address', 
'StateProvinceID')
SET @PostalCode = sys.fn_cdc_get_column_ordinal('Person_Address', 'PostalCode')

SELECT fn_cdc_get_all_changes_Person_Address.__$operation
	,fn_cdc_get_all_changes_Person_Address.__$update_mask
	,sys.fn_cdc_is_bit_set(@AddressIDPosition, __$update_mask) as 
'UpdatedAddressID'
	,sys.fn_cdc_is_bit_set(@AddressLine1Position, __$update_mask) as 
'UpdatedLine1'
	,sys.fn_cdc_is_bit_set(@AddressLine2Position, __$update_mask) as 
'UpdatedLine2'
	,sys.fn_cdc_is_bit_set(@CityPosition fn_cdc_get_column_ordinal) as 
'UpdatedCity'
	,sys.fn_cdc_is_bit_set(@StProvIDPos, __$update_mask) as 
'UpdatedState'
	,sys.fn_cdc_is_bit_set(@PostalCode, __$update_mask) as 'Updated Postal'
FROM cdc.fn_cdc_get_all_changes_Person_Address(@from_lsn, @to_lsn, 'all')
ORDER BY __$seqval

The result set shows 1 if the Updated_<column_name> was modified, 0 otherwise

Change Data capture results showing the exact values of all tracked columns in the modified rows

While Change Tracking shows only what was changed and whether the change was an insert, update, or delete, Change Data Capture shows the values inserted, deleted or updated for the modified rows. For updates, it shows both old and new values of the updated row

The feature doesn’t track the user who made the change. To do that, you have to create a new field where the user’s details are stored and updated after each change. The same goes for the time of the change and the machine used to make the change. The execution of the SELECT statements and object access are not tracked

As change capturing is an asynchronous process. First a change is committed to a source table and the change is added to the change table afterwards. There is a delay between these two actions. The captured info has to be obtained using functions. When a table schema changes, the changes to the affected columns will be ignored unless a new change table is associated with the source

The data captured in change tables can grow uncontrollably, if you stop the job that purges the data, or modify it so it doesn’t run often enough. The feature is supported only in Enterprise and Developer editions

If you need detailed information about the data changes on your tables, Change Data Capture is a better solution than Change Tracking, as it provides the values that were inserted, deleted or updated. If the feature doesn’t track all events you would like and provide all the information you are looking for, there are other auditing solutions. In the next article in the series, we’ll analyze the SQL Server Auditing feature vs ApexSQL Audit

All articles in this series:

  1. What is SQL Server Change Tracking and how to set it up?
  2. How to read SQL Server Change Tracking results
  3. SQL Server Change Data Capture (CDC) – Introduction
  4. How to enable and use SQL Server Change Data Capture
  5. How to analyze and read Change Data Capture (CDC) records
  6. SQL Server Audit feature – Introduction
  7. How to set up and use SQL Server Audit
  8. How to analyze and read SQL Server Audit information
  9. Auditing SQL Server data changes – the centralized solution

Downloads

Please download the script(s) associated with this article on our GitHub repository.

Please contact us for any problems or questions with the scripts.

References:

[1] MSDN – Change Data Capture Functions
[2] MSDN – sys.fn_cdc_get_min_lsn (Transact-SQL)
[3] MSDN – Change Data Capture Stored Procedures

Useful resources:

MSDN – Work with Change Data (SQL Server)
MSDN – Change Data Capture Tables (Transact-SQL)
MSDN – cdc.<capture_instance>_CT

November 12, 2013