In the previous article, How to enable and use SQL Server Change Data Capture, we described the main features of SQL Server Change Data Capture and showed how to set it up. Now, we will analyze the records stored in change tables and describe the methods to read them
The system tables created by the feature
The following tables are automatically created in the tracked database when Change Data Capture is enabled:
cdc.captured_columns – contains a row for each column tracked in the tracked (source) tables
cdc.change_tables – contains a row for each change table in the tracked database
cdc.ddl_history – contains a row for each structure (Data Definition Language) change of source tables
cdc.index_columns – contains a row for each index column associated with a change table. The index columns are used to uniquely identify rows in the source tables
cdc.lsn_time_mapping – contains a row for each transaction in the source tables. It maps Log Sequence Number values to the time the transaction was committed
The table dbo.cdc_jobs that stores configuration parameters for capture and cleanup jobs is the only system table created in the msdb database
When the feature is enabled on a table, the change table named cdc.<captured_instance>_CT is automatically created in the tracked database. The table contains a row for each insert and delete on the source table, and two rows for each update. The first one is identical to the row before the update, and the second one to the row after the update. To query the table, use the cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes functions
The first five columns contain the metadata necessary for the feature, the rest are the exact replica of the source table
__$start_lsn – the Log Sequence Number of the commited transaction. Every change committed in the same transaction has its own row in the change table, but the same __$start_lsn
__$end_lsn – the column is always NULL in SQL Server 2012, future compatibility is not guarantee
__$seqval – the sequence value used to order the row changes within a transaction
__$operation – indicates the change type made on the row
- Delete
- Insert
- Updated row before the change
- Updated row after the change
__$update_mask – similar to the update mask available in Change Tracking, a bit mask used to identify the ordinals of the modified columns
The system table valued functions
The same as with the SQL Server Change Tracking feature, the change information in SQL Server Change Data Capture is available through table valued functions. We will describe and show examples for the ones most frequently used [1]
cdc.fn_cdc_get_all_changes_<capture_instance> – returns a row for each change in the source table that belongs to the Log Sequence Number in the range specified by the input parameters
cdc.fn_cdc_get_all_changes_capture_instance(from_lsn, to_lsn, '<row_filter_option>')
The <row_filter_option> parameter affects only the UPDATEs. It can have the following values:
-
All – every change is represented with a single row
-
All update old – UPDATEs are represented by 2 rows showing before and update values of the row
cdc.fn_cdc_get_net_changes_<capture_instance> – returns one row that represents multiple changes on a single row aggregated as a single one
cdc.fn_cdc_get_net_changes_capture_instance( from_lsn , to_lsn , '<row_filter_option>' )
The rows with a single change are represented the same way as with the cdc.fn_cdc_get_all_changes function. For example, if a column was first updated from ’1970 Napa Street’ to ‘123 Street’ and then to ‘99 Daisy Street’, the cdc.fn_cdc_get_all_changes function returns all 3 transactions
While the cdc.fn_cdc_get_net_changes function returns only one
The <row_filter_option> parameter can have the following values:
-
All – returns the LSN of the final change, the __$update_mask column is always NULL
-
All with mask – returns the LSN of the final change, the __$update_mask column shows the IDs of the modified columns
-
All with merge – returns the LSN of the final change. The __$operation value is 1 for a delete, 5 when the net operation is an insert or an update. The __$update_mask column is always NULL
Both cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes functions require two parameters – the maximal and minimal Log Sequence Number (LSN) for the queried set of records
To determine the Log Sequence Number that can be used in functions, two functions are available:
sys.fn_cdc_get_min_lsn – “Returns the start_lsn column value for the specified capture instance from the cdc.change_tables system table. This value represents the low endpoint of the validity interval for the capture instance….Requires membership in the sysadmin fixed server role or db_owner fixed database role. For all other users, requires SELECT permission on all captured columns in the source table and, if a gating role for the capture instance was defined, membership in that database role.” [2]
SELECT sys.fn_cdc_get_min_lsn('Person_Address') AS min_lsn
sys.fn_cdc_get_max_lsn – similar to the sys.fn_cdc_get_min_lsn function, returns the maximum Log Sequence Number, the high endpoint of the validity interval for all source tables. No parameters are required
SELECT sys.fn_cdc_get_min_lsn('Person_Address') AS min_lsn SELECT sys.fn_cdc_get_min_lsn() AS min_lsn
sys.fn_cdc_get_column_ordinal – returns the ordinal of the column in a source table. These ordinals are used in change tables to reference a specific column
SELECT sys.fn_cdc_get_min_lsn( 'Person_Address', 'AddressLine1')
returns 2, as it’s the second column in the Person.Address table
Change Data Capture also provides a range of system stored procedures to configure, maintain, and manage the feature on the database and tables, and to acquire captured information [3]
Reading the records
In the following example, we inserted three rows into the Person.Address table, update one, and deleted one
INSERT INTO [Person].[Address] ([AddressID], [AddressLine1], [AddressLine2], [City], [StateProvinceID]) VALUES (32522, N'1234 Rodeo Drive', NULL, N'New York', 79) INSERT INTO [Person].[Address] ([AddressID], [AddressLine1], [AddressLine2], [City], [StateProvinceID]) VALUES (32523, N'2345 Red Hills Way', NULL, N'Bellevue', 79) INSERT INTO [Person].[Address] ([AddressID], [AddressLine1], [AddressLine2], [City], [StateProvinceID]) VALUES (32524, N'3456 Big City Street', NULL, N'Edmonds', 79) UPDATE [Person].[Address] SET [AddressLine1] = N'5415 La Valetta Blv.' , [City] = N'Seattle' WHERE [AddressID] = 16 DELETE FROM [Person].[Address] WHERE [AddressID] = 32524
To read the change tables, MSDN doesn’t recommend direct querying of the table, but using the system functions instead
To read all captured information for the Person.Address table, execute:
DECLARE @from_lsn binary (10), @to_lsn binary (10) SET @from_lsn = sys.fn_cdc_get_min_lsn('Person_Address') SET @to_lsn = sys.fn_cdc_get_max_lsn() SELECT * FROM cdc.fn_cdc_get_all_changes_Person_Address(@from_lsn, @to_lsn, 'all') ORDER BY __$seqval
- the first three rows with __$operation = 2 show the inserted rows
- the fourth row with __$operation = 3 is the row that was updated before the update
- the fifth row with __$operation= 4 is the updated row after the update
- the last row with __$operation = 1 is the deleted row
Here are the same operations tracked by SQL Server Change Tracking
As shown, Change Data Capture shows the exact values of all tracked columns in the modified rows, even if the column itself was not updated
Following code is an example of how to use the fn_cdc_get_column_ordinal and __$update_mask to check whether a column has been changed or not
DECLARE @from_lsn binary (10) ,@to_lsn binary (10) DECLARE @AddressIDPosition INT DECLARE @AddressLine1Position INT DECLARE @AddressLine2Position INT DECLARE @CityPosition INT DECLARE @StProvIDPos INT DECLARE @PostalCode INT SET @from_lsn = sys.fn_cdc_get_min_lsn('Person_Address') SET @to_lsn = sys.fn_cdc_get_max_lsn() SET @AddressIDPosition = sys.fn_cdc_get_column_ordinal('Person_Address', 'AddressID') SET @AddressLine1Position = sys.fn_cdc_get_column_ordinal('Person_Address', 'AddressLine1') SET @AddressLine2Position = sys.fn_cdc_get_column_ordinal('Person_Address', 'AddressLine2') SET @CityPosition = sys.fn_cdc_get_column_ordinal('Person_Address', 'City') SET @StProvIDPos = sys.fn_cdc_get_column_ordinal('Person_Address', 'StateProvinceID') SET @PostalCode = sys.fn_cdc_get_column_ordinal('Person_Address', 'PostalCode') SELECT fn_cdc_get_all_changes_Person_Address.__$operation ,fn_cdc_get_all_changes_Person_Address.__$update_mask ,sys.fn_cdc_is_bit_set(@AddressIDPosition, __$update_mask) as 'UpdatedAddressID' ,sys.fn_cdc_is_bit_set(@AddressLine1Position, __$update_mask) as 'UpdatedLine1' ,sys.fn_cdc_is_bit_set(@AddressLine2Position, __$update_mask) as 'UpdatedLine2' ,sys.fn_cdc_is_bit_set(@CityPosition fn_cdc_get_column_ordinal) as 'UpdatedCity' ,sys.fn_cdc_is_bit_set(@StProvIDPos, __$update_mask) as 'UpdatedState' ,sys.fn_cdc_is_bit_set(@PostalCode, __$update_mask) as 'Updated Postal' FROM cdc.fn_cdc_get_all_changes_Person_Address(@from_lsn, @to_lsn, 'all') ORDER BY __$seqval
The result set shows 1 if the Updated_<column_name> was modified, 0 otherwise
While Change Tracking shows only what was changed and whether the change was an insert, update, or delete, Change Data Capture shows the values inserted, deleted or updated for the modified rows. For updates, it shows both old and new values of the updated row
The feature doesn’t track the user who made the change. To do that, you have to create a new field where the user’s details are stored and updated after each change. The same goes for the time of the change and the machine used to make the change. The execution of the SELECT statements and object access are not tracked
As change capturing is an asynchronous process. First a change is committed to a source table and the change is added to the change table afterwards. There is a delay between these two actions. The captured info must be obtained using functions. When a table schema changes, the changes to the affected columns will be ignored unless a new change table is associated with the source
The data captured in change tables can grow uncontrollably, if you stop the job that purges the data, or modify it so it doesn’t run often enough. The feature is supported only in Enterprise and Developer editions
If you need detailed information about the data changes on your tables, Change Data Capture is a better solution than Change Tracking, as it provides the values that were inserted, deleted or updated. If the feature doesn’t track all events you would like and provide all the information you are looking for, there are other auditing solutions. In the next article in the series, we’ll analyze the SQL Server Auditing feature vs ApexSQL Audit
All articles in this series:
- What is SQL Server Change Tracking and how to set it up?
- How to read SQL Server Change Tracking results
- SQL Server Change Data Capture (CDC) – Introduction
- How to enable and use SQL Server Change Data Capture
- How to analyze and read Change Data Capture (CDC) records
- SQL Server Audit feature – Introduction
- How to set up and use SQL Server Audit
- How to analyze and read SQL Server Audit information
- Auditing SQL Server data changes – the centralized solution
Downloads
Please download the script(s) associated with this article on our GitHub repository.
Please contact us for any problems or questions with the scripts.
References:
[1] MSDN – Change Data Capture Functions
[2] MSDN – sys.fn_cdc_get_min_lsn (Transact-SQL)
[3] MSDN – Change Data Capture Stored Procedures
Useful resources:
MSDN – Work with Change Data (SQL Server)
MSDN – Change Data Capture Tables (Transact-SQL)
MSDN – cdc.<capture_instance>_CT
November 12, 2013