SQL 2008 Change Data Capture – The basics
Continuing on the previous article on SQL 2008 Data Audit – so wittingly called The beginning now we explore the basics of Change Data Capture in SQL server 2008. The first thing we need to do is to enable change data tracking for our database. Unfortunately there is no UI to do this so you have to do it with T-SQL command.
Enabling on database level
USE Test; GO; EXECUTE sys.sp_cdc_enable_db; GO
This line of code will enable your database for change data capture. You enable and disable change data capture with two system stored procedures sys.sp_cdc_enable_db and sys.sp_cdc_disable_db. The changes made to your database you can check by running the next command.
SELECT name, is_cdc_enabled FROM sys.databases
You can see in the picture above that the Test database has been enabled for change data capture. Right now you database has not been changed by SQL server the stored procedure has just updated the database property is_cdc_enabled in the sys.databases to true.
Enabling on table level
USE Test; GO; EXECUTE sys.sp_cdc_enable_table @source_schema = N'dbo' , @source_name = N'Test' , @role_name = NULL GO
This line of code will enable your table for change data capture which you can check by running the next command. You enable and disable change data capture for tables with two system stored procedures sys.sp_cdc_enable_table and sys.sp_cdc_disable_table. When you enable the first table in the database for change data capture SQL server will create several tables, functions and jobs for your database. The changes made to your database should be as so.
SELECT name, is_tracked_by_cdc FROM sys.tables
Here highlighted in red is the table that will actually capture your data, other tables in this picture are supporting tables for data capture and are created only once per database when you enable your first table for change data capture. Check more in detail about tables, stored procedures and functions.
Change data capture table
The data stored in data capture tables consists of some change data capture columns and your table columns.
The columns displayed here are from the change data capture table for our dbo.Test table. We have columns __$start_lsn and __$end_lsn which mark the first log sequence number and the last log sequence number under which the changes were recorded in the transaction log, then there are __$seqval which is the order of the row change in a transaction, the __$operation signals the type of the operation which was executed and last is the __$update_mask is a bit mask showing which columns have been updated in the row change. These columns appear in any change data capture, after that come your table columns with their respective data type. So let’s make a change to our table and see what happens.
Recording change data capture
INSERT INTO dbo.Test (Col1, Col2) VALUES(500, 600) INSERT INTO dbo.Test (Col1, Col2) VALUES(501, 601)
So let’s what has been recorded in our change data capture table cdc.dbo_Test_CT
SELECT * FROM cdc.dbo_Test_CT
Now let’s perform a few more operations and see what we get.
UPDATE dbo.Test SET Col1 = 503 WHERE ID = 1 UPDATE dbo.Test SET Col1 = 504 WHERE ID = 2 DELETE dbo.Test WHERE ID = 1
Not what you have expected we have executed in total 5 DML statements but the change data capture table contains 7 operations, how come? Let’s make an another query which will make things clearer.
Reading change capture data
SELECT CT.__$start_lsn, CT.__$operation, CASE CT.__$operation WHEN 1 THEN 'Delete' WHEN 2 THEN 'Insert' WHEN 3 THEN 'Update - Pre' WHEN 4 THEN 'Update - Post' END AS Operation, CT.ID, CT.Col1, CT.Col2, LSN.tran_begin_time, LSN.tran_end_time, LSN.tran_id FROM cdc.dbo_Test_CT AS CT INNER JOIN cdc.lsn_time_mapping AS LSN ON CT.__$start_lsn = LSN.start_lsn
What we have done here is joined the change data capture table for our Test table with the cdc.lsn_time_mapping table which stores information about the log sequence number and time values and allowing us to tie time with the specific log sequence numbers. One more important thing we have here is the tran_id field which we will use in the next article. Now we can se why we have 7 records instead of 5 records which we expected to have. Change data capture when capturing data from the UPDATE statement will create two records one with __$operation value of 3 and one with value of 4. The record with value 3 represents the data that was in the row which was there before the update has been committed and the record with the value 4 represents the data after commit.
One more thing you can see here is the times when the transactions where started and completed. This is very useful when trying to get changes in a specific time frame which can be done by using change data capture functions provided by SQL server.
Reading interval change capture data
The first thing we want to do is to find out which log sequence numbers are related with our time frame we want to look at.
SELECT sys.fn_cdc_map_time_to_lsn ('smallest greater than or equal', '2009-06-26 15:37:46.577') ,sys.fn_cdc_map_time_to_lsn ('largest less than or equal', GETDATE())
Here we use the sys.fn_cdc_map_time_to_lsn function to retrieve the log sequence numbers for the start time of our interval (smallest where time is grater than or equal to ‘2009-06-26 15:37:46.577’) and the end log sequence number of our interval (largest where time is less than or equal to GETDATE()). Next up we have to use those log sequence numbers to get something useful from the table we can do it by using an another function cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes, the difference between the two will be apparent immediately.
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Test ( 0x00000066000000A00004 , 0x0000006A000001F70001 , 'all') SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Test ( 0x00000066000000A00004 , 0x0000006A000001F70001 , 'all')
This is the result:
The cdc.fn_cdc_get_all_changes function returned all the changes that happened in our chosen interval giving us 4 total recorded changes. The second query using cdc.fn_cdc_get_net_changes returned only 2 records in the same interval why so? Well simple all_changes function returns as it names says all changes that fall between our lsn boundaries, but the net_changes returns only the last changes that happened in the interval, but what does that mean? As you can see we have only one record by each row and that is this the last operation executed on that row in the given interval.
One more thing to notice is that when executing this functions records with __$operation = 3 will not appear in the result set although it does fall into our selected interval. Since this functions will only return changes the __$operation = 3 (which is the delete operation of the update columns) will not appear.
Reading changed columns
Useful thing to now is which column has changed in an operation. The thing to use here is an another scalar valued function provided by change data capture sys.fn_cdc_has_column_changed, this function will return 1 or 0 depending if the column has been changed.
SELECT * , sys.fn_cdc_has_column_changed ('dbo_Test', 'ID', __$update_mask) AS is_id_changed , sys.fn_cdc_has_column_changed ('dbo_Test', 'Col1', __$update_mask) AS is_col1_changed , sys.fn_cdc_has_column_changed ('dbo_Test', 'Col2', __$update_mask) AS id_col2_changed FROM cdc.fn_cdc_get_all_changes_dbo_Test ( 0x00000066000000A00004 , 0x0000006A000001F70001 , 'all')
This way we can see which columns have been changed in each operation and perform actions required according to our business rules. Thing to note here is that when you perform the insert and delete operation (__$operation = 2 || 1) the changed column signaled is all columns because you have either created values for all columns or deleted values for all columns. When looking at the changes for update (__$operation = 4) then you see only the actual updated column signaled.
This concludes this session, till next time….