Posts Tagged ‘SQL server 2008’
Continuing on the previous article on SQL 2008 Data Audit – so wittingly called The beginning now we explore the basics of Change Data Capture in SQL server 2008. The first thing we need to do is to enable change data tracking for our database. Unfortunately there is no UI to do this so you have to do it with T-SQL command.
Enabling on database level
USE Test; GO; EXECUTE sys.sp_cdc_enable_db; GO
This line of code will enable your database for change data capture. You enable and disable change data capture with two system stored procedures sys.sp_cdc_enable_db and sys.sp_cdc_disable_db. The changes made to your database you can check by running the next command.
SELECT name, is_cdc_enabled FROM sys.databases
You can see in the picture above that the Test database has been enabled for change data capture. Right now you database has not been changed by SQL server the stored procedure has just updated the database property is_cdc_enabled in the sys.databases to true.
Enabling on table level
USE Test; GO; EXECUTE sys.sp_cdc_enable_table @source_schema = N'dbo' , @source_name = N'Test' , @role_name = NULL GO
This line of code will enable your table for change data capture which you can check by running the next command. You enable and disable change data capture for tables with two system stored procedures sys.sp_cdc_enable_table and sys.sp_cdc_disable_table. When you enable the first table in the database for change data capture SQL server will create several tables, functions and jobs for your database. The changes made to your database should be as so.
SELECT name, is_tracked_by_cdc FROM sys.tables
Here highlighted in red is the table that will actually capture your data, other tables in this picture are supporting tables for data capture and are created only once per database when you enable your first table for change data capture. Check more in detail about tables, stored procedures and functions.
Change data capture table
The data stored in data capture tables consists of some change data capture columns and your table columns.
The columns displayed here are from the change data capture table for our dbo.Test table. We have columns __$start_lsn and __$end_lsn which mark the first log sequence number and the last log sequence number under which the changes were recorded in the transaction log, then there are __$seqval which is the order of the row change in a transaction, the __$operation signals the type of the operation which was executed and last is the __$update_mask is a bit mask showing which columns have been updated in the row change. These columns appear in any change data capture, after that come your table columns with their respective data type. So let’s make a change to our table and see what happens.
Recording change data capture
INSERT INTO dbo.Test (Col1, Col2) VALUES(500, 600) INSERT INTO dbo.Test (Col1, Col2) VALUES(501, 601)
So let’s what has been recorded in our change data capture table cdc.dbo_Test_CT
SELECT * FROM cdc.dbo_Test_CT
Now let’s perform a few more operations and see what we get.
UPDATE dbo.Test SET Col1 = 503 WHERE ID = 1 UPDATE dbo.Test SET Col1 = 504 WHERE ID = 2 DELETE dbo.Test WHERE ID = 1
Not what you have expected we have executed in total 5 DML statements but the change data capture table contains 7 operations, how come? Let’s make an another query which will make things clearer.
Reading change capture data
SELECT CT.__$start_lsn, CT.__$operation, CASE CT.__$operation WHEN 1 THEN 'Delete' WHEN 2 THEN 'Insert' WHEN 3 THEN 'Update - Pre' WHEN 4 THEN 'Update - Post' END AS Operation, CT.ID, CT.Col1, CT.Col2, LSN.tran_begin_time, LSN.tran_end_time, LSN.tran_id FROM cdc.dbo_Test_CT AS CT INNER JOIN cdc.lsn_time_mapping AS LSN ON CT.__$start_lsn = LSN.start_lsn
What we have done here is joined the change data capture table for our Test table with the cdc.lsn_time_mapping table which stores information about the log sequence number and time values and allowing us to tie time with the specific log sequence numbers. One more important thing we have here is the tran_id field which we will use in the next article. Now we can se why we have 7 records instead of 5 records which we expected to have. Change data capture when capturing data from the UPDATE statement will create two records one with __$operation value of 3 and one with value of 4. The record with value 3 represents the data that was in the row which was there before the update has been committed and the record with the value 4 represents the data after commit.
One more thing you can see here is the times when the transactions where started and completed. This is very useful when trying to get changes in a specific time frame which can be done by using change data capture functions provided by SQL server.
Reading interval change capture data
The first thing we want to do is to find out which log sequence numbers are related with our time frame we want to look at.
SELECT sys.fn_cdc_map_time_to_lsn ('smallest greater than or equal', '2009-06-26 15:37:46.577') ,sys.fn_cdc_map_time_to_lsn ('largest less than or equal', GETDATE())
Here we use the sys.fn_cdc_map_time_to_lsn function to retrieve the log sequence numbers for the start time of our interval (smallest where time is grater than or equal to ‘2009-06-26 15:37:46.577’) and the end log sequence number of our interval (largest where time is less than or equal to GETDATE()). Next up we have to use those log sequence numbers to get something useful from the table we can do it by using an another function cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes, the difference between the two will be apparent immediately.
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Test ( 0x00000066000000A00004 , 0x0000006A000001F70001 , 'all') SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Test ( 0x00000066000000A00004 , 0x0000006A000001F70001 , 'all')
This is the result:
The cdc.fn_cdc_get_all_changes function returned all the changes that happened in our chosen interval giving us 4 total recorded changes. The second query using cdc.fn_cdc_get_net_changes returned only 2 records in the same interval why so? Well simple all_changes function returns as it names says all changes that fall between our lsn boundaries, but the net_changes returns only the last changes that happened in the interval, but what does that mean? As you can see we have only one record by each row and that is this the last operation executed on that row in the given interval.
One more thing to notice is that when executing this functions records with __$operation = 3 will not appear in the result set although it does fall into our selected interval. Since this functions will only return changes the __$operation = 3 (which is the delete operation of the update columns) will not appear.
Reading changed columns
Useful thing to now is which column has changed in an operation. The thing to use here is an another scalar valued function provided by change data capture sys.fn_cdc_has_column_changed, this function will return 1 or 0 depending if the column has been changed.
SELECT * , sys.fn_cdc_has_column_changed ('dbo_Test', 'ID', __$update_mask) AS is_id_changed , sys.fn_cdc_has_column_changed ('dbo_Test', 'Col1', __$update_mask) AS is_col1_changed , sys.fn_cdc_has_column_changed ('dbo_Test', 'Col2', __$update_mask) AS id_col2_changed FROM cdc.fn_cdc_get_all_changes_dbo_Test ( 0x00000066000000A00004 , 0x0000006A000001F70001 , 'all')
This way we can see which columns have been changed in each operation and perform actions required according to our business rules. Thing to note here is that when you perform the insert and delete operation (__$operation = 2 || 1) the changed column signaled is all columns because you have either created values for all columns or deleted values for all columns. When looking at the changes for update (__$operation = 4) then you see only the actual updated column signaled.
This concludes this session, till next time….
Data auditing is a requirement of most enterprise systems and was a pain to implement with SQL server up to now.When data audit was required in pre SQL server 2008 databases you had to relay on solutions like triggers or some custom change tracking on the application layer. With the dawn of SQL server edition 2008 Microsoft has provided with change data capture mechanism which is an integral part of the SQL server 2008 data engine. The change data capture is a powerful mechanism for gathering changes made to data on the SQL server platform and provides it’ functionality with little overhead.
How this works
Well in short simple. Implementation of change data capture is based on an already existing element of the data engine the transaction log and does not require you to alter your db, schema or tables. When you enabled change data capture on a db SQL server will create a new schema called “cdc” with several tables, stored procedures and functions. With the changes made to your db the enabling process also creates two jobs in you SQL agent for your db one for capture and one for cleanup. These two jobs are the base for the asynchronous performance for the change data capture. The capture job is executed by default every 5 seconds, it reads the transaction log for changes from the last job execution and stores them in the change data capture tables. The cleanup job works by default once a day and cleans the stored data in the data capture tables, meaning that you have to gather that data and transfer it somewhere for permanent storage before the cleanup job runs.
Microsoft published on MSDN a SQL Server Best Practices Article on “Tuning the Performance of Change Data Capture in SQL Server 2008” going in length on why, how and where to tune the performance of change data capture. The general conclusion of this article and many articles on the web is that the Change Data Capture has very little overhead and can be used comfortably on OLTP database with no fear of grinding your database server to a stand still. You will see a slight decrease in performance through the increased disk writes from the change data capture job but the decrease in performance if the capture job is properly configured should not be more than 5% by my test. You will see an increase in the disk queue length for the log file but only a slight for the data disk.
There are of course problems with the SQL change data tracking. The three main problems that I have faced in working with change data tracking are storing contextual information, gathering change data in one place, schema altering.
Storing contextual information about the change is a pain with change data capture. In most cases when performing audit you want to store some contextual data about the current operation which is changing the data like the userid, data, reason of change (form, command, etc.) …. Since the change data capture relies on an asynchronous read of the transaction log, in the time of the read SQL server does not have any information about the context under which the change has occurred and therefore that data can not be gathered.
Gathering change data in one place
The change data capture as is designed and implemented (as I can figure it) is intended to capture data for temporary storage before you transfer it to data warehouse for permanent storage. The data warehouse is intended to have the same schema (no primary or foreign keys) like the OLTP database and you simply copy the rows in the data warehouse. There is no built in way as it was not intended for such use to gather the data all in one place for review.
This brings us to the last problem of altering the schema. If you alter the schema of your OLTP db, you have to change the schema for your data warehouse db. That is no problem when adding columns but what when you delete the columns, and how to then display the data for the user? There is one more thing I think is just pure laziness that Microsoft did not implement. The problem with schema changing is that when you change the schema the change data capture continues to track the same columns (as well as there data type) as before with no input to you that you should do something to start capturing the new columns. It would suffice that when changing the table under change data capture you would get an error that you can not do that until you drop the change data capture for that table, but it seams that was to much for Microsoft.
There are workaround for all of this problems and I will show them in the next article on change data tracking.
SQL server 2008 change data tracking is a well welcome addition to SQL server feature list which helps us to create audits for your databases easily with little overhead. With some missing features which have simple workarounds this feature can be used very effectively for auditing your database.
Check back for more articles on change data tracking with the details of implementation and workarounds need to make the whole thing functional.