SQL Server 2008 Change Data Capture – The beginning
Data auditing is a requirement of most enterprise systems, and until now it was a pain to implement with SQL Server. When a database on a version before SQL Server 2008 needed auditing, you had to rely on solutions such as triggers or custom change tracking in the application layer. With SQL Server 2008, Microsoft introduced change data capture, a mechanism that is an integral part of the SQL Server 2008 database engine. Change data capture is a powerful way to gather changes made to data on the SQL Server platform, and it provides its functionality with little overhead.
How it works
In short, simply. Change data capture is built on an element the database engine already has, the transaction log, and it does not require you to alter your database, schema or tables. When you enable change data capture on a database (sys.sp_cdc_enable_db), SQL Server creates a new schema called “cdc” with several tables, stored procedures and functions. When you then enable the first table for capture (sys.sp_cdc_enable_table), SQL Server also creates two SQL Server Agent jobs for the database: one for capture and one for cleanup. These two jobs are what make change data capture asynchronous. By default the capture job polls the transaction log every 5 seconds, reads the changes made since its previous pass and stores them in the change tables. By default the cleanup job runs once a day and deletes change rows older than the retention period (4320 minutes, or three days, by default). This means you have to collect that data and move it somewhere for permanent storage before the cleanup job removes it.
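To make the division of labor concrete, here is a minimal Python sketch. It is a toy model, not SQL Server's implementation: writes only touch the log, the capture job copies log records newer than the last LSN it processed, and the cleanup job removes rows older than the retention window. `ToyCdc`, `LogRecord` and their methods are hypothetical names; only the 4320-minute default mirrors the real cleanup job's retention setting.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy model: user transactions only append to the transaction log, an
# asynchronous capture job copies new log records into a change table, and a
# cleanup job purges rows older than the retention window.
# All class and method names are hypothetical; this is not SQL Server code.


@dataclass
class LogRecord:
    lsn: int           # log sequence number, increases with every change
    timestamp: float   # seconds since an arbitrary starting point
    table: str
    operation: str     # "insert", "update" or "delete"
    row: Dict[str, object]


@dataclass
class ToyCdc:
    log: List[LogRecord] = field(default_factory=list)
    change_table: List[LogRecord] = field(default_factory=list)
    last_captured_lsn: int = 0

    def write(self, record: LogRecord) -> None:
        # The writing transaction never waits for the capture job.
        self.log.append(record)

    def run_capture_job(self) -> int:
        # Copy every log record past the last processed LSN into the change table.
        new = [r for r in self.log if r.lsn > self.last_captured_lsn]
        self.change_table.extend(new)
        if new:
            self.last_captured_lsn = max(r.lsn for r in new)
        return len(new)

    def run_cleanup_job(self, now: float, retention_minutes: int = 4320) -> int:
        # Drop change rows older than the retention window (default 4320 min = 3 days).
        cutoff = now - retention_minutes * 60
        kept = [r for r in self.change_table if r.timestamp >= cutoff]
        removed = len(self.change_table) - len(kept)
        self.change_table = kept
        return removed


cdc = ToyCdc()
cdc.write(LogRecord(1, 0.0, "dbo.Customer", "insert", {"Id": 1, "Name": "Ann"}))
cdc.write(LogRecord(2, 10.0, "dbo.Customer", "update", {"Id": 1, "Name": "Anna"}))
print(cdc.run_capture_job())  # 2: both records captured on the first pass
cdc.write(LogRecord(3, 20.0, "dbo.Customer", "delete", {"Id": 1, "Name": "Anna"}))
print(cdc.run_capture_job())  # 1: only the record written since the last pass
# Four days later, everything is past the three-day window and gets purged.
print(cdc.run_cleanup_job(now=4 * 24 * 3600))  # 3
```

The last call shows why the captured rows have to be copied elsewhere in time: once they fall outside the retention window, the cleanup job deletes them.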
Microsoft published a SQL Server Best Practices article on MSDN, “Tuning the Performance of Change Data Capture in SQL Server 2008”, which goes into detail on why, how and where to tune the performance of change data capture. The general conclusion of this article, and of many others on the web, is that change data capture has very little overhead and can be used comfortably on an OLTP database without fear of grinding your database server to a standstill. You will see a slight decrease in performance from the additional disk writes of the capture job, but in my tests, with a properly configured capture job, the decrease was no more than 5%. You will also see an increase in the disk queue length for the log disk, but only a slight one for the data disk.
There are, of course, problems with SQL Server change data capture. The three main ones I have faced while working with it are storing contextual information, gathering change data in one place, and schema changes.
Storing contextual information about a change is a pain with change data capture. In most cases, when auditing, you want to store some context about the operation that is changing the data, such as the user ID, the date, or the reason for the change (form, command, etc.). Because change data capture relies on an asynchronous read of the transaction log, by the time the log is read SQL Server no longer has any information about the context in which the change occurred, so that data cannot be gathered. The only exception is the commit time, which can be recovered from the change's LSN with sys.fn_cdc_map_lsn_to_time.
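To see what a change row actually carries, the sketch below decodes one. The metadata column names and the `__$operation` codes (1 delete, 2 insert, 3 and 4 the old and new values of an update) are those returned by the `cdc.fn_cdc_get_all_changes_<capture_instance>` query functions; the row itself and the `describe_change` helper are hypothetical. Everything in it is either a log position or a captured column value, with nothing about who made the change or why.

```python
# Meaning of the __$operation column in change data capture query results.
OPERATIONS = {
    1: "delete",
    2: "insert",
    3: "update (old values)",
    4: "update (new values)",
}

# Metadata columns returned ahead of the captured source columns.
METADATA_COLUMNS = ("__$start_lsn", "__$seqval", "__$operation", "__$update_mask")


def describe_change(row: dict) -> str:
    """Summarize one change row; only data columns and log positions are available."""
    operation = OPERATIONS[row["__$operation"]]
    data = {k: v for k, v in row.items() if k not in METADATA_COLUMNS}
    return f"{operation} at LSN 0x{row['__$start_lsn'].hex()}: {data}"


# Hypothetical row for a dbo.Customer capture instance. Note what is missing:
# there is no user, application, form or reason column to read.
row = {
    "__$start_lsn": bytes.fromhex("0000001a000000f00003"),
    "__$seqval": bytes.fromhex("0000001a000000f00002"),
    "__$operation": 4,
    "__$update_mask": bytes.fromhex("02"),
    "CustomerId": 42,
    "Name": "Contoso",
}
print(describe_change(row))
# update (new values) at LSN 0x0000001a000000f00003: {'CustomerId': 42, 'Name': 'Contoso'}
```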
Gathering change data in one place
Change data capture, as designed and implemented (as far as I can tell), is intended to hold captured data temporarily until you transfer it to a data warehouse for permanent storage. The data warehouse is expected to have the same schema as the OLTP database (without primary or foreign keys), and you simply copy the rows into it. Since it was not intended for such use, there is no built-in way to gather all the change data in one place for review.
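The per-table nature of the design shows up as soon as you try to read the data: every capture instance has its own change table and its own query function, so each tracked table needs its own statement. The Python sketch below only builds that T-SQL text. It assumes default capture instance names (`<schema>_<table>`, which SQL Server derives when `@capture_instance` is not supplied) and the SQL Server functions `sys.fn_cdc_get_min_lsn`, `sys.fn_cdc_get_max_lsn` and `cdc.fn_cdc_get_all_changes_<capture_instance>`; the Python helper names and the `dbo.Customer`/`dbo.Invoice` tables are hypothetical.

```python
from typing import List, Tuple


def capture_instance_name(schema: str, table: str) -> str:
    # Default name SQL Server derives when @capture_instance is not supplied.
    return f"{schema}_{table}"


def all_changes_query(capture_instance: str) -> str:
    # Each capture instance has its own query function and change table,
    # so every tracked table must be read with its own statement.
    return (
        f"DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'{capture_instance}');\n"
        "DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();\n"
        f"SELECT * FROM cdc.fn_cdc_get_all_changes_{capture_instance}"
        "(@from_lsn, @to_lsn, N'all');"
    )


tracked: List[Tuple[str, str]] = [("dbo", "Customer"), ("dbo", "Invoice")]
for schema, table in tracked:
    # Run each script as a separate batch, since the variable names repeat.
    print(all_changes_query(capture_instance_name(schema, table)))
    print()
```

The result sets also have different column lists, one per source table, so they cannot simply be combined with a UNION into a single audit view.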
This brings us to the last problem: altering the schema. If you alter the schema of your OLTP database, you have to change the schema of your data warehouse database as well. That is no problem when adding columns, but what happens when you drop columns, and how do you then display the data to the user? There is one more thing that I think Microsoft left out through pure laziness. When you change the schema, change data capture continues to track the same columns (as well as their data types) as before, with no indication that you need to do something to start capturing the new columns. It would have been enough to raise an error when altering a table under change data capture, saying you cannot do that until you disable change data capture for that table, but it seems that was too much for Microsoft.
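A toy Python model of that behavior, not SQL Server code: the capture instance keeps the column list it was created with. In SQL Server, a column added later is not captured, and a column that is dropped shows up as NULL in later change rows. `CaptureInstance` and `schema_drift` are hypothetical names, and the drift check only illustrates the kind of warning I would have liked SQL Server to give.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CaptureInstance:
    # Column list frozen when the capture instance was created (toy model).
    captured_columns: List[str]

    def capture(self, source_row: Dict[str, object]) -> Dict[str, Optional[object]]:
        # Columns added after creation are ignored; dropped columns come back as None (NULL).
        return {col: source_row.get(col) for col in self.captured_columns}

    def schema_drift(self, current_columns: List[str]) -> Dict[str, List[str]]:
        # Hypothetical check comparing the live table with the captured column list.
        return {
            "not_captured": [c for c in current_columns if c not in self.captured_columns],
            "dropped": [c for c in self.captured_columns if c not in current_columns],
        }


ci = CaptureInstance(["Id", "Name", "Phone"])
# ALTER TABLE later drops Phone and adds Email; the capture instance does not change.
row_after_alter = {"Id": 1, "Name": "Ann", "Email": "ann@example.com"}
print(ci.capture(row_after_alter))  # {'Id': 1, 'Name': 'Ann', 'Phone': None}
print(ci.schema_drift(["Id", "Name", "Email"]))
# {'not_captured': ['Email'], 'dropped': ['Phone']}
```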
There are workarounds for all of these problems, and I will show them in the next article on change data capture.
SQL Server 2008 change data capture is a welcome addition to the SQL Server feature list that helps you create audits for your databases easily and with little overhead. Despite some missing features, which have simple workarounds, it can be used very effectively for auditing your database.
Check back for more articles on change data capture covering the implementation details and the workarounds needed to make the whole thing functional.