Simplify

Keeping things simple

Posts Tagged ‘SQL

SQL 2008 Change Data Capture – The basics

with 4 comments

Continuing on the previous article on SQL 2008 Data Audit – so wittingly called The beginning now we explore the basics of Change Data Capture in SQL server 2008. The first thing we need to do is to enable change data tracking for our database. Unfortunately there is no UI to do this so you have to do it with T-SQL command.

Enabling on database level

USE Test;
GO;
EXECUTE sys.sp_cdc_enable_db;
GO

This line of code will enable your database for change data capture. You enable and disable change data capture with two system stored procedures sys.sp_cdc_enable_db and sys.sp_cdc_disable_db. The changes made to your database you can check by running the next command.

SELECT name, is_cdc_enabled
FROM sys.databases

database

You can see in the picture above that the Test database has been enabled for change data capture. Right now you database has not been changed by SQL server the stored procedure has just updated the database property is_cdc_enabled in the sys.databases to true.

Enabling on table level

USE Test;
GO;
EXECUTE sys.sp_cdc_enable_table
    @source_schema = N'dbo'
    , @source_name = N'Test'
    , @role_name = NULL
GO

This line of code will enable your table for change data capture which you can check by running the next command. You enable and disable change data capture for tables with two system stored procedures sys.sp_cdc_enable_table and sys.sp_cdc_disable_table. When you enable the first table in the database for change data capture SQL server will create several tables, functions and jobs for your database. The changes made to your database should be as so.

SELECT name, is_tracked_by_cdc
FROM sys.tables
database
jobs
tables functions

Here highlighted in red is the table that will actually capture your data, other tables in this picture are supporting tables for data capture and are created only once per database when you enable your first table for change data capture. Check more in detail about tables, stored procedures and functions.

Change data capture table

The data stored in data capture tables consists of some change data capture columns and your table columns.

cdc_table_empty

The columns displayed here are from the change data capture table for our dbo.Test table. We have columns __$start_lsn and __$end_lsn which mark the first log sequence number and the last log sequence number under which the changes were recorded in the transaction log, then there are __$seqval which is the order of the row change in a transaction, the __$operation signals the type of the operation which was executed and last is the __$update_mask is a bit mask showing which columns have been updated in the row change. These columns appear in any change data capture, after that come your table columns with their respective data type. So let’s make a change to our table and see what happens.

Recording change data capture

INSERT INTO dbo.Test
(Col1, Col2)
VALUES(500, 600) 

INSERT INTO dbo.Test
(Col1, Col2)
VALUES(501, 601)

So let’s what has been recorded in our change data capture table cdc.dbo_Test_CT

SELECT *
FROM cdc.dbo_Test_CT

changes

Now let’s perform a few more operations and see what we get.

UPDATE dbo.Test
SET Col1 = 503
WHERE ID = 1 

UPDATE dbo.Test
SET Col1 = 504
WHERE ID = 2 

DELETE dbo.Test
WHERE ID = 1

changes2

Not what you have expected we have executed in total 5 DML statements but the change data capture table contains 7 operations, how come? Let’s make an another query which will make things clearer.

Reading change capture data

SELECT
    CT.__$start_lsn,
    CT.__$operation,
    CASE CT.__$operation
        WHEN 1 THEN 'Delete'
        WHEN 2 THEN 'Insert'
        WHEN 3 THEN 'Update - Pre'
        WHEN 4 THEN 'Update - Post'
    END AS Operation,
    CT.ID,
    CT.Col1,
    CT.Col2,
    LSN.tran_begin_time,
    LSN.tran_end_time,
    LSN.tran_id
FROM
    cdc.dbo_Test_CT AS CT INNER JOIN
    cdc.lsn_time_mapping AS LSN ON CT.__$start_lsn = LSN.start_lsn

changes_transaction

What we have done here is joined the change data capture table for our Test table with the cdc.lsn_time_mapping table which stores information about the log sequence number and time values and allowing us to tie time with the specific log sequence numbers. One more important thing we have here is the tran_id field which we will use in the next article. Now we can se why we have 7 records instead of 5 records which we expected to have. Change data capture when capturing data from the UPDATE statement will create two records one with __$operation value of 3 and one with value of 4. The record with value 3 represents the data that was in the row which was there before the update has been committed and the record with the value 4 represents the data after commit.

One more thing you can see here is the times when the transactions where started and completed. This is very useful when trying to get changes in a specific time frame which can be done by using change data capture functions provided by SQL server.

Reading interval change capture data

The first thing we want to do is to find out which log sequence numbers are related with our time frame we want to look at.

SELECT
    sys.fn_cdc_map_time_to_lsn ('smallest greater than or equal', '2009-06-26 15:37:46.577')
    ,sys.fn_cdc_map_time_to_lsn ('largest less than or equal', GETDATE())

Here we use the sys.fn_cdc_map_time_to_lsn function to retrieve the log sequence numbers for the start time of our interval (smallest where time is grater than or equal to ‘2009-06-26 15:37:46.577’) and the end log sequence number of our interval (largest where time is less than or equal to GETDATE()). Next up we have to use those log sequence numbers to get something useful from the table we can do it by using an another function cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes, the difference between the two will be apparent immediately.

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Test (
    0x00000066000000A00004
    , 0x0000006A000001F70001
    , 'all') 

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_Test (
    0x00000066000000A00004
    , 0x0000006A000001F70001
    , 'all')

This is the result:

changes_all

changes_net

The cdc.fn_cdc_get_all_changes function returned all the changes that happened in our chosen interval giving us 4 total recorded changes. The second query using cdc.fn_cdc_get_net_changes returned only 2 records in the same interval why so? Well simple all_changes function returns as it names says all changes that fall between our lsn boundaries, but the net_changes returns only the last changes that happened in the interval, but what does that mean? As you can see we have only one record by each row and that is this the last operation executed on that row in the given interval.

One more thing to notice is that when executing this functions records with __$operation = 3 will not appear in the result set although it does fall into our selected interval. Since this functions will only return changes the __$operation = 3 (which is the delete operation of the update columns) will not appear.

Reading changed columns

Useful thing to now is which column has changed in an operation. The thing to use here is an another scalar valued function provided by change data capture sys.fn_cdc_has_column_changed, this function will return 1 or 0 depending if the column has been changed.

SELECT
    *
    , sys.fn_cdc_has_column_changed ('dbo_Test', 'ID', __$update_mask) AS is_id_changed
    , sys.fn_cdc_has_column_changed ('dbo_Test', 'Col1', __$update_mask) AS is_col1_changed
    , sys.fn_cdc_has_column_changed ('dbo_Test', 'Col2', __$update_mask) AS id_col2_changed
FROM
    cdc.fn_cdc_get_all_changes_dbo_Test (
        0x00000066000000A00004
        , 0x0000006A000001F70001
        , 'all')

changes_columns

This way we can see which columns have been changed in each operation and perform actions required according to our business rules. Thing to note here is that when you perform the insert and delete operation (__$operation = 2 || 1) the changed column signaled is all columns because you have either created values for all columns or deleted values for all columns. When looking at the changes for update (__$operation = 4) then you see only the actual updated column signaled.

This concludes this session, till next time….

Written by Luka Ferlež

June 24, 2009 at 16:00

Posted in SQL

Tagged with , ,

SQL 2008 Change Data Capture – The beginning

with 2 comments

Data auditing is a requirement of most enterprise systems and was a pain to implement with SQL server up to now.When data audit was required in pre SQL server 2008 databases you had to relay on solutions like triggers or some custom change tracking on the application layer. With the dawn of SQL server edition 2008 Microsoft has provided with change data capture mechanism which is an integral part of the SQL server 2008 data engine. The change data capture is a powerful mechanism for gathering changes made to data on the SQL server platform and provides it’ functionality with little overhead.

How this works

Well in short simple. Implementation of change data capture is based on an already existing element of the data engine the transaction log and does not require you to alter your db, schema or tables. When you enabled change data capture on a db SQL server will create a new schema called “cdc” with several tables, stored procedures and functions. With the changes made to your db the enabling process also creates two jobs in you SQL agent for your db one for capture and one for cleanup. These two jobs are the base for the asynchronous performance for the change data capture. The capture job is executed by default every 5 seconds, it reads the transaction log for changes from the last job execution and stores them in the change data capture tables. The cleanup job works by default once a day and cleans the stored data in the data capture tables, meaning that you have to gather that data and transfer it somewhere for permanent storage before the cleanup job runs.

Performance

Microsoft published on MSDN a SQL Server Best Practices Article on “Tuning the Performance of Change Data Capture in SQL Server 2008” going in length on why, how and where to tune the performance of change data capture. The general conclusion of this article and many articles on the web is that the Change Data Capture has very little overhead and can be used comfortably on OLTP database with no fear of grinding your database server to a stand still. You will see a slight decrease in performance through the increased disk writes from the change data capture job but the decrease in performance if the capture job is properly configured should not be more than 5% by my test. You will see an increase in the disk queue length for the log file but only a slight for the data disk.

Problems

There are of course problems with the SQL change data tracking. The three main problems that I have faced in working with change data tracking are storing contextual information, gathering change data in one place, schema altering.

Contextual information

Storing contextual information about the change is a pain with change data capture. In most cases when performing audit you want to store some contextual data about the current operation which is changing the data like the userid, data, reason of change (form, command, etc.) …. Since the change data capture relies on an asynchronous read of the transaction log, in the time of the read SQL server does not have any information about the context under which the change has occurred and therefore that data can not be gathered.

Gathering change data in one place

The change data capture as is designed and implemented (as I can figure it) is intended to capture data for temporary storage before you transfer it to data warehouse for permanent storage. The data warehouse is intended to have the same schema (no primary or foreign keys) like the OLTP database and you simply copy the rows in the data warehouse. There is no built in way as it was not intended for such use to gather the data all in one place for review.

Schema altering

This brings us to the last problem of altering the schema. If you alter the schema of your OLTP db, you have to change the schema for your data warehouse db. That is no problem when adding columns but what when you delete the columns, and how to then display the data for the user? There is one more thing I think is just pure laziness that Microsoft did not implement. The problem with schema changing is that when you change the schema the change data capture continues to track the same columns (as well as there data type) as before with no input to you that you should do something to start capturing the new columns. It would suffice that when changing the table under change data capture you would get an error that you can not do that until you drop the change data capture for that table, but it seams that was to much for Microsoft.

There are workaround for all of this problems and I will show them in the next article on change data tracking.

Conclusion

SQL server 2008 change data tracking is a well welcome addition to SQL server feature list which helps us to create audits for your databases easily with little overhead. With some missing features which have simple workarounds this feature can be used very effectively for auditing your database.

Check back for more articles on change data tracking with the details of implementation and workarounds need to make the whole thing functional.

Written by Luka Ferlež

June 20, 2009 at 18:34

Putting C# into SQL

leave a comment »

In my latest project we have come across a need to get results from a web service inside SQL Server. First basic reason why we wanted to have such an integration of web services is so that we could use them transparently in Microsoft Reporting Services. That of course is not the only reason we also wanted to use the same results as an input to some of our stored procedures and to do that we need to be able to get the results from the web service through a stored procedure or a table valued function.

The process it self is quite simple if you know where to begin. The beginning is in Sql servers implementation of CLR, with the use of CLR you can easily show those DBA’s how real programing is done :). Unfortunately for the DBA’s this is the only way this can be done so they need to trust us “developers” :). Seriously this is not such a problem to implement.

Pop open up Visual Studio and create a new “Sql Server Project” in the language of your preference (VB or C#) it’s all the same, but I will use C#. To start us off we need to configure the project first, so open “Project Properties” go to the “Build” section and find the option “Generate serialization assembly” set the option from auto to on. You have to do this because Sql server clr does not allow dynamic xml serialization, I guess it a security thing. Now when you build your project you will end up with two dlls one for you code “project.dll” and the other one for the serialized xml “project.serialization.dll”.

Now add your web service to the project just like in any other project trough the “Add web reference” menu, after you have done sou let’s add a new item to the project, add a stored procedure item to the project. Here is some sample code for the stored procedure.

[Microsoft.SqlServer.Server.SqlProcedure]
public static void WebServiceTest()
{
	// Initialize web service, adjust it for your service
	using (WebService ws = new WebService())
	{
		// Create a template record for the result set
		SqlDataRecord record = new SqlDataRecord(
			// The strings are the names of the columns of the result set
			// first field of type int
			metaData[0] = new SqlMetaData("field1", SqlDbType.Int),
			// second field of type varchar, equivalent of varchar(max)
			metaData[2] = new SqlMetaData("field2", SqlDbType.VarChar, -1),
			// third field of type decimal equivalent of decimal(12,4)
			metaData[4] = new SqlMetaData("field3", SqlDbType.Decimal, 12, 4));

		// Mark the begining of the result-set.
		SqlContext.Pipe.SendResultsStart(record);

		// Send rows back to the client.
		// Get the results from the web service, query a method on your web service
		foreach (Result item in ws.GetResult())
		{
			// set the value for field1
			record.SetInt32("field1", item.Value1);
			// set the value for field2
			record.SetString("field2", item.Value2);
			// set the value for field3
			record.SetDecimal("field3", item.Value3);

			// Send the row back to the client.
			SqlContext.Pipe.SendResultsRow(record);
		}

		// Mark the end of the result-set.
		SqlContext.Pipe.SendResultsEnd();
	}
}

Let’s look at this code. On line 4 is the initialization of the web service you will have to adjust it to your web service. Then you have to initialize a template record for your result set as is done on line 8 trough line 15. We initialize there columns of your result set column one is named “field1” and has a type of int, column two is named “field2” and has a type of varchar with length -1 or varchar(max), column three is named “field3” and has a type of decimal with length 12 and precision 4 or decimal(12,4). On line 18 we start the return of our result set and tell sql that we will be returning items based on record template we have just created. On lines 22 through 33 we iterate through our result set, set the values and use the Sql pipe to send the results. On line 36 we end the sending of results. That is it for the code part, so now we need to register the assembly in the sql server.First you have to set the database trustworthy option to on.

ALTER DATABASE DatabaseName SET TRUSTWORTHY ON

If we do not set this option to on we can’t register our assemblies as UNSAFE, and if we cant do that Sql clr wont allow us to connect to the web service. I think that you could use the EXTERNAL_ACCESS option but I have not tried that.

CREATE ASSEMBLY [AssemblyName]
FROM'C:\AssemblyPath\AssemblyName.dll'
WITH PERMISSION_SET = UNSAFE
GO
CREATE ASSEMBLY [AssembyName]
FROM'C:\AssemblyPath\AssemblyName.Serialization.dll'
WITH PERMISSION_SET = UNSAFE
GO
CREATE PROCEDURE [SchemaName].[WebServiceTest]
AS
EXTERNAL NAME [AssemblyName].Namespace.WebServiceTest

That is it now you can test out your new sql clr procedure.

Written by Luka Ferlež

March 22, 2008 at 23:39

Posted in C#, Programing, SQL

Tagged with , ,