Simplify

Keeping things simple

SQL 2008 Change Data Capture – The beginning


Data auditing is a requirement of most enterprise systems and has been a pain to implement with SQL Server until now. When data auditing was required in pre-2008 databases, you had to rely on solutions like triggers or some custom change tracking in the application layer. With SQL Server 2008, Microsoft provides a change data capture mechanism that is an integral part of the SQL Server 2008 database engine. Change data capture is a powerful mechanism for gathering changes made to data on the SQL Server platform, and it provides its functionality with little overhead.

How this works

Well, in short: simply. Change data capture is built on an element of the database engine that already exists, the transaction log, and does not require you to alter your database, schema or tables. When you enable change data capture on a database, SQL Server creates a new schema called "cdc" with several tables, stored procedures and functions. Along with the changes made to your database, the enabling process also creates two SQL Server Agent jobs for that database: one for capture and one for cleanup. These two jobs are the basis of change data capture's asynchronous operation. The capture job executes every 5 seconds by default; it reads the transaction log for changes made since its last execution and stores them in the change data capture tables. The cleanup job runs once a day by default and purges the data stored in the capture tables, meaning that you have to gather that data and transfer it somewhere for permanent storage before the cleanup job runs.
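
As a minimal sketch, enabling the feature boils down to two system stored procedures; the schema, table and option values below are placeholders for illustration:

-- Enable change data capture for the current database
-- (creates the cdc schema and the two SQL Server Agent jobs)
EXEC sys.sp_cdc_enable_db;
GO

-- Enable capture for one table; dbo.Orders and the option values are assumptions
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'Orders',
    @role_name            = NULL,   -- no gating role is required to read the change data
    @supports_net_changes = 1;      -- also generate the net-changes query function (requires a primary key)
GO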

Performance

Microsoft published an SQL Server Best Practices Article on MSDN, "Tuning the Performance of Change Data Capture in SQL Server 2008", which goes to great length on why, how and where to tune the performance of change data capture. The general conclusion of that article, and of many articles on the web, is that change data capture has very little overhead and can be used comfortably on an OLTP database with no fear of grinding your database server to a standstill. You will see a slight decrease in performance through the increased disk writes from the capture job, but if the capture job is properly configured the decrease should not be more than 5% by my tests. You will also see an increase in the disk queue length for the log file, but only a slight one for the data disk.
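
Most of that tuning comes down to the parameters of the capture and cleanup jobs; here is a rough sketch of adjusting them (the values are purely illustrative, not recommendations):

-- How often and how much the capture job reads from the transaction log
EXEC sys.sp_cdc_change_job
    @job_type        = N'capture',
    @pollinginterval = 5,     -- seconds between log-scan cycles
    @maxtrans        = 500,   -- transactions processed per scan cycle
    @maxscans        = 10;    -- scan cycles per capture cycle

-- How long captured rows are kept before the cleanup job removes them
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 4320;        -- minutes (three days, the default)

-- The jobs pick up the new settings when they are restarted
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';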

Problems

There are of course problems with SQL Server change data capture. The three main problems that I have faced in working with it are storing contextual information, gathering the change data in one place, and schema altering.

Contextual information

Storing contextual information about a change is a pain with change data capture. In most cases when performing an audit you want to store some contextual data about the operation that changed the data, such as the user id, the date, or the reason for the change (form, command, etc.). Since change data capture relies on an asynchronous read of the transaction log, at the time of the read SQL Server no longer has any information about the context under which the change occurred, and therefore that data cannot be gathered.

Gathering change data in one place

Change data capture, as it is designed and implemented (as far as I can figure it), is intended to capture data for temporary storage before you transfer it to a data warehouse for permanent storage. The data warehouse is intended to have the same schema (minus primary and foreign keys) as the OLTP database, and you simply copy the rows into the data warehouse. Since it was not intended for such use, there is no built-in way to gather all the data in one place for review.
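
You can still read the captured rows per table through the query functions that change data capture generates; here is a sketch, assuming a capture instance named dbo_Orders:

-- Read every change recorded for the dbo_Orders capture instance (an assumed name)
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT __$start_lsn,
       __$operation,   -- 1 = delete, 2 = insert, 3 = update (before), 4 = update (after)
       *               -- the captured source columns
FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all update old');

Turning those per-table results into one consolidated audit view is exactly the part you have to build yourself.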

Schema altering

This brings us to the last problem, altering the schema. If you alter the schema of your OLTP database, you have to change the schema of your data warehouse database. That is no problem when adding columns, but what about when you delete columns, and how do you then display the data to the user? There is one more thing I think is just pure laziness on Microsoft's part. When you change the schema of a tracked table, change data capture keeps tracking the same columns (and their original data types) as before, with no indication that you should do something to start capturing the new columns. It would suffice if altering a table under change data capture raised an error saying that you cannot do so until you drop change data capture for that table, but it seems that was too much for Microsoft.
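
There is no notification, but you can at least detect the drift yourself by comparing the table's current columns against what the capture instance records; a sketch, again assuming a dbo.Orders table with a dbo_Orders capture instance:

-- Columns that exist on the source table but are not being captured
SELECT c.name AS column_not_captured
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.Orders')
  AND c.name NOT IN (SELECT cc.column_name
                     FROM cdc.captured_columns AS cc
                     JOIN cdc.change_tables AS ct
                       ON ct.object_id = cc.object_id
                     WHERE ct.capture_instance = N'dbo_Orders');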

There are workarounds for all of these problems and I will show them in the next article on change data capture.

Conclusion

SQL Server 2008 change data capture is a very welcome addition to the SQL Server feature list, which helps us create audits for our databases easily and with little overhead. Even with a few missing features, all of which have simple workarounds, it can be used very effectively for auditing your database.

Check back for more articles on change data capture with the details of the implementation and the workarounds needed to make the whole thing functional.

Written by Luka Ferlež

June 20, 2009 at 18:34

Two faces of cloud computing


There is big hype about cloud computing these days; all the major web companies are investing heaps of cash into it. The big four (Microsoft, Google, Yahoo, Amazon) and a whole bunch of fresh startups are playing the cloud computing card as the next big thing in the world of IT. They are all expecting that in the next year more and more web applications and enterprises will migrate to cloud computing platforms. There are a lot of predictions for cloud computing in 2009, like this one from marketwire.com:

A rise in serverless companies with 1000+ employees. In 2009, the market will start to hear about more and more companies going completely server-less. While this is already happening at smaller companies, larger and larger companies will optimize their business processes and cut IT expenses by outsourcing to cloud providers.

In the upcoming year more and more companies will focus on reducing costs because of the economic breakdown of 2008. The first enterprises to turn to cloud computing platforms as their primary provider of IT infrastructure will be the IT companies. Why wouldn't they go for cloud computing, when it offers savings on the cost of IT infrastructure and of the personnel required to maintain that infrastructure? If you look at the prices the big four are setting for the use of their cloud infrastructure, you will see massive savings. It is enough to say that Flickr hosts all of its photos on the Amazon S3 storage infrastructure to show the capabilities of cloud computing platforms. All that for just a fraction of the cost of owning, running and maintaining company-owned servers or datacenters.

Is it all as great as it is presented? Not quite, once you dig a little bit under the surface. What is the problem, then? The problem is with the data and applications that you put in the cloud; there are issues with security, ownership and access to the data. By security I don't mean whether your data is secure from the outside, but rather from the inside, from the company that is providing the infrastructure. You must be aware that personnel of the cloud provider have access to your data stored in the cloud. Therefore you should make sure that data is encrypted when stored in the cloud.

The problem with data in the cloud can be summed up in this excerpt from the Amazon Web Services Customer Agreement.

3.7.3. In the Event of Other Suspension or Termination. Except as provided in Sections 3.7.1 and 3.7.2 above, we shall have no obligation to continue to store your data during any period of suspension or termination or to permit you to retrieve the same.

What this means is that in the case of contract suspension or termination Amazon has no obligation to keep your data or to provide you access to it, which basically translates to you depending on the goodwill of the people working at Amazon to give you access to your own data. And what happens to your data if your cloud storage provider goes out of business?

In a recent article Richard Stallman, the GNU founder, expressed similar concerns, saying of cloud computing:

It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign,

Written by Luka Ferlež

January 1, 2009 at 15:32

Software engineering is bad engineering


This is a common phrase used in reference to software in general. I have heard it from my friends, customers and even co-workers, all of whom are often ignorant of the complexity and the number of different factors involved in the execution of a software application. I'm especially amazed when my programming colleagues criticize some software with no objective reason, fully knowing that their own software is equally flawed. The very same question was raised on my new favorite programming Q&A site, Stack Overflow.

Bugs in software are something we have to accept will happen as long as there is software. Most people who are not familiar with programming ask why software can't be bug free. That's when I ask them if they have experienced any of the following:

  • Oil leak
  • Engine trouble
  • Leaky faucet
  • Wobbly chair
  • Squeaky doors
  • VCR eating the tape

Of course the answer is yes; we have all experienced some of these if not all, and they are the equivalent of software bugs. Oil is not meant to leak from your car – bug; chairs are not meant to be wobbly – bug. In short, my answer is no, we can't have bug-free software, because nothing around us is bug free, not even ourselves; we can only try to keep bugs down to a minimum.

You are not meant to be sick, but you still get sick. Is that a bug in your immune system, or was it caused by something outside of your body's control? A strange question, you might think, but what it comes down to is that our body starts to malfunction because something out of its control has influenced it and disrupted its normal way of functioning. The same applies to software.

Software developers constantly combat the external influences on their applications, be it human input, the operating environment, incompatible software or whatever other external source of influence. This invites so many variables and possibilities into an already highly complex system that is being developed by humans and is thus faulty by default. When developing software you have to take into account the different hardware and software combinations your software is supposed to run on and the users who are supposed to use it, and when you factor all of that in you get a massive number of possibilities. As I see it, most applications today are quite stable for what and where they are meant to run.

Software engineers are trying to combat all of those possibilities and the bugs that arise from such a number of different scenarios, and so far we have seen with each new release that they are managing to decrease the number of bugs rapidly and are producing genuinely high-quality code.

Anybody remember Windows 95? 🙂

Written by Luka Ferlež

December 14, 2008 at 1:03

Posted in Programing


Microsoft Windows release cycle


First of all, before you start reading: I'm not an employee of Microsoft, and this text represents my opinions, not official Microsoft policy.

Yesterday I was at Microsoft, where several of us, among other things, saw the M2 build of the Windows 7 client. I'm not going to write about what we saw, Windows 7 features and so on, since I'm not at liberty to do so because of a non-disclosure agreement. But there was an interesting discussion about the new Microsoft release cycle for the Windows client operating system. After the release of Vista, Microsoft set a new release cycle for the Windows OS of 2-3 years from release to release. The latest information from Microsoft that can be found on the Internet says the release of the Windows 7 client will be on track with that cycle, with a commercial release date somewhere in fall/winter 2009 or the beginning of 2010; internal information from Microsoft puts the target for first availability of the RTM somewhere in the summer of 2009.

Microsoft has decided that they will no longer make large changes from release to release, as they did when moving from Windows XP to Windows Vista; rather, releases of Windows will be developed in a somewhat incremental fashion. The releases will be far from service packs, but we can expect fewer service packs (Windows XP had 3), rather just one service pack for each version, with the next major release following soon after. This is simply due to the requirements of the consumer market, which demands a new version of a product with new shiny glittering features that everyone will want to have, something like what Apple is doing with Mac OS. This speed of releasing new OS versions is a requirement of the consumer market, but it brings problems to the enterprise market, which certainly does not have the need or the resources to switch to a new version of an OS every 3 years or so.

That was the discussion that arose: how will the enterprise market react to this change in OS releases? Those of us who work with enterprise users know that our users are sometimes still using the Windows 98 and Windows 2000 operating systems, that most of them are using Windows XP, and that they have no will or need to upgrade to either Windows Vista or the upcoming Windows 7. Microsoft should have considered this when they decided to shift to the new development and release cycle. The way I see the situation now, either they have not considered it, or they have and decided that their enterprise users will be able to follow the new cycle, or that those users will skip a release and wait for the next one.

There is one more problem with the faster release cycle, and for the enterprise user it is not a small one. Unlike the average Joe consumer, the enterprise customer emphasizes ROI and the resources the company needs to spend on upgrading its software and possibly hardware infrastructure, which is also accompanied by the expense of training the users and the traditional drop in productivity until the users get accustomed to the new OS. This poses a big risk and a big investment for the enterprise user, too big in my opinion for them to take the change lightly. Another change the enterprise user will not take lightly is that with the faster release cycle the support periods for products on the fast track will be shorter, which will force enterprises to change the OS and so on, because support is for the enterprise user what the shiny new feature is for the consumer.

So, from now on, the enterprise user has these three options:

  1. Take the risk and upgrade every 3 years
  2. Upgrade every 6 years with no updates in the mean time
  3. Hope that Microsoft will change their policy

Maybe the time has come to split the development of Windows into two separate tracks, one for the consumer market and one for the enterprise market.

Windows Consumer track

This track would cover Windows for the average Joe, released on a 2-3 year basis, and would include all the latest glittering features that we all like, including all the multimedia, gaming and similar features a consumer needs. This edition of Windows would concentrate on the showstopper features and would be limited to only a 5-6 year support period, since its users will be likely to change the OS in the meantime, be it with new hardware or simply a change of software on the same hardware.

  1. Windows Client
  2. Windows Home Server

Windows Enterprise track

This track would cover Windows for large enterprise customers, released on a 5-6 year basis, and would include the latest security and stability features, along with the increased testing and compatibility checking that is extremely important to the enterprise customer. This edition of Windows would be designed to support a longer usage life span and would have a 10-year support period, since its users will be unlikely to change the OS in the meantime, as the costs of doing so are high and the change itself does not bring enough added value to the enterprise customer.

    1. Windows Enterprise Client
    2. Windows Server

So one day we might finally wake up in a world where the enterprise and consumer markets don't mix in support and release dates, but share the minimal common features needed to reduce the development and testing costs of each OS version.

Written by Luka Ferlež

October 8, 2008 at 22:50

Posted in Computers, Windows


Using TransactionScope for handling transactions


Implementing transactions from C#/VB code today might seem easier than ever before, but the path to easy and effective transaction handling has many traps along the way. Microsoft made developers' lives seemingly easier by implementing the TransactionScope class in the .NET Framework 2.0. I say seemingly easier because you will sometimes find yourself baffled by the way your code behaves.

The TransactionScope class uses the MS Distributed Transaction Coordinator to manage transactions, so before you start using TransactionScope, make sure you have the DTC running on the web and database servers and that you have granted enough security permissions for the DTC to be able to communicate on those machines.

So to get started with TransactionScope, here is a simple code example that seems perfectly OK.

using(TransactionScope outerScope = new TransactionScope())
{
    // Execute query 1

    using(TransactionScope innerScope = new TransactionScope())
    {
        try
        {
            // Execute query 2
        }
        catch (Exception)
        {
        }

        innerScope.Complete();
    }

    outerScope.Complete();
}

Now if query 2, which is inside the try/catch block, raises an error, you will catch the exception and handle it, but here is a surprise: the application will throw an ObjectDisposedException when you try to complete the outer scope. This is because the DTC has already caught the exception, and although you have handled it, the TransactionScope objects were already disposed by the .NET code and the transaction has been rolled back. Notice that I said "objects": both TransactionScope objects have been disposed, as they are part of the same transaction. This is just one example where you might be surprised that the code is not behaving as you would expect.

The TransactionScope class implements something referred to as an "ambient transaction". This is the transaction in which all database queries are executed. When I say all queries, I mean all queries executed between the creation of the TransactionScope object and the call to its Complete() method, including all method calls, and their method calls, and so on, that are executed between those two lines. That means you don't need to handle the transaction in every method that participates in it unless you want to.
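
To make this concrete, here is a minimal sketch of a helper method that participates in the caller's ambient transaction without receiving any transaction object; the connection string, table and class names are placeholders for illustration. The SqlConnection it opens enlists in the ambient transaction automatically.

using System.Data.SqlClient;
using System.Transactions;

public class OrderRepository
{
    // Placeholder connection string for illustration
    private const string ConnectionString =
        "Data Source=.;Initial Catalog=Shop;Integrated Security=True";

    public void SaveOrder(int orderId)
    {
        // The connection opened here enlists in the ambient transaction, if one exists
        using (SqlConnection connection = new SqlConnection(ConnectionString))
        using (SqlCommand command = new SqlCommand(
            "INSERT INTO Orders (OrderId) VALUES (@id)", connection))
        {
            command.Parameters.AddWithValue("@id", orderId);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}

public class Program
{
    public static void Main()
    {
        OrderRepository repository = new OrderRepository();

        using (TransactionScope scope = new TransactionScope())
        {
            // SaveOrder knows nothing about the transaction, yet both inserts
            // commit or roll back together with the ambient transaction
            repository.SaveOrder(1);
            repository.SaveOrder(2);

            scope.Complete();
        }
    }
}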

You can control the manner in which a method participates in a transaction quite easily. Here is a piece of code to demonstrate:

using(TransactionScope outerScope = new TransactionScope())
{
    // Execute query 1

    using(TransactionScope innerScope =
          new TransactionScope(TransactionScopeOption.Required))
    {
        // Execute query 2
        innerScope.Complete();
    }

    using (TransactionScope innerScope =
           new TransactionScope(TransactionScopeOption.RequiresNew))
    {
        // Execute query 3
        innerScope.Complete();
    }

    using (TransactionScope innerScope =
           new TransactionScope(TransactionScopeOption.Suppress))
    {
        // Execute query 4
        innerScope.Complete();
    }

    outerScope.Complete();
}

This demonstrates the use of the TransactionScopeOption enumeration to control the way a piece of code participates in a transaction. The enumeration consists of three options: Required, RequiresNew and Suppress. The default option for the TransactionScope constructor is Required, which states that an ambient transaction is required for the operations; if none exists, one will be created. The Suppress option states that the TransactionScope will not participate in the transaction, and therefore all database queries inside it are executed without a transaction. The RequiresNew option states that a new ambient transaction will always be created and will become the root transaction.

There are further options that can be modified by using a different constructor, TransactionScope(TransactionScopeOption, TransactionOptions), which takes the TransactionOptions struct as a parameter. Here is a code example.

TransactionOptions options = new TransactionOptions();
options.Timeout = TimeSpan.FromMinutes(1);
options.IsolationLevel = IsolationLevel.Serializable;

using(TransactionScope scope =
      new TransactionScope(TransactionScopeOption.Required, options))
{
    scope.Complete();
}

The TransactionOptions struct has two properties, Timeout and IsolationLevel. The first is self-explanatory: the time within which the transaction must complete. The second is more important, I would even say crucial, for effective handling of transactions. The IsolationLevel enumeration consists of several levels of isolation: Serializable, RepeatableRead, ReadCommitted, ReadUncommitted, Snapshot, Chaos and Unspecified, with the Serializable isolation level being the strictest and ReadUncommitted being the least strict form of isolation.

Serializable means that all transactions occur in a completely isolated fashion, i.e. as if all transactions in the system had executed serially, one after the other. That means no other transaction can touch the data involved until the running transaction completes. This type of isolation should be used very carefully, since it can cause massive timeouts and blocking if used with long-lasting transactions.

RepeatableRead is a step down from the Serializable isolation level: multiple transactions can run at the same time, but the data read during the transaction cannot be modified by others (row/page locks are held) and volatile data cannot be read.

ReadCommitted is a further step down in isolation: data that the transaction has read in a select statement can be read and modified by other transactions, but volatile (uncommitted) data cannot be read.

ReadUncommitted is the least strict isolation level, where transactions can see uncommitted changes made by other transactions, allowing dirty reads.

More on isolation levels on Wikipedia.

That sums up the basic use of the TransactionScope class with more to come 🙂

Written by Luka Ferlež

August 5, 2008 at 17:04

Putting C# into SQL


In my latest project we came across the need to get results from a web service inside SQL Server. The first, basic reason we wanted such an integration of web services is so that we could use them transparently in Microsoft Reporting Services. That, of course, is not the only reason: we also wanted to use the same results as input to some of our stored procedures, and to do that we need to be able to get the results from the web service through a stored procedure or a table-valued function.

The process itself is quite simple if you know where to begin. The beginning is SQL Server's implementation of the CLR; with the CLR you can easily show those DBAs how real programming is done :). Unfortunately for the DBAs this is the only way it can be done, so they need to trust us "developers" :). Seriously, this is not such a problem to implement.

Open up Visual Studio and create a new "SQL Server Project" in the language of your preference (VB or C#), it's all the same, but I will use C#. To start off we need to configure the project: open "Project Properties", go to the "Build" section, find the option "Generate serialization assembly" and set it from Auto to On. You have to do this because the SQL Server CLR does not allow dynamic XML serialization; I guess it's a security thing. Now when you build your project you will end up with two DLLs, one for your code ("project.dll") and the other for the serialized XML ("project.serialization.dll").

Now add your web service to the project just like in any other project, through the "Add Web Reference" menu. After you have done so, let's add a new item to the project: a Stored Procedure item. Here is some sample code for the stored procedure.

using System.Data;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
	[Microsoft.SqlServer.Server.SqlProcedure]
	public static void WebServiceTest()
	{
		// Initialize the web service proxy, adjust it for your service
		using (WebService ws = new WebService())
		{
			// Describe the columns of the result set
			SqlMetaData[] metaData = new SqlMetaData[]
			{
				// first field of type int
				new SqlMetaData("field1", SqlDbType.Int),
				// second field of type varchar, -1 is the equivalent of varchar(max)
				new SqlMetaData("field2", SqlDbType.VarChar, -1),
				// third field of type decimal, equivalent of decimal(12,4)
				new SqlMetaData("field3", SqlDbType.Decimal, 12, 4)
			};

			// Create a template record for the result set
			SqlDataRecord record = new SqlDataRecord(metaData);

			// Mark the beginning of the result set
			SqlContext.Pipe.SendResultsStart(record);

			// Get the results from the web service, query a method on your web service
			foreach (Result item in ws.GetResult())
			{
				// set the value for field1 (ordinal 0)
				record.SetInt32(0, item.Value1);
				// set the value for field2 (ordinal 1)
				record.SetString(1, item.Value2);
				// set the value for field3 (ordinal 2)
				record.SetDecimal(2, item.Value3);

				// Send the row back to the client
				SqlContext.Pipe.SendResultsRow(record);
			}

			// Mark the end of the result set
			SqlContext.Pipe.SendResultsEnd();
		}
	}
}

Let's look at this code. First we initialize the web service proxy, which you will have to adjust to your own web service. Then we build a template record for the result set by declaring its columns: column one is named "field1" and has a type of int; column two is named "field2" and has a type of varchar with length -1, i.e. varchar(max); column three is named "field3" and has a type of decimal with precision 12 and scale 4, i.e. decimal(12,4). The call to SendResultsStart() begins the result set and tells SQL Server that we will be returning rows based on the record template we have just created. The foreach loop then iterates through the web service results, sets the column values and uses the SQL pipe to send each row, and SendResultsEnd() marks the end of the result set. That is it for the code part, so now we need to register the assembly in SQL Server. First you have to turn the database TRUSTWORTHY option on.

ALTER DATABASE DatabaseName SET TRUSTWORTHY ON

If we do not set this option on, we can't register our assemblies as UNSAFE, and if we can't do that the SQL CLR won't allow us to connect to the web service. I think you could use the EXTERNAL_ACCESS option instead, but I have not tried that.

CREATE ASSEMBLY [AssemblyName]
FROM 'C:\AssemblyPath\AssemblyName.dll'
WITH PERMISSION_SET = UNSAFE
GO
CREATE ASSEMBLY [AssemblyName.Serialization]
FROM 'C:\AssemblyPath\AssemblyName.Serialization.dll'
WITH PERMISSION_SET = UNSAFE
GO
CREATE PROCEDURE [SchemaName].[WebServiceTest]
AS
EXTERNAL NAME [AssemblyName].[StoredProcedures].[WebServiceTest]

That is it; now you can test your new SQL CLR procedure.

Written by Luka Ferlež

March 22, 2008 at 23:39

Posted in C#, Programing, SQL


Connecting to TFS


Like so many things in the programming world, establishing a connection to a TF server can be done in a couple of different ways. I have previously specified everything you need to start developing an application that interfaces with a TF server, so I will skip the introduction to TFS development and get my hands dirty right away.

From your code, C# code that is, you can establish a connection to a TF server in two basic ways. Both of them create an instance of the TeamFoundationServer class. One is through the factory class TeamFoundationServerFactory, and the other is creating an instance of TeamFoundationServer manually through its public constructor.

The main difference between the two is that the static TeamFoundationServerFactory class caches the connected servers, so that a request is not sent to the TF server every time you ask for one, thus not clogging the server with requests.

TeamFoundationServer server = TeamFoundationServerFactory.GetServer("http://" + serverName + ":" + serverPort);

Now you have your instance of TeamFoundationServer. Each time you call the GetServer(String) method it checks whether that server has already been accessed; if it has, the method returns the cached instance, otherwise it creates a new one. Your other option is to create an instance of TeamFoundationServer on your own.

TeamFoundationServer server = new TeamFoundationServer("http://" + serverName + ":" + serverPort);

If you do it this way, a request is sent to the TF server to check that it exists; if you do that every time you need something from the server, you will generate a lot of requests. Basically, if you use the TeamFoundationServerFactory class the framework handles the caching of the server for you; if you create an instance on your own you will have to store that instance and keep track of it yourself.

Authentication

The TeamFoundationServer constructor approach has one benefit over the GetServer(String) method. The benefit shows when you get around to authenticating users. When you want to provide user authentication through the GetServer method, you use the overload GetServer(String, ICredentialsProvider) with the UICredentialsProvider class, like so:

TeamFoundationServer server = TeamFoundationServerFactory.GetServer("http://" + serverName + ":" + serverPort, new UICredentialsProvider());

server.EnsureAuthenticated();

What this does is create a pop-up requesting the user's credentials every time the user needs to be re-authenticated. The EnsureAuthenticated() method first checks whether the user has already been authenticated, and if not it creates a login pop-up for the user's credentials and then authenticates the user. In your software you might want to save the user's credentials and use them any time the user needs to authenticate. That's when the TeamFoundationServer(String, ICredentials, ICredentialsProvider) constructor comes into play. With this constructor you can specify, in addition to a UICredentialsProvider object that is used as a fallback if authentication fails, a separate NetworkCredential object to be used for authentication.

NetworkCredential networkCredentials = new NetworkCredential(
   userName,
   password,
   domain);

TeamFoundationServer server = new TeamFoundationServer(
   "http://" + serverName + ":" + serverPort,
   networkCredentials,
   new UICredentialsProvider());

server.EnsureAuthenticated();

This way the code will first try to authenticate with the supplied NetworkCredential, and if that fails it will use the UICredentialsProvider to create the pop-up and ask the user for credentials. This way you can save and reuse the user's credentials when you want to, and if authentication fails the framework will ask the user for credentials on its own.

Proxy

I was having trouble connecting to my company's TF server, since our workstations are in one domain and the TF server is in a separate domain, with an ISA and a web filtering server in between. I was getting HTTP error code 407, "proxy authentication failed", and until I found a way to deal with it I lost around 4 hours. The solution is quite simple: you just add this to your config file.

<system.net>
 <defaultProxy enabled="true" useDefaultCredentials="true">
  <bypasslist>
   <add address="serverName" />
  </bypasslist>
 </defaultProxy>
</system.net>

With this in place the application will bypass the proxy for the specified network addresses.

Written by Luka Ferlež

March 18, 2008 at 21:35