Simplify

Keeping things simple

Bazaar Explorer “Activation context generation failed”


After installing Bazaar on Windows Server 2003 you might get an error while launching Bazaar Explorer stating something like this:

Activation context generation failed for “C:\Program Files (x86)\Bazaar\lib\win32api.pyd”. Error in manifest or policy file “C:\Program Files (x86)\Bazaar\lib\Microsoft.VC90.CRT.MANIFEST” on line 12. The value “..\msvcr90.dll” of attribute “name” in element “urn:schemas-microsoft-com:asm.v1^file” is invalid.

While the message itself is cryptic, the solution is quite simple: install the matching Microsoft Visual C++ 2008 (VC90) redistributable and the problem goes away.


Happy bazaaring


Written by Luka Ferlež

March 1, 2013 at 14:58

TFS 2012, TF30170: State ‘In Progress’ does not exist


This weird error was thrown by the TFS 2012 project creation wizard while attempting to create the first new project since applying TFS 2012 Update 1, using the Scrum 2.1 template. The full error text goes like this:

Error TF30170: The plugin Microsoft.ProjectCreationWizard.WorkItemTracking failed during task ProcessConfiguration from group WorkItemTracking. Explanation Plugin error text: “The following element contains an error: TaskWorkItems/States. TF400587: This element defines the states for work items that appear as tasks on your sprint backlog. Each state must exist in at least one of the work item types belong to category defined in: TaskWorkItems. The following states do not exist in any of the work item types: In Progress.”

The same error message has been known to appear on non Update 1 versions and is even reported as fixed by Microsoft on Connect, but it seems it is not fixed, though the solution provided by Microsoft support does fix the problem. The bottom line is that the value in the TFS database for your collection is different from the one in the process template. In our case this appears to be an issue only with the Scrum 2.1 template, so the following steps are for that template, although all of the templates can be fixed in the same way.

First off, we must get the value stored in the TFS database.

[Screenshot: SQL query against the collection database]
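The screenshot is no longer available, but the query was along these lines; a sketch that assumes the collection database keeps its string constants in a dbo.Constants table with a String column (check the table and column names against your own TFS version before relying on it):

-- Assumed table/column names; verify against your TFS collection database
SELECT *
FROM [Tfs_DefaultCollection].dbo.Constants
WHERE String LIKE 'In Progress'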

In this query you of course have to replace the [Tfs_DefaultCollection] database name with the database of the collection in which you are trying to create the new team project. If the project creation wizard reports a problem with some other state, replace the “In Progress” part with the state in question.

The query should produce something like this, and here you can see the exact spelling of the constant in question.

[Screenshot: query result showing the spelling of the stored state constant]

Next up we have to download the process template from the Process Template Manager to check the states defined for the process. So fire up VS 2012 and pop open the Process Template Manager like so:

[Screenshot: opening the Process Template Manager in Visual Studio 2012]

then select and download the process template, and hopefully save it somewhere you will remember, like so:

[Screenshot: downloading the process template]

After downloading the process template, navigate to the commonconfiguration.xml file:

[Screenshot: commonconfiguration.xml in the downloaded process template]

Locate all instances of the “In Progress” string. Note that the Process Template Manager validation is case sensitive while SQL is not (by default), and the problem is that the string in the database is “In progress” while the one in the config file is “In Progress”.

[Screenshot: “In Progress” entries in commonconfiguration.xml]

This seems to be a no-no, so change the casing in the config file to match the one in the database, save the file and upload it back to TFS through the Process Template Manager.

Be on your merry way creating new team projects as much as your heart desires.

Written by Luka Ferlež

February 27, 2013 at 18:23

Estimating with planning poker


Planning with poker, you say? That sounds as reasonable as planning a construction project by drawing plans on napkins. Unreasonable as it may sound, planning poker works, but not as you would think. Planning in planning poker is just a by-product of the real result of a poker planning session, and that is accurate estimates: not precise, but accurate.

Planning Poker is a consensus-based technique for estimating, mostly used to estimate effort or relative size of user stories in software development.

Planning poker will provide you with estimates from the development team, and from those estimates the project manager can, with some dose of certainty, put the estimates in the context of a project plan. Accurate estimates with which the development team is comfortable empower the project manager to create an executable project plan and to take the required actions to see out the user stories on schedule.

A common situation on any project is that the client demands delivery of “x” user stories on a fixed date “y”. With accurate, dependable estimates of the user stories, the project manager can outright see whether the required user stories can be delivered in the available sprints. With that information the project manager can negotiate with the client on reducing the scope, add new team members, outright reject the client’s request or take any other action in the project manager’s arsenal.

What’s the advantage of using planning poker over other estimation techniques, you ask?

So how does the estimation go?

The estimation meeting should happen on a regular schedule, but you should adjust your meetings according to your needs. User stories should be estimated when they appear and re-estimated as new findings and information become known. The project manager should nominate the user stories to be estimated by the team at the estimation meeting.

Estimating is done with card decks that have different values, usually based on the Fibonacci sequence, the most common being 0, ½, 1, 2, 3, 5, 8, 13, 20, 40, 100, ? (unsure). This sequence reflects the inherent uncertainty in estimating larger items, thus ensuring that the estimation precision is correlated to the estimation value. This is very important because if you estimate one user story at 50.5 and another at 47.65, that is very precise, probably wrong and largely irrelevant to your needs. More importantly, what is the quantifiable difference between 50.5 and 47.65 based on a five-sentence description of a user story?

The meeting itself goes through a few basic steps:

  1. Product owner provides a short overview of the user story
  2. The team asks questions and discusses the story to clarify assumptions and risks
  3. Product owner records the summary of discussion to improve the user story definition
  4. Each individual lays a card face down representing their estimate
  5. The team turns the cards over simultaneously
  6. People with high estimates and low estimates are given a soap box to offer their justification for their estimate.
  7. Discussion continues from step 2 to step 6 until a consensus is reached

The procedure continues until all of the user stories are estimated and the team is happy with the estimates. Then the product owner can re-prioritize the backlog, and with the new estimates the project manager can re-adjust the overall project plan.

The estimation meeting must adhere to some rules:

  1. During discussion, numbers and other forms of size (long, short, much, less, etc.) must not be mentioned at all in relation to feature size
  2. An egg timer is used to ensure that discussion is structured
  3. The developer who is likely to own the deliverable has a large portion of the “consensus vote”
  4. No more than 10 people are involved
  5. The moderator, product owner and project manager are not allowed to estimate.
  6. Units used vary – they can be days duration, ideal days or story points, but they must be announced by the moderator at the start of the meeting
  7. Each estimator is given one deck of the cards
  8. All decks have identical sets of cards in them
  9. The moderator can negotiate a consensus

Rules 1 and 2 are the basis of a successful estimation meeting: team members must not anchor to someone else’s estimates, and you wouldn’t want the discussion on a user story to go on for hours, would you? The egg timer forces the team to re-cast their estimates based on the current state of the discussion and to see how close the consensus is. If consensus can’t be reached after a couple of iterations, the moderator can and should suggest postponing the estimation of that user story and revisiting it at another meeting; if that is not possible and an estimate must be made, then the moderator should take the vote of the majority.

The planning poker estimates might not be the most precise estimates, but the delivered estimates are dependable and a good basis for building an iteration plan to discuss with your client.

Written by Luka Ferlež

February 4, 2013 at 22:29

How to configure MS DTC through a firewall


If you are using TransactionScope for handling your transactions, then the ambient transactions you open with TransactionScope will be enlisted in the Distributed Transaction Coordinator (DTC). For the transaction to run, a DTC (Microsoft or otherwise) must exist on all machines that will participate in the transaction, and the participants must be able to talk to each other through the firewall.

The DTC requires several things to be able to communicate:

  • being able to resolve names by DNS or NetBIOS (from both sides)
  • being able to communicate through port 135 (the RPC endpoint mapper port, used for the handshake)
  • being able to dynamically assign at least one port for communication (by default in the 1024 – 65535 range)
  • RPC & DTC must exist on all participants

First ensure that all of the participants can resolve each other’s names by using the ping command; if they can’t, add entries to the LMHOSTS file so the names resolve successfully.

Communication between the participants begins with the originating DTC negotiating with the destination DTC through port 135: authentication, and the port on which the actual exchange of transaction data will occur. That port is randomly assigned in the 1024-65535 range, which means that the firewall must allow all of those ports in both directions for the DTC to work properly, which of course blows the heads off the IT security guys.

Lucky for the IT security guys, you can control the port range for dynamic assignment through a few registry values. The settings are controlled by the Ports, PortsInternetAvailable and UseInternetPorts values under the “HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc” path. When limiting the port range Microsoft recommends using ports 5000 and above and assigning a minimum of 20 ports. However, you should assign a minimum of 100 ports, since RPC will be used by other processes as well as by the DTC.

So for example, to limit the port range to ports 20000-21000, do the following (a scripted version follows the list):

  • Add key Name: “Internet”
  • Add value Name: “Ports”, Type: “REG_MULTI_SZ” (Multistring), Data: “20000-21000”
  • Add value Name: “PortsInternetAvailable”, Type: “REG_SZ” (String), Data: “Y”
  • Add value Name: “UseInternetPorts”, Type: “REG_SZ” (String), Data: “Y”
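If you prefer to script it, the same registry changes can be made from an elevated PowerShell prompt; this is a minimal sketch of the steps above (a reboot is generally needed before MS DTC picks up the new port range):

# Create the Internet key under the Rpc key and set the port range values (run elevated)
$rpc = "HKLM:\SOFTWARE\Microsoft\Rpc"
New-Item -Path $rpc -Name "Internet" -Force | Out-Null
New-ItemProperty -Path "$rpc\Internet" -Name "Ports" -PropertyType MultiString -Value @("20000-21000") -Force | Out-Null
New-ItemProperty -Path "$rpc\Internet" -Name "PortsInternetAvailable" -PropertyType String -Value "Y" -Force | Out-Null
New-ItemProperty -Path "$rpc\Internet" -Name "UseInternetPorts" -PropertyType String -Value "Y" -Force | Out-Null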

Then all you have to do is adjust your firewall settings to match the port range above; a scripted sketch follows the list.

  • Allow inbound & outbound to port 135
  • Allow inbound & outbound to port range 20000-21000
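If the participants run the built-in Windows Firewall (Vista/2008 or newer), the rules can be scripted with netsh; a sketch for the inbound side, to be mirrored with dir=out if outbound traffic is also filtered:

netsh advfirewall firewall add rule name="MSDTC RPC endpoint mapper" dir=in action=allow protocol=TCP localport=135
netsh advfirewall firewall add rule name="MSDTC dynamic ports" dir=in action=allow protocol=TCP localport=20000-21000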

That’s it, you should be able to run your distributed transactions; if you need more detailed instructions you can find them in KB 250367.

Written by Luka Ferlež

August 23, 2012 at 13:03

Posted in Programing


Building a project documentation framework


Properly maintained documentation is of crucial importance to any project team and project manager, regardless of project type and methodology used. Agile methodologies of today mostly advocate little to no documentation, which might seem like a good idea, but in general it is a bad one: sooner or later you will have to explain why something has been done this or that way, who ordered and authorized the implementation of functionality x in this specific way, the developers and testers will need to validate that requirements have been met, and so on.

So when you get a supreme guru justice league developer or project manager telling you that he can complete a project without documentation, don’t believe for a second that it’s a good idea. While the statement itself might be true, and one can complete a project without proper documentation, you can be sure that there is a whole world of pain down the road for you.

Currently I’m managing a mid-to-large enterprise system that has several maintenance and upgrade contracts running concurrently, handled by different development teams on different platforms and technologies, and it is crucial for all of our teams to have current and up-to-date documentation of the system. Still, we can’t afford to waste valuable man-days on mountains of papers and documents, so we had to choose a proper amount of documentation which had to cover our basic needs:

  • Easy to write and maintain
  • Clear and unambiguous
  • Understandable to technical and non-technical personnel
  • Easy to relate and transform to other documents

We started off with the basics of the Agile methodologies and the user story format and built our requirements document around it. The current requirements document corresponds to an epic consisting of a scenario list like “Editing an xx”, “Searching for xx”, “Calculating xx”, each of which consists of a story list table detailing all of the user stories required. The document in itself is a simple decomposition of the user actions which exhibits the following traits:

  • focuses discussion on user actions and process
  • client stakeholders can easily relate to it
  • it contains sufficient information for the development team
  • it does not predetermine any part of the actual implementation

Of course the document is always signed and confirmed by all parties: project manager, client representative and development team representative.

So we happily started to write our requirements documents, and after a while, when we got bug reports, we saw that we could not identify how the current version of the system should behave, since it consisted of user stories implemented over user stories over user stories, etc. Then came the decision that we should have a document, updated as we implement user stories in the system, that describes the current state of the system.

To satisfy that need we used the use case form as the basis for a functional description of the system. The use case document is updated during sprints by the business analysts as user stories are implemented, so it consistently represents the current system functionality. This document covers needs different from the requirements document, such as:

  • description of current system functionality
  • detailed description of required steps to complete an action or process
  • detailed description of UI
  • detailed description of implemented business rules
  • detailed description of the implementation of user stories

By developing the use cases from the requirements document we inadvertently managed to kill three flies with one blow. We quickly realized that the use case document, since it details implementation and user actions, can easily, in a matter of an hour or two, be transformed into test cases and user manuals. Then from the test cases it was a matter of a couple of hours to build automated UI tests, which dramatically increased the testing coverage and quality while reducing the load on our testing team.

With this we saw the ultimate benefit of keeping a decent level of documentation of our system: improving the overall quality while reducing costs and the time pressure on development teams. The second benefit of this documentation system is that our team members can now focus on their primary job with a clear goal ahead and a clear definition of done.

The business analysts develop user stories with the stakeholders, architects review and design the system, development teams concentrate on developing the best software based on the user stories, the testing team can develop test cases based on the use cases and automated tests based on the test cases, and the support team has detailed specifications with which to review, test and categorize bugs. This completes the circle and eases the job all round, at the expense of investing a little bit of time in developing some documentation, which comes back big time as a reward in the form of clarity of functionality, faster development and, most important of all, quality.

Written by Luka Ferlež

August 5, 2012 at 17:24

Add routes on VPN connect with Powershell and Task Scheduler


At my company we use a Microsoft ISA Server as our firewall/VPN server. To be able to access the servers at my company via VPN, I had to do one of two things:

  1. Use default gateway of the remote network
  2. Add static routes each time I connect via VPN

Option number one has the drawback that all my traffic would be directed through the VPN connection. Since my company has a very restrictive security policy which allows me to access only a couple of servers (TFS, SQL server, …), that effectively means that when connected to the VPN I cannot use the internet or any other network resource.

Option number two requires that you add each of the routes to the routing table every time you connect the VPN. This cannot be done with a plain batch script, since the IP of the gateway changes on each connect.

So I thought to myself that there should be a better way to do this. With some basic Googling I quickly came up with an elegant solution. The first step towards the solution was a piece found on this blog. The blog describes the very same problem that I was facing and provides a simple PowerShell script that handles the routes. This PowerShell script, although it does what is needed efficiently, didn’t completely satisfy me.

Why? Well, simply because I am lazy and don’t like having to click a PowerShell script on the desktop every time I make a VPN connection.

Doing some more Googling brought me to a Technet page that described how to use the Windows Task Scheduler to trigger the PowerShell script execution on each VPN connect. Modifying the snippet, I created this command, which is executed in the PowerShell command prompt.

schtasks /create /F /TN "VPN Connection Update" /TR "Powershell.exe -NonInteractive -command C:\vpn.ps1" /SC ONEVENT /EC Application /MO " *[System[(Level=4 or Level=0) and (EventID=20225)]] and *[EventData[Data='VPN NAME']]"

In this command change the path to your script, and change the last part where it says VPN NAME to the name of your VPN connection. This will ensure that the Task Scheduler executes your script if and only if you connect that specific VPN connection. The complete explanation of the settings in the command can be found on the before-mentioned Technet page.
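If you want to confirm that the task registered correctly, a quick check from the same prompt (using the task name from the command above) is:

schtasks /query /TN "VPN Connection Update" /V /FO LIST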

So I was almost there with the solution, but the script itself did not satisfy me, because I had to add multiple routes and ensure that the routes do not already exist. I modified the original script to this version.

# ---------------------------------------------------------------------------------------
# Initial version: http://www.webboise.com/windows-powershell-script-for-adding-ip-routes-across-a-vpn/
# by Chris @ 30.9.2008
# ---------------------------------------------------------------------------------------
# Modified version: https://simpleverse.wordpress.com/2010/10/06/add-routes-on-vpn-connect-with-powershell-and-task-scheduler
# by Luka Ferlež @ 6.10.2010
# ---------------------------------------------------------------------------------------
#
#
# Add IP routes across a VPN via a DHCP assigned IP address
#
# Configuration
# ---------------------------------------------------------------------------------------
# Route IP address
$ips = @("10.20.1.0", "10.20.100.0", "10.23.2.0")
# VPN connection IP
$vpnIP = "192.168.90."
# ---------------------------------------------------------------------------------------
#
# Get the IP address of the VPN connection
$vpnip = ipconfig | findstr $vpnIP
# ---------------------------------------------------------------------------------------
#
# If we don't have an IP address on the VPN, error and quit
if (!$vpnip) {
    "You do not have an IP address on the VPN"
    exit
}
# Trim any leading/trailing whitespace
$vpnip = $vpnip.Trim()
# ---------------------------------------------------------------------------------------
#
# Split the contents of $vpnip in to an array
$vpnip = $vpnip.Split(" ")
# ---------------------------------------------------------------------------------------
#
# Find out the depth of our IP address in the array
$bit = $vpnip.Length - 1
# ---------------------------------------------------------------------------------------
#
# Get out just our IP address on the VPN
$vpnip = $vpnip[$bit]
# ---------------------------------------------------------------------------------------
#
# Delete routes if existing
foreach ($ip in $ips) {
    $hasRoute = route print | findstr $ip
    if ($hasRoute) {
        "Deleting route " + $ip
        route delete $ip
    }
}
# ---------------------------------------------------------------------------------------
#
# Add whatever routes we need
foreach ($ip in $ips) {
    "Adding route " + $ip
    route add $ip MASK 255.255.255.0 $vpnip
}

This script allows you to simply add the necessary routes at the top of the script, and it will process them. Enjoy.

Written by Luka Ferlež

October 6, 2010 at 22:21

SQL 2008 Change Data Capture – The basics


Continuing on from the previous article on SQL 2008 data audit, wittily called The beginning, we now explore the basics of Change Data Capture in SQL Server 2008. The first thing we need to do is to enable change data capture for our database. Unfortunately there is no UI for this, so you have to do it with a T-SQL command.

Enabling on database level

USE Test;
GO
EXECUTE sys.sp_cdc_enable_db;
GO

This code enables your database for change data capture. You enable and disable change data capture at the database level with two system stored procedures, sys.sp_cdc_enable_db and sys.sp_cdc_disable_db. You can check the change made to your database by running the next command.

SELECT name, is_cdc_enabled
FROM sys.databases

[Screenshot: is_cdc_enabled flag set for the Test database]

You can see in the picture above that the Test database has been enabled for change data capture. Right now your database has not really been changed by SQL Server; the stored procedure has just set the database property is_cdc_enabled in sys.databases to true.
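The examples that follow work against a trivial dbo.Test table; a minimal sketch matching the columns used below (ID, Col1, Col2) would be:

-- Assumed example table; your own table layout may of course differ
CREATE TABLE dbo.Test
(
    ID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Col1 INT NULL,
    Col2 INT NULL
);
GO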

Enabling on table level

USE Test;
GO
EXECUTE sys.sp_cdc_enable_table
    @source_schema = N'dbo'
    , @source_name = N'Test'
    , @role_name = NULL;
GO

This code enables your table for change data capture, which you can check by running the next command. You enable and disable change data capture for tables with two system stored procedures, sys.sp_cdc_enable_table and sys.sp_cdc_disable_table. When you enable the first table in the database for change data capture, SQL Server will create several tables, functions and jobs for your database. The changes made to your database should look like this.

SELECT name, is_tracked_by_cdc
FROM sys.tables
[Screenshot: is_tracked_by_cdc flag set for the Test table]
[Screenshot: cdc capture and cleanup jobs created in SQL Server Agent]
[Screenshot: tables and functions created in the cdc schema]

Highlighted in red is the table that will actually capture your data; the other tables in this picture are supporting tables for change data capture and are created only once per database, when you enable your first table for change data capture. Check the documentation for more detail about the tables, stored procedures and functions.
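If you prefer a query to screenshots, the ordinary catalog views will list everything that was created in the cdc schema:

-- List the objects created in the cdc schema
SELECT name, type_desc
FROM sys.objects
WHERE schema_id = SCHEMA_ID('cdc')
ORDER BY type_desc, name;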

Change data capture table

The data stored in data capture tables consists of some change data capture columns and your table columns.

[Screenshot: columns of the empty change table cdc.dbo_Test_CT]

The columns displayed here are from the change data capture table for our dbo.Test table. We have the columns __$start_lsn and __$end_lsn, which mark the first and last log sequence numbers under which the changes were recorded in the transaction log; then there is __$seqval, which is the order of the row change within a transaction; __$operation signals the type of operation that was executed; and last is __$update_mask, a bit mask showing which columns have been changed in the row change. These columns appear in any change data capture table, and after them come your table columns with their respective data types. So let’s make a change to our table and see what happens.

Recording change data capture

INSERT INTO dbo.Test
(Col1, Col2)
VALUES(500, 600) 

INSERT INTO dbo.Test
(Col1, Col2)
VALUES(501, 601)

So let’s see what has been recorded in our change data capture table, cdc.dbo_Test_CT.

SELECT *
FROM cdc.dbo_Test_CT

[Screenshot: rows captured in cdc.dbo_Test_CT after the two inserts]

Now let’s perform a few more operations and see what we get.

UPDATE dbo.Test
SET Col1 = 503
WHERE ID = 1 

UPDATE dbo.Test
SET Col1 = 504
WHERE ID = 2 

DELETE dbo.Test
WHERE ID = 1

[Screenshot: rows captured in cdc.dbo_Test_CT after the updates and the delete]

Not what you expected? We have executed 5 DML statements in total, but the change data capture table contains 7 operations. How come? Let’s make another query which will make things clearer.

Reading change capture data

SELECT
    CT.__$start_lsn,
    CT.__$operation,
    CASE CT.__$operation
        WHEN 1 THEN 'Delete'
        WHEN 2 THEN 'Insert'
        WHEN 3 THEN 'Update - Pre'
        WHEN 4 THEN 'Update - Post'
    END AS Operation,
    CT.ID,
    CT.Col1,
    CT.Col2,
    LSN.tran_begin_time,
    LSN.tran_end_time,
    LSN.tran_id
FROM
    cdc.dbo_Test_CT AS CT INNER JOIN
    cdc.lsn_time_mapping AS LSN ON CT.__$start_lsn = LSN.start_lsn

[Screenshot: change rows joined with cdc.lsn_time_mapping]

What we have done here is join the change data capture table for our Test table with the cdc.lsn_time_mapping table, which stores information about log sequence numbers and their time values, allowing us to tie times to specific log sequence numbers. One more important thing we have here is the tran_id field, which we will use in the next article. Now we can see why we have 7 records instead of the 5 we expected. When capturing data from an UPDATE statement, change data capture creates two records: one with an __$operation value of 3 and one with a value of 4. The record with value 3 represents the data that was in the row before the update was committed, and the record with value 4 represents the data after the commit.

One more thing you can see here is the times when the transactions were started and completed. This is very useful when trying to get changes in a specific time frame, which can be done by using the change data capture functions provided by SQL Server.

Reading interval change capture data

The first thing we want to do is find out which log sequence numbers relate to the time frame we want to look at.

SELECT
    sys.fn_cdc_map_time_to_lsn ('smallest greater than or equal', '2009-06-26 15:37:46.577')
    ,sys.fn_cdc_map_time_to_lsn ('largest less than or equal', GETDATE())

Here we use the sys.fn_cdc_map_time_to_lsn function to retrieve the log sequence number for the start of our interval (smallest where time is greater than or equal to ‘2009-06-26 15:37:46.577’) and the log sequence number for the end of our interval (largest where time is less than or equal to GETDATE()). Next up we have to use those log sequence numbers to get something useful from the table, which we can do by using another pair of functions, cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes; the difference between the two will be apparent immediately.

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Test (
    0x00000066000000A00004
    , 0x0000006A000001F70001
    , 'all') 

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_Test (
    0x00000066000000A00004
    , 0x0000006A000001F70001
    , 'all')

This is the result:

[Screenshot: result of cdc.fn_cdc_get_all_changes_dbo_Test]

[Screenshot: result of cdc.fn_cdc_get_net_changes_dbo_Test]

The cdc.fn_cdc_get_all_changes function returned all the changes that happened in our chosen interval, giving us 4 recorded changes in total. The second query, using cdc.fn_cdc_get_net_changes, returned only 2 records for the same interval. Why so? Simple: the all_changes function returns, as its name says, all changes that fall between our LSN boundaries, while the net_changes function returns only the net changes that happened in the interval. What does that mean? As you can see, we have only one record per row, and that record is the last operation executed on that row in the given interval.

One more thing to notice is that when executing these functions, records with __$operation = 3 do not appear in the result set, although they do fall into our selected interval. Since these functions return only the resulting changes, the __$operation = 3 records (the before-update image of the changed row, shown as ‘Update - Pre’ in the earlier query) are not returned with the 'all' option.

Reading changed columns

A useful thing to know is which column has changed in an operation. The thing to use here is another scalar-valued function provided by change data capture, sys.fn_cdc_has_column_changed; this function returns 1 or 0 depending on whether the column has been changed.

SELECT
    *
    , sys.fn_cdc_has_column_changed ('dbo_Test', 'ID', __$update_mask) AS is_id_changed
    , sys.fn_cdc_has_column_changed ('dbo_Test', 'Col1', __$update_mask) AS is_col1_changed
    , sys.fn_cdc_has_column_changed ('dbo_Test', 'Col2', __$update_mask) AS is_col2_changed
FROM
    cdc.fn_cdc_get_all_changes_dbo_Test (
        0x00000066000000A00004
        , 0x0000006A000001F70001
        , 'all')

[Screenshot: sys.fn_cdc_has_column_changed results for each captured operation]

This way we can see which columns have been changed in each operation and perform the actions required by our business rules. A thing to note here is that for insert and delete operations (__$operation = 2 or 1) all columns are signaled as changed, because you have either created values for all columns or deleted values for all columns. When looking at the changes for an update (__$operation = 4), you see only the actually updated columns signaled.

This concludes this session, till next time….

Written by Luka Ferlež

June 24, 2009 at 16:00

Posted in SQL


SQL 2008 Change Data Capture – The beginning


Data auditing is a requirement of most enterprise systems and was a pain to implement with SQL Server up to now. When data audit was required in pre-SQL Server 2008 databases, you had to rely on solutions like triggers or some custom change tracking in the application layer. With the dawn of SQL Server 2008, Microsoft has provided a change data capture mechanism which is an integral part of the SQL Server 2008 data engine. Change data capture is a powerful mechanism for gathering changes made to data on the SQL Server platform, and it provides its functionality with little overhead.

How this works

Well, in short: simply. The implementation of change data capture is based on an already existing element of the data engine, the transaction log, and does not require you to alter your database, schema or tables. When you enable change data capture on a database, SQL Server will create a new schema called “cdc” with several tables, stored procedures and functions. Along with the changes made to your database, the enabling process also creates two jobs in your SQL Server Agent for that database: one for capture and one for cleanup. These two jobs are the basis of the asynchronous behaviour of change data capture. The capture job runs by default every 5 seconds; it reads the transaction log for changes since the last job execution and stores them in the change data capture tables. The cleanup job runs by default once a day and cleans out the stored data in the data capture tables, meaning that you have to gather that data and transfer it somewhere for permanent storage before the cleanup job runs.

Performance

Microsoft published on MSDN a SQL Server Best Practices article on “Tuning the Performance of Change Data Capture in SQL Server 2008”, going at length into why, how and where to tune the performance of change data capture. The general conclusion of that article, and of many articles on the web, is that change data capture has very little overhead and can be used comfortably on an OLTP database with no fear of grinding your database server to a standstill. You will see a slight decrease in performance through the increased disk writes from the change data capture job, but the decrease, if the capture job is properly configured, should not be more than 5% by my tests. You will see an increase in the disk queue length for the log file, but only a slight one for the data disk.
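That configuration lives on the capture job itself and can be adjusted with sys.sp_cdc_change_job; the values below are only illustrative (they happen to be the defaults), so treat this as a sketch rather than a recommendation:

-- Adjust the capture job, then restart it so the new settings take effect
EXECUTE sys.sp_cdc_change_job
    @job_type = N'capture'
    , @maxtrans = 500          -- transactions processed per scan cycle
    , @maxscans = 10           -- scan cycles per capture run
    , @pollinginterval = 5;    -- seconds between log scans
GO

EXECUTE sys.sp_cdc_stop_job @job_type = N'capture';
EXECUTE sys.sp_cdc_start_job @job_type = N'capture';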

Problems

There are of course problems with SQL change data capture. The three main problems that I have faced in working with it are storing contextual information, gathering change data in one place, and schema altering.

Contextual information

Storing contextual information about a change is a pain with change data capture. In most cases when performing an audit you want to store some contextual data about the operation that is changing the data, like the user id, the date, the reason for the change (form, command, etc.). Since change data capture relies on an asynchronous read of the transaction log, at the time of the read SQL Server does not have any information about the context under which the change occurred, and therefore that data cannot be gathered.

Gathering change data in one place

Change data capture, as it is designed and implemented (as far as I can figure it), is intended to capture data for temporary storage before you transfer it to a data warehouse for permanent storage. The data warehouse is intended to have the same schema (minus primary and foreign keys) as the OLTP database, and you simply copy the rows into the data warehouse. There is no built-in way, as it was not intended for such use, to gather all the data in one place for review.

Schema altering

This brings us to the last problem, altering the schema. If you alter the schema of your OLTP database, you have to change the schema of your data warehouse database. That is no problem when adding columns, but what about when you delete columns, and how do you then display the data to the user? There is one more thing I think is just pure laziness on Microsoft’s part. The problem with schema changes is that when you change the schema, change data capture continues to track the same columns (and their data types) as before, with no indication that you should do something to start capturing the new columns. It would suffice that when changing a table under change data capture you got an error saying you cannot do that until you drop change data capture for that table, but it seems that was too much for Microsoft.

There are workarounds for all of these problems, and I will show them in the next article on change data capture.

Conclusion

SQL Server 2008 change data capture is a welcome addition to the SQL Server feature list which helps us create audits for our databases easily and with little overhead. With some missing features, which have simple workarounds, this feature can be used very effectively for auditing your database.

Check back for more articles on change data capture with the details of the implementation and the workarounds needed to make the whole thing functional.

Written by Luka Ferlež

June 20, 2009 at 18:34

Two faces of cloud computing


There is a big hype about cloud computing these days; all major web companies are investing heaps of cash into cloud computing. The big 4 (Microsoft, Google, Yahoo, Amazon) and a whole bunch of fresh startups are playing the cloud computing card as the next big thing in the world of IT. They are all expecting that in the next year more and more web applications and enterprises will migrate to cloud computing platforms. There are a lot of predictions for cloud computing in 2009, like this one from marketwire.com:

A rise in serverless companies with 1000+ employees. In 2009, the market will start to hear about more and more companies going completely server-less. While this is already happening at smaller companies, larger and larger companies will optimize their business processes and cut IT expenses by outsourcing to cloud providers.

In the upcoming year more and more companies will focus on reducing costs because of the economic breakdown of 2008. The first enterprises that will turn to cloud computing platforms as their primary provider of IT infrastructure will be the IT companies. Why wouldn’t they go for cloud computing when it offers savings on the costs of IT infrastructure and of the personnel required to maintain that infrastructure? If you look at the prices the big 4 in cloud computing are setting for the use of their infrastructure, you will see massive savings. Suffice it to say that Flickr hosts all of its photos on the Amazon S3 storage infrastructure, which shows the capabilities of cloud computing platforms. All that for just a fraction of the cost of owning, running and maintaining company-owned servers or datacenters.

Is it all as great as it is presented? Well, not so much when you dig a little bit under the surface. What is the problem then? The problem is with the data and applications that you put in the cloud: there are issues with security, ownership and access to the data. By security I don’t mean whether your data is secure from the outside, but rather from the inside, from the company which is providing the infrastructure. You must be aware that personnel from the cloud provider have access to your data stored in the cloud. Therefore you should make sure that data is encrypted when stored in the cloud.

The problem with data in the cloud can be summed up in this excerpt from the Amazon Web Services Customer Agreement:

3.7.3. In the Event of Other Suspension or Termination. Except as provided in Sections 3.7.1 and 3.7.2 above, we shall have no obligation to continue to store your data during any period of suspension or termination or to permit you to retrieve the same.

Now what this means is that in the case of contract suspension or termination, Amazon has no obligation to keep your data or to provide you access to it, which basically translates to you depending on the good will of the people working at Amazon to give you access to your own data. And what happens if your cloud storage provider goes out of business, what happens to your data then?

In a recent article Richard Stallman, the GNU founder, expressed similar concerns, saying that cloud computing is:

It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign,

Written by Luka Ferlež

January 1, 2009 at 15:32

Software engineering is bad engineering


This is a common phrase used in reference to software in general. I have heard it from my friends, customers and even co-workers, all of whom are often ignorant of the complexity and the number of different factors involved in the execution of a software application. I’m especially amazed when my programming colleagues criticize some software with no objective reason, fully knowing that their own software is equally flawed. The very same question was raised on my new favorite programming Q&A site, stackoverflow.

Bugs in software are something we have to accept will happen as long as there is software. Now, most people who are not familiar with programming always ask why software can’t be bug free. That’s when I ask them if they have experienced any of the following:

  • Oil leak
  • Engine trouble
  • Leaky faucet
  • Wobbly chair
  • Squeaky doors
  • VCR eating the tape

Of course the answer is yes; we have all experienced some of these, if not all, and they are the equivalent of software bugs. Oil is not meant to leak from your car – bug; chairs are not meant to be wobbly – bug. In short, my answer is no, we can’t have bug-free software, because nothing around us is bug free, not even ourselves; we can only try to keep the bugs down to a minimum.

You are not meant to be sick, but you still get sick. Is that a bug in your immune system, or was it caused by something outside of your body’s control? A strange question, you might think, but what it comes down to is that our body starts to malfunction because something out of its control has influenced it and disrupted its normal way of functioning. The same applies to software.

Software developers constantly combat the external influences on their applications, be it human input, the operating environment, incompatible software or whatever other external source of influence. This all invites so many variables and possibilities into an already highly complex system that is being developed by humans and is thus by default faulty. When developing software you have to take into account the different hardware and software combinations on which your software is supposed to run and the users that are supposed to use it, and when you factor all of that in you get a massive number of possibilities. Now, as I see it, most applications today are quite stable for what and where they are meant to run.

Software engineers are trying to combat all of those possibilities and the bugs that arise from such a number of different scenarios, and so far we have seen with each new release of software that they are managing to decrease the number of bugs rapidly and are producing genuinely good-quality code.

Anybody remember Windows 95? 🙂

Written by Luka Ferlež

December 14, 2008 at 1:03

Posted in Programing
