Why this comparison?
Recently while working on a (relatively) huge cloud migration project, I had the opportunity to brush up on my SSIS knowledge that was getting ancient. I had to work on several SSIS projects and pave the way for my fellow developers so we could all move faster.
When working on SSIS projects, often times there is need to write logic that cannot be achieved by using out-of-the-box components from the toolbox. In those cases the first option that automatically comes to mind is Script Component. This component lets you write code in your language of choice to act as a source, destination or transformation. You always also have the option of making your own shiny custom component. It is generally believed that making a custom component is usually more complicated and it takes more time so most of the times developers use Script Component throughout the project and it leads to duplicating code that is not maintainable and every change is hard to track to a point that in larger projects no one is really sure how each part works anymore 🙂
What’s wrong with Script Component
Script Component looks pretty good in theory. It gives you the possibility to embed code in a data flow. It can act as a source, destination or a transformation. The problem is in the way the code is embedded in the package. If you open the XML content of a package that has a Script Component is it, you will notice that it is actually a full Visual Studio project with a random generated namespace and some generated code to make it easy to interact with fields through typed properties. When you double click on a Script Component, SSDT decodes that projects and (presumably) writes all the files of that project in a temporary folder and opens it in a new instance of Visual Studio. When you close that project after applying your changes, it has do do the reverse. If you put a break-point somewhere in that project, then again it has to do all that again. So, to sum it up:
- Script Components take large space in SSIS packages and if you create several Script Components in a package, it can dramatically increase the size of the package hence slowing down development and stability of Visual Studio and hinder productivity
- Each Script Component is an independent project that makes debugging them harder specially that they are encoded and embedded within the package’s XML.
- The fact that all Script Components are encoded and embedded within the XML file will make it really hard to do SSIS development in a team and it is very difficult to trace a change no matter which source control software you use. Imagine comparing thousands of lines of encoded XML to see where some logic might have been changed in some lines of scripts.
- The debugger in Visual Studio might not work properly (even crash) when you want to debug multiple Script Components. It is also not possible to have breakpoints in more than one Script Component
- Readability is another big issue. You will not be able to easily skim through your Script Components to understand the logic. Imagine you will need to open each one by one slowly and if you miss something, you will need to back again to a previous one. It is not like a typical Visual Studio project where you can easily navigate through code with several out-of-the-box features.
- The name of the columns you get in the Script Component might not be exactly the same as original columns, if you decide to use generated typed properties.
- It is not possible to reuse the logic that is inside a script component, unless if you externalize it in another assembly and in that case managing versions of that external assembly can become tedious, because you might need to open every single Script Component that references that assembly should the version change and if you by mistake do a breaking change you will not find out during compile time. You will need to test every single script component using that assembly and there is no easy way to find out which Script is referencing that. so, in short you will need a full end to end test to find an issue that you could be found in a compile in other types of projects.
- It is not easy to duplicate a Script Component. I know that it is against DRY rule to duplicate code, but lets say due to above limitation in special scenario you decide to do so, in that case because the assembly name in those Script Components will be the same, during compile time one of will replace the other without SSDT giving you an error or even a warning.
- This one is actually a bug. If you use more recent versions of C# language (I don’t know for VB, but probably the same) none of your break point will ever be hit. So, it is better to forget about those shiny features that came with C# 6.0+ like simplified dictionary initializer (i.e.
{ ["key"] = value }
).
What can justify using a Script Component?
In my opinion with all the above problems around Script Components, the only reason you should use it is when you need to create a one off component as part of your data flow or when your are going to have just a few minimal Script Components in your package. In all other cases that I have experienced it is better to create a custom component. You might be thinking that building a custom component can be difficult or you might have already tried to find your way around MSDN to figure out how to do it, but got confused with the minimal explanation and over simplified examples that exists. But, trust me on this, it is not that difficult if you do it right. In my next post, I’m going to show you just that, but with a clear step by step guide and a real world example.
Leave a Reply