Tuesday, October 30, 2007

How big should a function be? Uncle Bob explains.

As I said recently, one of my favorite programming books is Agile Software Development by Robert C. Martin AKA Uncle Bob. The Object Mentor blog that he writes along with others is essential reading. Today he posted on a subject very dear to my heart: How big should a function be? Now this topic has been covered many times, not least by Steve McConnell in Code Complete, but it's amazing that wherever I work the one constant seems to be huge classes, huge functions and lots and lots of duplicated code. What's with it, guys? What's stopping you from creating that new class or breaking that huge function into smaller ones? If you're not sure how to go about splitting large classes and functions into smaller ones, you really should read Martin Fowler's Refactoring.

There you go, I've managed to cram references to all three of my favorite programming books into one rant.

OK, mea maxima culpa, I know you don't have to browse far through this blog to see some quite large functions, but I'd like to defend myself by saying that's not how I tend to write production code, and when one is explaining a technique or a simplest-thing-that-could-possibly-work it makes more sense to present it as a single block of code. Dig dig, is my hole big enough now?

using Rhino.Mocks

As you might have realized from reading this blog, I'm a big fan of Test Driven Development (TDD). A core part of writing good tests is the use of mock objects. Usually these are instances of objects that implement the interfaces that the class under test depends on. When I started doing TDD I used to write my own mock objects, and it took me a while to realize the power of a good mock object framework. When I first discovered NMock it was a major revelation and took the ease and power of my tests to a new level. NMock has a real limitation though: its use of string literals to describe methods when you set your expectations. Any framework that uses string literals to reference types or members loses most of the advantages of working with a statically typed language.

Recently there's been quite a buzz about a new mocking framework: Rhino Mocks. Its main advantage is that it uses real invocations of the mock object's methods to set up expectations for your tests. Here's a little example:

 

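(The original code appeared as a screenshot, which hasn't survived; this reconstruction uses Rhino Mocks' record/replay style of the day, with a made-up IEmailSender interface and OrderProcessor class, and the method names written from memory.)

using NUnit.Framework;
using Rhino.Mocks;

[TestFixture]
public class OrderProcessorTests
{
    // A made-up interface and class under test, just to show the style.
    public interface IEmailSender
    {
        void Send(string to, string message);
    }

    public class OrderProcessor
    {
        private readonly IEmailSender emailSender;

        public OrderProcessor(IEmailSender emailSender)
        {
            this.emailSender = emailSender;
        }

        public void Confirm(string customerEmail)
        {
            emailSender.Send(customerEmail, "Your order has been received");
        }
    }

    [Test]
    public void Confirm_SendsAnEmail()
    {
        MockRepository mocks = new MockRepository();
        IEmailSender emailSender = mocks.CreateMock<IEmailSender>();

        // Record: the expectation is set by really invoking the mock's method,
        // so a rename refactoring updates the test along with the code.
        emailSender.Send("mike@example.com", "Your order has been received");

        mocks.ReplayAll();

        // Exercise the class under test.
        OrderProcessor processor = new OrderProcessor(emailSender);
        processor.Confirm("mike@example.com");

        // Verify that every recorded expectation was met.
        mocks.VerifyAll();
    }
}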

It's a very natural style of writing expectations because they mirror what the class under test should be doing. You can also refactor to your heart's content safe in the knowledge that your tests will pick up any symbol name changes.

The guy who wrote Rhino Mocks, Oren Eini AKA Ayende Rahien, is a bit of a phenomenon. Subscribing to the feed of his blog is like drinking from the proverbial fire-hose; eight posts last Saturday alone. He's also an active contributor to the Castle Project, which I'm very interested in at the moment. Respect!

Monday, October 29, 2007

What is Inversion of Control

Inversion of Control is one of the core principles of object oriented software design. I first came across it about five years ago from reading Robert Martin's wonderful book Agile Software Development. Uncle Bob calls it 'The Dependency Inversion Principle', but these days 'Inversion of Control' seems to be the preferred term; it's what Martin Fowler calls it in this article.

I like to think of Inversion of Control as the ghostly opposite to subroutines, but instead of making it possible to break out sub pieces of code from a larger piece (the literal meaning of 'sub-routine'), it enables you to break out the surrounding code; the framework that's left when all the little sub-routines are removed. Being able to break out that remaining skeleton and reuse it is a very powerful tool.

However, while subroutines are built into pretty much every modern programming language, IoC is something you have to roll yourself using OO building blocks. In this post I'll show you how.

Even the most junior programmers are aware of the idea of splitting their code into separate functions or subroutines. Using functions gives us our most important tool to avoid repeating ourselves, because repeating code is very very bad.

Take this code as an example. In true pedagogical fashion it is overly simplified, but imagine that where I've simply put Console.WriteLine(something), there's actually the real code to do that thing. Anyway, you get the idea.

[Test]
public void InlineReporterTest()
{
  // create some reports
  List<Report> reports = new List<Report>();
  for (int i = 0; i < 3; i++)
  {
      reports.Add(new Report(string.Format("Report {0}", i)));
  }

  // send reports by Email
  foreach (Report report in reports)
  {
      // pretend to send an email here
      Console.WriteLine("Sending by email: {0}", report.Title);

      // pretend to log here
      Console.WriteLine("[Log Message] Sent Report: {0}", report.Title);
  }

  // send reports by SMS
  foreach (Report report in reports)
  {
      // pretend to send an SMS message here
      Console.WriteLine("Sending by SMS: {0}", report.Title);

      // pretend to log here
      Console.WriteLine("[Log Message] Sent Report: {0}", report.Title);
  }
}

You can see that there's plenty of repeated code here. The two foreach loops are practically identical. Any sane developer would factor out the common code into a subroutine. Here's a possibility:

[Test]
public void ProceduralReporterTest()
{
  // create some reports
  List<Report> reports = BuildReports();

  // send reports by Email
  SendReports(reports, ReportSendType.Email);

  // send reports by SMS
  SendReports(reports, ReportSendType.Sms);
}

private static List<Report> BuildReports()
{
  List<Report> reports = new List<Report>();
  for (int i = 0; i < 3; i++)
  {
      reports.Add(new Report(string.Format("Report {0}", i)));
  }
  return reports;
}

private static void SendReports(List<Report> reports, ReportSendType reportSendType)
{
  foreach (Report report in reports)
  {
      switch (reportSendType)
      {
          case ReportSendType.Sms:
              // pretend to send an SMS message here
              Console.WriteLine("Sending by SMS: {0}", report.Title);
              break;
          case ReportSendType.Email:
              // pretend to send an email here
              Console.WriteLine("Sending by email: {0}", report.Title);
              break;
      }
      // pretend to log here
      Console.WriteLine("[Log Message] Sent Report: {0}", report.Title);
  }
}

Now we only have one copy of the foreach loop and one copy of the logging code. We've made the foreach loop a little more complex by inserting a switch statement, but it's probably worth it to remove the significant amount of duplication we had before. Also, if there are other places in our program that need to send reports they can use that same SendReports subroutine. I've also factored out the creation of the reports list into a subroutine called BuildReports. What we are left with is a skeleton: some coordinating code that calls BuildReports and then passes the reports to SendReports twice, once to send them as emails and once to send them as SMS messages.
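
(By the way, the Report class and the ReportSendType enumeration that these examples use aren't shown in the post; minimal versions that the code assumes would look like this:)

// Minimal versions of the types the examples assume.
public class Report
{
    private readonly string title;

    public Report(string title)
    {
        this.title = title;
    }

    public string Title
    {
        get { return title; }
    }
}

public enum ReportSendType
{
    Email,
    Sms
}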

Let's take a little interlude now and talk about object orientation. In simple terms... no, very simple terms, this is the idea that we can split out separate concerns into separate classes or components that know how to do one thing and one thing only. They carry their data around with them, encapsulated from interference by the outside world. We're now going to refactor our example again, this time using classes:

[Test]
public void OOReporterTest()
{
  ReportBuilder reportBuilder = new ReportBuilder();
  List<Report> reports = reportBuilder.GetReports();

  ReportSender reportSender = new ReportSender();

  // send by email
  reportSender.Send(reports, ReportSendType.Email);

  // send by SMS
  reportSender.Send(reports, ReportSendType.Sms);
}

public class ReportBuilder
{
  public List<Report> GetReports()
  {
      List<Report> reports = new List<Report>();
      for (int i = 0; i < 3; i++)
      {
          reports.Add(new Report(string.Format("Report {0}", i)));
      }
      return reports;
  }
}

public class ReportSender
{
  public void Send(List<Report> reports, ReportSendType reportSendType)
  {
      foreach (Report report in reports)
      {
          switch (reportSendType)
          {
              case ReportSendType.Sms:
                  // pretend to send an SMS message here
                  Console.WriteLine("Sending by SMS: {0}", report.Title);
                  break;
              case ReportSendType.Email:
                  // pretend to send an email here
                  Console.WriteLine("Sending by email: {0}", report.Title);
                  break;
          }
          // pretend to log here
          Console.WriteLine("[Log Message] Sent Report: {0}", report.Title);
      }
  }
}

Now we've got two separate classes, one that's responsible for building reports and one that's responsible for sending them. In this simple example these classes have no state, so there's no benefit from encapsulation, but they can participate in inheritance hierarchies, which would allow us to extend them without having to alter them: the famous open-closed principle.

But going back to the skeleton, the coordinating client code: it is hard-coded to take our ReportBuilder and call our ReportSender, once for emails and once for SMSs. Although we can reuse the ReportBuilder and the ReportSender, we can't reuse the coordinating code. Another problem worth noting is that the ReportSender has an intimate knowledge of sending emails and SMS messages; if we wanted to add a third type of sender, we would have to add an extra member to the ReportSendType enumeration and alter the switch statement inside the ReportSender. This is where IoC comes in. With it we can factor out the calling code and remove the tight coupling between the ReportSender and the different sending methods.

The basic technique of IoC is to factor out the public contracts of our classes into interfaces. The public contracts being the public properties and methods of our classes that the outside world interacts with. Once we've factored the public contracts into interfaces we can make our client code rely on those interfaces rather than concrete instances. We can then 'inject' the concrete instances in the constructor of our coordinating class; this is known as Dependency Injection. In the example below we've factored the coordinating code into the Reporter class and then injected a concrete IReportBuilder and IReportSender.

[Test]
public void IoCReporterTest()
{
  IReportBuilder reportBuilder = new ReportBuilder();

  // send by email
  IReportSender emailReportSender = new EmailReportSender();
  Reporter reporter = new Reporter(reportBuilder, emailReportSender);
  reporter.Send();

  // send by SMS
  IReportSender smsReportSender = new SmsReportSender();
  reporter = new Reporter(reportBuilder, smsReportSender);
  reporter.Send();
}

public interface IReportBuilder
{
  List<Report> GetReports();
}

public interface IReportSender
{
  void Send(Report report);
}

public class EmailReportSender : IReportSender
{
  public void Send(Report report)
  {
      Console.WriteLine("Sending by email: {0}", report.Title);
  }
}

public class SmsReportSender : IReportSender
{
  public void Send(Report report)
  {
      Console.WriteLine("Sending by SMS: {0}", report.Title);
  }
}

public class Reporter
{
  IReportBuilder reportBuilder;
  IReportSender messageSender;

  public Reporter(IReportBuilder reportBuilder, IReportSender messageSender)
  {
      this.reportBuilder = reportBuilder;
      this.messageSender = messageSender;
  }

  public void Send()
  {
      List<Report> reports = reportBuilder.GetReports();

      foreach (Report report in reports)
      {
          messageSender.Send(report);
      }
  }
}

Notice how we can reuse the Reporter to send first emails then SMSs without having to specify a ReportSendType. The Reporter code itself is much simpler because we don't need the switch statement any more. If we wanted to add a third sending method, we would simply implement IReportSender a third time and inject it into the Reporter's constructor.

But, wait a minute, I've forgotten about the logging somewhere along the line! Never fear! Without having to recode Reporter I can use the Decorator pattern to create a logger that implements IReportSender and gets injected with another IReportSender in its constructor:

class ReportSendLogger : IReportSender
{
  IReportSender reportSender;
  ILogger logger;

  public ReportSendLogger(IReportSender reportSender, ILogger logger)
  {
      this.reportSender = reportSender;
      this.logger = logger;
  }

  public void Send(Report report)
  {
      reportSender.Send(report);
      logger.Write(string.Format("Sent report: {0}", report.Title));
  }
}
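
(The ILogger interface and Logger class aren't shown in the post; a minimal version, matching the log format from the earlier examples, would be:)

using System;

public interface ILogger
{
    void Write(string message);
}

public class Logger : ILogger
{
    public void Write(string message)
    {
        // same format as the logging in the earlier examples
        Console.WriteLine("[Log Message] {0}", message);
    }
}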

Now I can simply string a logger and an IReportSender together and I have logging again without having to change a thing in Reporter.

[Test]
public void ReporterTestWithLogging()
{
  IReportBuilder reportBuilder = new ReportBuilder();
  ILogger logger = new Logger();

  // send by email
  IReportSender emailReportSender = new ReportSendLogger(new EmailReportSender(), logger);
  Reporter reporter = new Reporter(reportBuilder, emailReportSender);
  reporter.Send();

  // send by SMS
  IReportSender smsReportSender = new ReportSendLogger(new SmsReportSender(), logger);
  reporter = new Reporter(reportBuilder, smsReportSender);
  reporter.Send();
}

So there we have it, the power of Inversion of Control. In my next post I'll show how IoC makes real unit testing possible: Inversion of Control, Unit Tests and Mocks.

Friday, October 26, 2007

Who uses this stuff?

I was ranting about some of my favorite subjects to a colleague recently: Test Driven Development, Continuous Integration, Component Oriented Design, Transacted Source Control; and he asked, quite reasonably, if any of my clients had actually used any of these techniques. The sad answer is "very few", but it started me thinking. I've had 7 clients since I started doing .NET commercially in 2002, ranging from huge corporations to tiny four-man web development shops. I thought it would be illustrative to summarize their use of tools and practices. So here we go.

When I first started this blog, my second post was a list of development essentials. For each organization I've listed the (very approximate, OK it's me guessing most of the time) budget and team size for the project and whether they used any of the essentials I mentioned in that post: Dedicated Build Server, Automated Deployment, Source Control, Test Driven Development, Tool Enforced Coding Standards and Bug Tracking Databases. Yeah, it's not the best list and it's mostly focused around my developer-level outlook and things I've personally found useful. I'm also a bit of an agile fanboy and I could just as easily have made a list of clients that follow agile practices, but that would have been a very short list of none. There are also plenty of analysis and project management tools and techniques, but that's for another time.

A large cable company (£100,000,000 - 100+ people)

Dedicated Build Server: yes

Automated Deployment: yes

Source Control: yes (source safe)

Test Driven Development: no

Tool Enforced Coding Standards: no

Bug Tracking Database: yes

This was a huge project. At the time they said it was the largest .NET development in the UK. We had a team from Microsoft on board and a lot of the development was off-shored. When I joined this project they had no centralized build or deployment process and it was in deep deep pain. It took a change in management, and the lucky fact that they'd inadvertently employed a build expert or two, before a centralized build and deployment process was put in place. This allowed the project to hobble on for a while, but initial poor management and quality control meant that the code base was a monster, and even surrounding it with the best processes and the cleverest people couldn't make it agile enough to be anything but a hindrance for the business. It's now being replaced with a third party product.

A tiny web development shop (£10,000 - 2 people)

Dedicated Build Server: yes

Automated Deployment: no

Source Control: yes (subversion)

Test Driven Development: yes

Tool Enforced Coding Standards: yes

Bug Tracking Database: no

These guys were on the ball. They were doing a lot of things right: centralized builds, decent source control (subversion) and actually doing TDD for once. The simple reason was that the lead technical guy was very good and really cared about following good practice.

One of Europe's largest software service companies (£1,000,000 - 10 people)

Dedicated Build Server: yes

Automated Deployment: yes

Source Control: yes (source safe)

Test Driven Development: yes

Tool Enforced Coding Standards: yes

Bug Tracking Database: yes

This was a large government contract. When I joined the project they were in a whole world of pain: six months had passed with almost nothing of value being delivered. Maybe because of this they actually listened to my incoherent rantings and allowed me to re-architect the entire application as well as introduce TDD and have a big input on the build system.

One of the world's largest insurers (£100,000 - 4 people)

Dedicated Build Server: yes

Automated Deployment: yes

Source Control: yes (source safe)

Test Driven Development: yes

Tool Enforced Coding Standards: yes

Bug Tracking Database: yes

I was the sole developer on this integration piece, so I used all my favorite tools and techniques. I was aided by having one of the best architects I've ever worked with on the team. Another very strong developer put together the build and deployment process. A very pleasant experience.

One of the UK's biggest banks (£1,000,000 - 10 people)

Dedicated Build Server: no

Automated Deployment: no

Source Control: yes (source safe)

Test Driven Development: no

Tool Enforced Coding Standards: no

Bug Tracking Database: yes

This was similar to the previous project, another integration piece, but almost at the opposite extreme in terms of their understanding of modern development practices. I left.

One of the UK's busiest web sites (?? - 1 person)

Dedicated Build Server: no

Automated Deployment: no

Source Control: yes (source safe)

Test Driven Development: no

Tool Enforced Coding Standards: no

Bug Tracking Database: no

These guys were pretty good in terms of software architecture, but had almost no process. It showed how far you can get with some clever people even if you're just hacking, but they were starting to feel some scale pain. They were working on implementing centralized build and deployment but hadn't considered TDD.

A small fast growing software company (£20,000 - 3 people)

Dedicated Build Server: no

Automated Deployment: no

Source Control: yes (source safe)

Test Driven Development: no

Tool Enforced Coding Standards: no

Bug Tracking Database: yes

These guys are very similar to the previous client. I have to watch what I say because I'm still working with them, but I think they'd agree that they're starting to feel a little scale pain and starting to talk about putting more process around what they do. On the positive side there are plenty of people there interested in writing good software and it will be interesting to see what practices they adopt as the company grows. I've ranted about TDD at one of our development meetings and got a mostly positive response.

 

OK, so what have I learnt from the last five years? (I try not to think about my former life as a VB Mort :) Well, the same pattern seems to repeat itself. A successful small product, initially often written by one person, will grow and attract more and more resources. As the project grows more people are hired, and then it's a race between how quickly the team realizes they have to adopt larger scale techniques and how quickly the software decays. A lot depends on the quality of the initial product, how well the lead developer passes on his design vision and how consistent the code stays. But even with well architected code and good developers, once the team gets past a certain size, things like centralized build and deployment become essential if there is to be any confidence that the software can be integrated and tested. Without continuous refactoring code decays, and you can't refactor with confidence if you don't have unit tests. I think that TDD is the most important technique to emerge in enterprise software development, certainly in the ten years I've been working in the field. It provides a step change in any team's ability to build great software. It's still mostly unknown in the Microsoft development world, but I think it makes such a great improvement that it will inevitably see progressively wider adoption.

Of course the quality of the people on the team is the most important factor, but then good developers adopt good techniques so they tend to go together. I think it's important that everyone agrees about what needs to be done. It's always a terrible mistake to try and impose techniques from above. To a certain extent you can enforce centralized build and deployment, but insisting on TDD without proper buy-in is almost always counter-productive. Trying to enforce design practices is a waste of time; nobody can do good software design except good developers.

Of course you can make successful software without worrying about architecture, clean design or testing of any kind, so long as it has one essential property: it's small. By small I mean maybe something written by one person over a month or two. But as soon as the software grows over a certain scale, if you don't start to listen to the lessons of people who have built large scale software and suffered for it, you are going to feel pain.

Tuesday, October 23, 2007

Developer Developer Developer Day. I've been selected!

Wow, I got an email this morning from Zi Makki telling me that my talk on 'Why do I need an Inversion of Control Container?' has been selected for the 6th Developer Developer Developer Day. Thanks to everyone who voted for me! I now have only a couple of weeks to get the first draft of my talk together so that I can submit the slides by the 9th November.

Monday, October 22, 2007

The Hollywood Principle

As you might know if you read this blog, I've been getting very excited by Inversion of Control containers recently. One of the things I keep coming across is 'The Hollywood Principle':

"don't call us, we'll call you."

Basically the idea is that a class says what it does by implementing an interface and says what it needs by requesting interfaces; a framework then decides when to create it and which concrete instances to give it.
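
To make that concrete, here's a rough sketch using the reporting interfaces from my Inversion of Control post above and Castle Windsor. The registration calls are written from memory, so treat the exact API as approximate:

using Castle.Windsor;

public class HollywoodExample
{
    public static void Run()
    {
        // The Reporter says what it does (it sends reports) and what it needs
        // (an IReportBuilder and an IReportSender) and then waits to be called.
        IWindsorContainer container = new WindsorContainer();

        // Tell the framework which concrete classes satisfy which interfaces
        // (Windsor's registration API as I remember it from this era).
        container.AddComponent("reportBuilder", typeof(IReportBuilder), typeof(ReportBuilder));
        container.AddComponent("reportSender", typeof(IReportSender), typeof(EmailReportSender));
        container.AddComponent("reporter", typeof(Reporter));

        // "We'll call you": the container decides when to create the Reporter
        // and which concrete instances to give it.
        Reporter reporter = (Reporter)container.Resolve("reporter");
        reporter.Send();
    }
}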

Friday, October 19, 2007

Breaking a list into columns using custom iterators

I've been working on a web application that has lots of lists of items in columns. The existing code was pretty ugly; it repeated the same pattern over and over again, iterating through the list of whatever items while keeping a count that triggered the move over to the next column. Lots of error prone state variables and conditionals. In obedience to the programming god DRY, I decided to factor out the column building. It turned out to be a great application for the custom iterators that came with .NET 2.0.

First of all, here's the test that shows how it's going to work. Yes, yes, I know this isn't a proper unit test, there are no assertions and it simply outputs the results to the console, but it serves its purpose here.

using System;
using System.Collections.Generic;
using NUnit.Framework;

namespace Mike.ColumnDemo
{
    [TestFixture]
    public class ColumnTests
    {
        [Test]
        public void BreakIntoColumns()
        {
            // create a list of strings
            List<string> lines = new List<string>();
            for (int i = 0; i < 20; i++)
            {
                lines.Add(string.Format("item {0}", i));
            }

            foreach (Column<string> column in ColumnBuilder.Get(3).ColumnsFrom(lines))
            {
                Console.WriteLine("\nColumn\n");
                foreach (string line in column)
                {
                    Console.WriteLine(line);
                }
            }
        }
    }
}

Isn't that much nicer? Do you like the line segment 'ColumnBuilder.Get(3).ColumnsFrom(lines)'? I've become a big fan of DSL-ish APIs, they're much more readable, even if you do need to do slightly more work up front to make them happen.

Here's the ColumnBuilder class. The nested class ColumnCounter does most of the work of calculating the column boundaries. It yields a Column object for each column (of course). By the way, I love the Math class; it's well worth digging into even if you don't do a lot of maths in your applications.

using System;
using System.Collections.Generic;

namespace Mike.ColumnDemo
{
    public static class ColumnBuilder
    {
        // return a ColumnBuilder from Get() so that we can write this:
        // foreach(Column<Person> column in ColumnBuilder.Get(4).ColumnsFrom(People)) { ... }
        internal static ColumnCounter Get(int numberOfColumns)
        {
            return new ColumnCounter(numberOfColumns);
        }

        public class ColumnCounter
        {
            int numberOfColumns;

            public ColumnCounter(int numberOfColumns)
            {
                this.numberOfColumns = numberOfColumns;
            }

            // Break the items into the given number of columns
            internal IEnumerable<Column<T>> ColumnsFrom<T>(IList<T> items)
            {
                int itemsPerColumn = (int)Math.Ceiling((decimal)items.Count / (decimal)this.numberOfColumns);
                for (int i = 0; i < this.numberOfColumns; i++)
                {
                    yield return new Column<T>(items, i * itemsPerColumn, ((i + 1) * itemsPerColumn) - 1);
                }
            }
        }
    }
}

Finally, here's the column class. It simply iterates through the list of items and only yields the ones in the start-end range.

using System;
using System.Collections.Generic;

namespace Mike.ColumnDemo
{
    // represents a single column
    public class Column<T> : IEnumerable<T>
    {
        IEnumerable<T> items;
        int start;
        int end;

        public Column(IEnumerable<T> items, int start, int end)
        {
            this.items = items;
            this.start = start;
            this.end = end;
        }

        public IEnumerator<T> GetEnumerator()
        {
            int index = 0;
            foreach (T item in items)
            {
                if (index >= start && index <= end)
                {
                    yield return item;
                }
                index++;
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return (System.Collections.IEnumerator)GetEnumerator();
        }
    }
}

Thursday, October 18, 2007

F# to join C# and VB.NET as a core .NET language

Functional programming is very cool and I've been following the development of F#, a functional language from Microsoft Research in Cambridge, for the last year or so. A while back I had a go at writing Life in F# and really enjoyed it. I'm now slowly working my way through The Little Schemer, which is blowing my mind with recursion. Building up basic mathematical building blocks from Add1 and Subtract1 using recursion is great brain food. OK, so it's Scheme and not F#, but the functional principles still apply. But I digress. Apparently, yesterday, Soma announced that F# will be included as an official Microsoft language with full tool support from Visual Studio. It's already pretty mature, definitely good enough for some serious playing, so download it and get currying!

Tuesday, October 16, 2007

Installing Vista

I finally got around to installing the freebee Vista Ultimate I got at Mix last night. I started to sweat a bit when the installation got to the 'expanding files' bit and stuck at 27% for about 15 minutes. After googling, it looked like a few people's Vista installations had failed at the same point and I was just about to roll everything back when my wife distracted me. Five minutes later I went back to the installation and it had started moving again. I wonder if the people who were posting about the installation freezing had waited long enough? My Dell Dimension 8400 only gets a score of 2.4, mainly because of the low end video card, so I don't get all the whizzy Aero graphics; maybe it's time I invested in some new hardware :)

Monday, October 15, 2007

A custom SSIS PipelineComponent

I've recently been doing a spot of data migration. It's not my favorite task, mostly because it involves the painful manual work of mapping one set of tables and columns to another. Drudge work of the worst kind. I've been using SQL Server Integration Services (SSIS), which is the replacement for Data Transformation Services (DTS). Data migration is a job I've mostly managed to avoid in recent years, so I never really had a reason to play with DTS and I can't comment on how much, if at all, SSIS improves on it. SSIS basically gives you a design surface onto which you can drag various data sources and destinations, connecting them up via a whole stack of tools such as 'Derived Column', 'Data Conversion' and 'Lookup'. It's designed to support a variety of different data manipulation tasks, including data mining, and it's certainly capable of handling most data migration requirements.


However, one of my data migration tasks was to take a simple list of locations in the source database that looked a bit like this:

ID     Location_Name
------ --------------------------
1      All UK
2      -England
3      --South
4      ---South East
5      ----Bedfordshire
6      ----Berkshire
7      ----Buckinghamshire
8      ----East Sussex
9      ----Essex
10     ----Hampshire
11     ----Hertfordshire
12     ----Isle of Wight
13     ----Kent
14     ----London & M25
15     -----London
16     ------Central London
17     -------City of London
18     ------East London
19     ------North London
20     ------South London
21     ------West London

... and import it into a table that looks like this:

id          name                 parentId
----------- -------------------- -----------
0           All                  NULL
1           All UK               0
2           England              1
3           South                2
4           South East           3
5           Bedfordshire         4
6           Berkshire            4
7           Buckinghamshire      4
8           East Sussex          4
9           Essex                4
10          Hampshire            4
11          Hertfordshire        4
12          Isle of Wight        4
13          Kent                 4
14          London & M25         4
15          London               14
16          Central London       15
17          City of London       16
18          East London          15
19          North London         15
20          South London         15

Notice how a location hierarchy is represented by the number of hyphens before the location name in the source table, but is represented as a correctly normalized relationship in the destination table. In order to map these two tables I wanted to write some C# code. You can write VB.NET script directly into a Script Component, but it wasn't really what I wanted. The other obvious alternative was to write my own custom SSIS pipeline component. A brief look at the MSDN documentation on this suggested that it should be quite easy: you just extend a base class, PipelineComponent, and override a few methods. But in true MSDN style the documentation had large gaps in it, leaving out some essential information; eventually I discovered CalendarTransform by Grant Dickinson, also on MSDN, which pointed me in the right direction. The SSIS extensibility API could be much easier to use. It's not impossible, but there's far too much digging around for column ids that really should be provided by the toolkit.

Anyway, for your amusement, here's my very simple pipeline component that takes the source table and pumps out the destination.

[ComVisible(false)]
[DtsPipelineComponent(DisplayName="Location Mapper", ComponentType = ComponentType.Transform)]
public class LocationMapper : PipelineComponent
{
    int parentIdColumnIndex;
    int locationDescriptionColumnIndex;

    int idColumnIndex;
    int locationNameColumnIndex;

    // this override provides the columns at design time to allow you to wire up your components
    // on the design surface.
    public override void ProvideComponentProperties()
    {
        base.ProvideComponentProperties();

        //Support resetting the component, this is straight out of CalendarTransform   
        this.RemoveAllInputsOutputsAndCustomProperties();
        this.ComponentMetaData.RuntimeConnectionCollection.RemoveAll();
        this.ComponentMetaData.UsesDispositions = false;
        this.ComponentMetaData.ValidateExternalMetadata = true;

        // Add the input collection.
        IDTSInput90 input = ComponentMetaData.InputCollection.New();
        input.Name = "Input";
        input.ExternalMetadataColumnCollection.RemoveAll();
        input.ExternalMetadataColumnCollection.IsUsed = false;

        // Add the output collection.
        IDTSOutput90 output = ComponentMetaData.OutputCollection.New();
        output.Name = "Output";
        output.ExclusionGroup = 0;
        output.SynchronousInputID = input.ID;
        output.ExternalMetadataColumnCollection.RemoveAll();
        output.ExternalMetadataColumnCollection.IsUsed = false;

        // add the output columns, this is the bit that the MSDN documentation doesn't tell you :P

        // parent id
        IDTSOutputColumn90 parentIdColumn = this.InsertOutputColumnAt(output.ID, 0, "ParentId", "Parent Id");
        parentIdColumn.SetDataTypeProperties(DataType.DT_I4, 0, 0, 0, 0);

        // location description
        IDTSOutputColumn90 locationDescriptionColumn = this.InsertOutputColumnAt(output.ID, 1, "LocationDescription", "Location Description");
        locationDescriptionColumn.SetDataTypeProperties(DataType.DT_WSTR, 500, 0, 0, 0);
    }

    // this runs first at runtime.
    public override void PreExecute()
    {
        // input columns, assumes that the correct columns have been mapped on the design surface.
        // a bit of a hack!
        IDTSInput90 input = ComponentMetaData.InputCollection[0];
        IDTSInputColumnCollection90 inputColumns = input.InputColumnCollection;

        IDTSInputColumn90 idColumn = inputColumns[0];
        if (idColumn == null)
        {
            throw new ApplicationException("id column is missing");
        }

        IDTSInputColumn90 locationNameColumn = inputColumns[1];
        if (locationNameColumn == null)
        {
            throw new ApplicationException("location name column is missing");
        }

        // this is the really wacky stuff, you have to discover the column indexes inside the buffer
        // using this convoluted syntax. I could never have worked this out for myself!!
        idColumnIndex = BufferManager.FindColumnByLineageID(input.Buffer, idColumn.LineageID);
        locationNameColumnIndex = BufferManager.FindColumnByLineageID(input.Buffer, locationNameColumn.LineageID);

        // output columns
        IDTSOutput90 output = ComponentMetaData.OutputCollection[0];
        IDTSOutputColumnCollection90 outputColumns = output.OutputColumnCollection;

        IDTSOutputColumn90 parentIdColumn = outputColumns[0];
        IDTSOutputColumn90 locationDescriptionColumn = outputColumns[1];

        // do the crazy column index lookup again, this time for the output columns.
        parentIdColumnIndex = BufferManager.FindColumnByLineageID(input.Buffer, parentIdColumn.LineageID);
        locationDescriptionColumnIndex = BufferManager.FindColumnByLineageID(input.Buffer, locationDescriptionColumn.LineageID);
    }

    // this is the bit that actually does all the work.
    public override void ProcessInput(int inputID, PipelineBuffer buffer)
    {
        base.ProcessInput(inputID, buffer);

        if (buffer != null)
        {
            if (!buffer.EndOfRowset)
            {
                // level is a little class that I wrote to manage the parent/child mapping.
                Level level = new Level();

                while (buffer.NextRow())
                {
                    short id = (short)buffer[idColumnIndex];
                    string locationName = (string)buffer[locationNameColumnIndex];

                    // location is another helper class that simply describes a row in the destination table
                    Location location = new Location(id, locationName);
                    level.Set(location);

                    buffer[parentIdColumnIndex] = location.ParentId;
                    buffer[locationDescriptionColumnIndex] = location.Name;
                }
            }
        }
    }
}
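
The Level and Location helper classes aren't shown above, so here's a rough sketch of how they work out the parent/child relationship from the leading hyphens; treat it as a reconstruction of the idea rather than the original code:

using System.Collections.Generic;

// A row in the destination table: the number of leading hyphens in the
// source name gives the location's depth in the hierarchy.
public class Location
{
    private readonly int id;
    private readonly int depth;
    private readonly string name;
    private int parentId;

    public Location(int id, string rawName)
    {
        this.id = id;

        int hyphens = 0;
        while (hyphens < rawName.Length && rawName[hyphens] == '-')
        {
            hyphens++;
        }
        depth = hyphens;
        name = rawName.Substring(hyphens);
    }

    public int Id { get { return id; } }
    public int Depth { get { return depth; } }
    public string Name { get { return name; } }

    public int ParentId
    {
        get { return parentId; }
        set { parentId = value; }
    }
}

// Remembers the most recent location seen at each depth, so the parent of the
// current row is simply the last location one level up.
public class Level
{
    private readonly Dictionary<int, Location> lastAtDepth = new Dictionary<int, Location>();

    public void Set(Location location)
    {
        if (location.Depth == 0)
        {
            // assume id 0 is the 'All' root row in the destination table
            location.ParentId = 0;
        }
        else
        {
            // assumes the source rows arrive in hierarchical order,
            // as they do in the listing above
            location.ParentId = lastAtDepth[location.Depth - 1].Id;
        }
        lastAtDepth[location.Depth] = location;
    }
}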

Friday, October 12, 2007

Vote For Me!

Would you like to see me humiliate myself in front of a technical audience at one of the UK’s biggest .NET developer events? If the answer is "nothing would make me happier", go to the following link and choose my session, ‘Why do I need an Inversion of Control Container?’ (search for ‘Mike Hadlow’) and 9 others.

http://www.developerday.co.uk/ddd/votesessions.asp

The event is Developer Developer Developer Day 6 at the Microsoft campus in Reading on Saturday 24th November. I’ve volunteered to talk about Inversion of Control containers for an hour. It should be fun so come along!

Tuesday, October 09, 2007

Microsoft's new MVC web development framework

I've just watched the funeral of Web Forms. On Scott Hanselman's blog is a video of ScottGu presenting an initial prototype of Microsoft's new MVC rails-a-like web development framework at the ALT.NET conference. Gone is the event driven, postback, impossible to test old Web Forms. I guess Microsoft would have had to have had its collective head buried firmly in the sand not to have been closely watching all the excitement surrounding Rails and more recently MonoRail, both of which were heavily name checked by Scott during his presentation. The framework has some very neat stuff that heavily leverages many of the new language features of C# 3.0 such as lambda expressions and anonymous types. Scott was particularly keen to show off the rather nice URL processor that makes it very easy to do SEO. It still uses the aspx templating engine for the views, so you still get all the nice wysiwyg designer support as well as the AJAX update panel stuff. Also nice is the way that it supports IoC containers and dependency injection.

Someone in the audience asked how Microsoft will be positioning this vis-a-vis Web Forms. He said that it wasn't a replacement for Web Forms, which will still be supported, but the enterprise development message will be to use the new framework. I expect Web Forms will share a similar fate to the Dataset, gradually melting away as one by one us Morts get the message. Probably around 4 years.

Monday, October 08, 2007

Windsor Container Resources

Here are a few Windsor Container resources that I've stumbled across recently. I'm planning to maintain this list as a reference for my potential presentation at DDD6.

Jeremy Jarrell of Digital Blasphemy does a good overview in his 'Windsor IoC Container in a Lunch Break'. The comments are worth reading too.

Alex Henderson aka Bitter Coder has a great tutorial in his Wiki.

Oren Eini has a very cool MSDN article showing the power of Windsor when working with generic interfaces: Inversion of Control and Dependency Injection: Working with Windsor Container.

Thursday, October 04, 2007

My Developer Developer Developer Day session

I've just offered to present a session on Inversion of Control containers at the Developer Developer Developer Day at the Microsoft campus on the 24th November. It's titled 'Why do I need an Inversion of Control Container?' and I intend to present an expanded version of my recent post about the Castle Project's Windsor Container. I think there's enough material to easily fill an hour, especially if I do a hands-on coding session.

Although I'm an ex-English teacher (I spent two years on the JET program) and I've done plenty of presentations and workshops for my clients, this will be the first time I've presented at a developer event like this. It should be a lot of fun, I just hope my session gets enough votes to be put forward. So if you're reading this and thinking, "I'd really like to see Mike humiliate himself in front of a technical audience", don't fail to vote for me when registration opens for DDD :)

Tuesday, October 02, 2007

Ben on MonoRail

As you'll probably know from reading any of my recent posts, I've become very curious about the Castle Project recently. One of its elements is MonoRail, a Rails-a-like web development toolkit that sits on top of ASP.NET. 'Ben' (I couldn't find his full name on his blog anywhere) provides a great critique of WebForms here and the pros and cons of MonoRail here. I recently ran through the getting started guide for MonoRail and it has really intrigued me, especially the separation of concerns that an MVC framework provides, which allows you to finally be able to test your web application logic properly.

Why are you still hand coding your data access layer?

At last it seems that the dreaded DataSet is dead. There are many reasons why you should always think twice before choosing the DataSet as the core of your application architecture; I covered most of them a couple of years ago here. In my freelancing work I've found that none of my recent clients have used DataSets, preferring instead some kind of Domain Model or Active Record data access mechanism, with Active Record being by far the favorite. It's also worth noting that the terminology in most Microsoft shops calls the Active Record class a 'business object' or 'data object'; almost nobody says 'Active Record'.

A core part of an Active Record based architecture is some kind of Data Access Layer that does Object Relational Mapping (ORM). Everyone writes their own, and that's the main point of this post: you shouldn't need to. If you are like the majority of my clients, your application features thousands of lovingly hand crafted lines of code like this:

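(The code screenshot that illustrated this has gone missing; the kind of thing I mean looks roughly like this, with a made-up Customer class, stored procedure and connection string:)

using System;
using System.Data;
using System.Data.SqlClient;

// A made-up example of hand-coded data access: string literals for the stored
// procedure name, the parameter name and the column names, plus manual type
// conversion and manual null handling.
public class Customer
{
    public int Id;
    public string Name;
    public DateTime? DateOfBirth;
}

public class CustomerDataAccess
{
    // assumed to be read from configuration
    private readonly string connectionString;

    public CustomerDataAccess(string connectionString)
    {
        this.connectionString = connectionString;
    }

    public Customer GetCustomer(int customerId)
    {
        Customer customer = new Customer();
        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand("usp_Customer_GetById", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@CustomerId", customerId);

            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                if (reader.Read())
                {
                    customer.Id = (int)reader["CustomerId"];
                    customer.Name = (string)reader["Name"];
                    customer.DateOfBirth = reader["DateOfBirth"] == DBNull.Value
                        ? (DateTime?)null
                        : (DateTime)reader["DateOfBirth"];
                }
            }
        }
        return customer;
    }
}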

These hand written data access layers are the commonest source of bugs in most business applications. There are several reasons for this, the most obvious being the hand coded string literals representing stored procedure names, stored procedure parameter names and column names. People do use an enumeration instead of the string literal column names, mainly for performance reasons, but it doesn't stop the enumeration and the stored proc select statement's columns getting out of sync. There's also the overhead of matching up the SQL Server types to .NET types and dealing with null values. But the worst offense of all is the tedium and the waste of time. Writing and maintaining these Data Access Layers is the programmer's equivalent of Dante's inner ring of hell, and you don't have to do it.

If you're like me, you resent any kind of repetitive coding that the computer could do just as easily, but much much faster and more accurately. At some time in your career you've probably written a little program that queries the database schema and generates your Active Record classes and Data Access Layer. Yes, I've done this twice, once back in the days of Visual Basic client server systems and more recently for .NET. The second attempt got quite sophisticated, handling object graphs and change tracking, but I never really got it to the stage of a real Data Access Framework, one that could look after all my persistence needs. I used it to generate the bulk of the code and then hand coded the tricky bits, such as complex queries and relationships. I'm not the only one who's been down this road; a whole army of very clever people, cleverer than you or me, have devoted large amounts of time to this problem, which is great because it means that you and I don't have to bother any more. These tools are now robust and mature enough that it's more risky to do it yourself than to use one of them.

But how do you choose which one to use? There are two basic approaches: code generators and runtime ORM engines. Code generators, like mine, are the easiest to create, so there are more of them out there. Runtime ORM engines are a much trickier engineering problem, but they will probably win in the end because they're easier for the end developer to use. Amongst the code generators, the ones I hear of the most are the various CodeSmith templates like NetTiers, LLBLGen by Frans Bouma, who's also a very active participant in community discussions around ORM, SubSonic, which is attempting to be a Rails for .NET, and Microsoft's very own Guidance Automation Toolkits. All are well regarded and you probably wouldn't go too far wrong with choosing any of them.

Among the runtime ORM engines, I hear NHibernate mentioned more than anything else. Hibernate is huge in the Java world so the NHibernate port has plenty of real world experience to fall back on. It's been used in a number of large scale projects and is the core of the Castle project's ActiveRecord rails-a-like data access solution. I haven't used it in anger, but my few experiments with it have been quite fun.

I haven't mentioned the elephant in the room yet: LINQ to SQL, coming with .NET 3.5. Microsoft have taken a long time to join the ORM party. A couple of years ago there was much talk of ObjectSpaces, a Hibernate style ORM tool that never saw the light of day. LINQ is a very elegant attempt to integrate declarative query style syntax into imperative .NET languages like C#. LINQ to SQL makes good use of it, especially, as you'd expect, with its query syntax. In other ways LINQ to SQL is very much a traditional ORM mapper along the lines of Hibernate; it features some code generation tools to create your Active Record objects, a runtime ORM mapper that creates SQL on the fly, identity management and lazy loading: all the features you'd expect. If you're starting a new project and you're prepared to risk using the beta of Visual Studio 2008 then I would choose LINQ to SQL over any of the other alternatives, not least because it's what everyone will be using in a couple of years' time.
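
To give a flavour of the query syntax, here's a small sketch; it assumes a NorthwindDataContext generated by the LINQ to SQL designer from the sample Northwind database, so the class and property names are illustrative rather than taken from a real project:

using System;
using System.Linq;

public class LinqToSqlExample
{
    public static void Run()
    {
        // NorthwindDataContext is assumed to have been generated by the
        // LINQ to SQL designer from the sample Northwind database.
        using (NorthwindDataContext db = new NorthwindDataContext())
        {
            // the query is translated into SQL and executed against the
            // database when the foreach enumerates it
            var londonCustomers =
                from customer in db.Customers
                where customer.City == "London"
                orderby customer.CompanyName
                select customer;

            foreach (var customer in londonCustomers)
            {
                Console.WriteLine(customer.CompanyName);
            }
        }
    }
}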