Sunday, December 13, 2009

Book Review: Enterprise Integration Patterns

I’ve just finished reading ‘Enterprise Integration Patterns’ by Gregor Hohpe and Bobby Woolf for the second time. I first read it when it was published back in 2004. At the time I was struggling with web application architecture, and so it wasn’t directly relevant to my work. I guess I took away some idea of how messaging systems could work, but mostly my thoughts were, ‘that’s interesting, I wonder who uses it’. I came to read it a second time because I’m now responsible for designing an enterprise integration architecture, so it’s directly relevant to my work.

Whatever you may think about EIP, its influence has been huge. I don’t think anyone can have an intelligent discussion about integration architecture without referencing it. It defined the vocabulary for a lot of SOA and if you read anything by the SOA industry gurus, you will constantly encounter citations from this book.

It’s pretty clear how Hohpe and Woolf think you should do enterprise integration. Although Chapter 2 describes various integration styles (File Transfer, Shared Database and Remote Procedure Invocation), the rest of the book is about messaging and messaging only. Indeed, that is probably the book’s most important legacy. EIP is one of the key reasons that messaging has now become the default style of enterprise integration.

So why do Hohpe and Woolf think that messaging is the answer? Like pretty much anything to do with software it’s all about coupling, or rather avoiding coupling. Of course this includes logical coupling, not making one application have to care about the internals of another, but you can achieve that with web services. What messaging gives us is temporal decoupling: by making all our integration asynchronous we no longer need all our components to be available all the time. Introducing a service bus and publish/subscribe messaging also removes the need for our applications to even care where the other components are, or even if there are any to communicate with.
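
To make the temporal decoupling point concrete, here is a minimal in-memory sketch in Python. It is not how any real service bus works; the `Bus` class, the topic name and the subscriber name are all made up for illustration. The only point is that the publisher never checks who is listening, and a subscriber can pick up its messages later.

```python
from collections import defaultdict, deque


class Bus:
    """Hypothetical in-memory bus: one queue per subscriber per topic."""

    def __init__(self):
        self._queues = defaultdict(dict)  # topic -> {subscriber name: deque}

    def subscribe(self, topic, name):
        # Registering creates a queue that holds messages until they are read.
        self._queues[topic].setdefault(name, deque())

    def publish(self, topic, message):
        # The publisher does not know who, if anyone, is subscribed.
        for queue in self._queues[topic].values():
            queue.append(message)

    def receive(self, topic, name):
        # The subscriber drains its queue whenever it happens to be available.
        queue = self._queues[topic][name]
        messages = list(queue)
        queue.clear()
        return messages


bus = Bus()
bus.publish("orders", "order-1")         # nobody is listening: no error
bus.subscribe("orders", "billing")
bus.publish("orders", "order-2")         # billing is not running right now
bus.publish("orders", "order-3")
print(bus.receive("orders", "billing"))  # billing comes back and catches up
```

A real broker would also persist the queues and deliver across processes, which is exactly the part you want a messaging product to do for you.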

The first three chapters are essential reading for anyone building business applications. Chapter 1 describes the integration issues that face many large organisations and describes how messaging might be used to glue disparate applications together. Chapter 2 describes the alternatives to messaging mentioned above and why you shouldn’t use them. Chapter 3 introduces basic messaging concepts including channels, messages, pipes and filters, routing, transformation and endpoints. Together they are probably the best explanation of why you should use messaging and how to get started that you will find anywhere.
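
As a rough illustration of a few of the Chapter 3 concepts, here is a small Python sketch of pipes and filters with a transformation step and a content-based routing step. The message format, the filter names and the channel names are invented for the example and are not taken from the book.

```python
from functools import reduce


def pipeline(*filters):
    """Compose filters so each one's output is piped into the next."""
    def run(message):
        return reduce(lambda msg, step: step(msg), filters, message)
    return run


def parse(message):
    # Transformation: turn "id=42;total=9.50" into a dictionary.
    return dict(pair.split("=", 1) for pair in message.split(";"))


def add_currency(message):
    # Enrichment: add a field the downstream consumer expects.
    return {**message, "currency": "GBP"}


def route(message):
    # Content-based routing: choose a channel name from the message body.
    channel = "large-orders" if float(message["total"]) >= 100 else "orders"
    return channel, message


process = pipeline(str.strip, parse, add_currency, route)
print(process("  id=42;total=250.00 "))
```

Each filter knows nothing about its neighbours, so steps can be added, removed or reordered without touching the others.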

EIP is a patterns book. This means that its primary purpose is to define a vocabulary. However, in defining the vocabulary Hohpe and Woolf also provide a comprehensive cookbook of solutions to common integration problems. Like all patterns books, you often think, ‘I’ve done that myself’, but now you have a name for it, and know the alternatives and possible problems before you start. Being a patterns book also means that it can be a bit tedious to read from cover to cover. Each pattern has to stand independently and has the same layout: a name, an icon, the context, a problem statement, forces acting on it, a solution including a sketch, and related patterns. This makes it excellent as a reference work, but leaves something to be desired if you are expecting more of a narrative. But even if you don’t fancy tackling all 600+ pages of patterns, I would still recommend anyone building business software to read the first three chapters.

The icons for the patterns are a great visual vocabulary, but having said that I haven’t seen them very widely used. It’s a shame because they are quite pretty. You can download a Visio template for them here. Here’s a completely meaningless EIP style diagram I just stuck together:

[Image: a sample EIP-style diagram]

My only other criticism of the book is that the technology it describes is now five years out of date. In the same way that Martin Fowler’s ‘Patterns of Enterprise Application Architecture’ sometimes reads like a specification for NHibernate, EIP often reads like a description of NServiceBus or MassTransit. As a keen MassTransit user, I often read a pattern only to think, ‘well, MassTransit covers that one, I don’t have to worry about it’. Many of the patterns have been further refined in the meantime and you can’t talk about messaging these days without some reference to the Command Query Responsibility Segregation pattern. For that reason, you shouldn’t read this book in isolation, but rather as a foundation for further investigation.

Monday, December 07, 2009

Skills Matter Functional Programming Exchange

I had a great time today at the Functional Programming Exchange organised by Robert Pickering and Skills Matter. Robert managed to grab some really interesting speakers who gave a nice snapshot of the current art and use of FP. The whole caboodle was hosted in Skills Matter’s new London offices and they did a magnificent job; plenty of free tea, cakes, sandwiches and pizza. Geek heaven :)

Here’s a rundown of the talks:

Sadek Drobi – Computation Abstraction.
I was late getting the train up from Brighton and arrived halfway through the first talk, which was a pity because in many ways it set the scene for the whole day. Sadek showed how functional programming allows abstractions that are not available in imperative languages. I really liked the discussion on error handling and how you can easily create your own control structures with higher-order functions. What was very nice was that he used a variety of functional languages for his demos.
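
This is not one of Sadek’s demos, just a small Python sketch of the general idea: a home-made control structure for error handling built from a higher-order function. The names `attempt` and `parse_port` are made up for the example.

```python
def attempt(action, on_error):
    """Run action(); if it raises, return on_error(exception) instead."""
    try:
        return action()
    except Exception as exc:
        return on_error(exc)


def parse_port(text):
    # Fall back to a default port when the text is not a valid integer.
    return attempt(lambda: int(text), lambda exc: 8080)


print(parse_port("443"))
print(parse_port("not a number"))
```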

Matthew Sackman – Supercharged Rabbit: Resource Management at High Speed in Erlang
I really enjoyed this talk. Matthew took us on a whirlwind tour of RabbitMQ, an AMQP-based messaging system implemented in Erlang. Apparently it can scale to queues as large as your disk space without sacrificing performance. Sounds very impressive. It was cool to hear about why Erlang makes such an excellent tool for writing highly concurrent software. It was also interesting to hear what Matthew disliked about Erlang in comparison with Haskell. I liked this quote: “Rabbit isn’t really fast, it can only manage 25,000 messages per second”. Compared with MSMQ, that is fast.

Anton Schwaighofer – F# and Units-Of-Measure for Technical Computing
This was the talk that surprised me the most. F# is the functional language that I’ve made the biggest effort to learn and I thought I understood units-of-measure. To be honest, they hadn’t made a particular impression on me. I was very impressed after Anton’s talk. I really liked the way that the compiler understands how computation affects units. So for example, if you have a function that takes miles and hours as its arguments and returns the miles divided by the hours, the compiler knows that the output is miles per hour.
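
To show the idea of units flowing through a calculation, here is a rough Python sketch. The big difference is that F# checks units at compile time, whereas this hypothetical `Quantity` class only tracks and checks them at run time, so treat it as an illustration of the concept rather than an equivalent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    units: tuple  # sorted tuple of (unit name, exponent) pairs

    def __truediv__(self, other):
        # Dividing subtracts the divisor's unit exponents.
        exponents = dict(self.units)
        for name, power in other.units:
            exponents[name] = exponents.get(name, 0) - power
        units = tuple(sorted((n, p) for n, p in exponents.items() if p != 0))
        return Quantity(self.value / other.value, units)

    def __add__(self, other):
        # Adding only makes sense when both sides have the same units.
        if self.units != other.units:
            raise TypeError(f"cannot add {self.units} to {other.units}")
        return Quantity(self.value + other.value, self.units)


def miles(value):
    return Quantity(value, (("mile", 1),))


def hours(value):
    return Quantity(value, (("hour", 1),))


speed = miles(120.0) / hours(2.0)
print(speed.value, speed.units)  # units come out as mile^1 * hour^-1
```

Trying `miles(1.0) + hours(1.0)` raises a `TypeError` here; in F# the equivalent mistake would not compile at all.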

Ganesh Sittampalam – Functional Programming for Quantitative Modelling at Credit Suisse
Ganesh gave some very practical examples of how his group at CS use both Haskell and F#. Once again the recurring theme was that Haskell is really good for writing DSLs. Ganesh explained how they had written a DSL to create Excel spreadsheets and the challenges that involved. He also explained how they were using F# now as their core application development language.

Duncan Coutts – Strong Types and Pure Functions
I found Duncan’s talk the hardest to follow because of my almost complete ignorance of Haskell. Apparently Haskell allows you to specify the side effects that a function is allowed to have in its type signature. By default Haskell is a side-effect-free language. That immutability or ‘pureness’ allows all kinds of optimisations and is one of the fundamentals of functional programming, but sometimes you have to have side effects. Any IO functions will fall into this category. What Haskell allows you to do is specify with a ‘Monad’ what side effects are allowed. I’m going to have to read a Haskell book.

Robert Pickering – Using Combinators to Tackle the HTML Rendering Problem
Robert showed us an F# DSL to generate HTML and Javascript. I guess it was more interesting from the DSL point of view than the HTML/Javascript generation. The DSL meme runs throughout FP and it’s instructive to see how trivial it is to write a simple DSL in F#. I just didn’t like the example. I have a fundamental distrust of tools that try to hide me from HTML and Javascript; I like HTML and Javascript. We’re only just recovering a back-to-basics approach from the WebForms train wreck so I’m a bit twitchy about this kind of thing.. back off Robert.. OK!

Sunday, December 06, 2009

The Monthly Code Quality Report

Since I started my new ‘architect’ (no, I do write code… sometimes) role earlier this year, I’ve been doing a ‘monthly code quality report’. This uses various tools to give an overview of our current codebase. The output looks something like this:

[Image: sample monthly code quality report]

Most of the metrics come from NDepend, a fantastic tool if you haven’t come across it before. Check out the blog of its author, Patrick Smacchia.

We have lots of generated code, so we obviously want to differentiate between that and the hand written stuff. Doing this is really easy using CQL (Code Query Language), a kind of code-SQL. Here’s the CQL expression for ‘LoC Failing basic quality metrics’:

WARN IF Count > 0 IN SELECT METHODS /*OUT OF "YourGeneratedCode" */ WHERE 
(   NbLinesOfCode > 30 OR
    NbILInstructions > 200 OR
    CyclomaticComplexity > 20 OR
    ILCyclomaticComplexity > 50 OR
    ILNestingDepth > 4 OR
    NbParameters > 5 OR
    NbVariables > 8 OR             
    NbOverloads > 6 )
AND
!( NameIs "InitializeComponent()"
    OR HasAttribute "XXX.Framework.GeneratedCodeAttribute" 
    OR FullNameLike "XXX.TheProject.Shredder"
)

Here I’m looking for overly complex code and excluding anything that is attributed with our GeneratedCodeAttribute. I’m also excluding a project called ‘Shredder’, which is entirely generated.
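
If you don’t read CQL, the same logic can be sketched in Python. This is only an illustration of what the query selects, not how NDepend evaluates it. The dictionary layout and the sample methods are made up, and the substring test is just an approximation of `FullNameLike`.

```python
# Thresholds copied from the CQL query; a method fails if it exceeds any one.
LIMITS = {
    "NbLinesOfCode": 30,
    "NbILInstructions": 200,
    "CyclomaticComplexity": 20,
    "ILCyclomaticComplexity": 50,
    "ILNestingDepth": 4,
    "NbParameters": 5,
    "NbVariables": 8,
    "NbOverloads": 6,
}


def is_generated(method):
    # Mirrors the three exclusions at the end of the query.
    return (
        method["name"] == "InitializeComponent()"
        or "XXX.Framework.GeneratedCodeAttribute" in method["attributes"]
        or "XXX.TheProject.Shredder" in method["full_name"]
    )


def fails_basic_quality(method):
    over_limit = any(
        method["metrics"].get(metric, 0) > limit
        for metric, limit in LIMITS.items()
    )
    return over_limit and not is_generated(method)


# Hypothetical sample data, not taken from the real report.
methods = [
    {"name": "Process()",
     "full_name": "XXX.TheProject.Orders.OrderService.Process()",
     "attributes": [],
     "metrics": {"NbLinesOfCode": 120, "CyclomaticComplexity": 35}},
    {"name": "InitializeComponent()",
     "full_name": "XXX.TheProject.UI.MainForm.InitializeComponent()",
     "attributes": [],
     "metrics": {"NbLinesOfCode": 400}},
    {"name": "GetName()",
     "full_name": "XXX.TheProject.Customers.Customer.GetName()",
     "attributes": [],
     "metrics": {"NbLinesOfCode": 3}},
]

failing = [m["full_name"] for m in methods if fails_basic_quality(m)]
print(failing)
```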

NDepend’s dependency analysis is legendary and also well worth a look, but that’s another blog post entirely.

The duplicate code metrics are provided by Simian, a simple command line tool that trawls through your source code looking for repeated lines. I set the threshold at 6 lines of code (the default). It actually outputs a complete list of all the duplications it finds and it’s nice to be able to run it regularly, put the output under source control, and then diff versions to see where duplication is being introduced. A great way of fighting the copy-and-paste code reuse pattern.
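
I don’t know how Simian is implemented internally, so the following is not its algorithm. It is just a naive Python sketch of the general idea: slide a window of 6 lines over every file and report any window that appears more than once. The function name and the input format are made up for the example.

```python
from collections import defaultdict


def find_duplicates(files, threshold=6):
    """files maps a file name to its source text; returns repeated blocks."""
    seen = defaultdict(list)  # block of lines -> [(file name, start line)]
    for name, text in files.items():
        # Ignore leading and trailing whitespace when comparing lines.
        lines = [line.strip() for line in text.splitlines()]
        for start in range(len(lines) - threshold + 1):
            block = tuple(lines[start:start + threshold])
            if all(block):  # skip windows that contain blank lines
                seen[block].append((name, start + 1))
    # Keep only the blocks that were found in more than one place.
    return {block: places for block, places in seen.items() if len(places) > 1}


# Hypothetical input: b.cs repeats the six lines of a.cs after one extra line.
shared = "\n".join(f"total += items[{i}];" for i in range(6))
files = {"a.cs": shared, "b.cs": "int total = 0;\n" + shared}

for block, places in find_duplicates(files).items():
    print(len(block), "duplicated lines at", places)
```

Writing the results to a text file and committing it gives you the same diff-over-time view described above.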

The unit test metrics come straight out of NCover. Since there were no unit tests when I joined the team, it’s not really surprising how low the level of coverage is. The fact that we’ve been able to ramp up the number of tests quite quickly is satisfying though.

As you can see from the sample output, it’s a pretty cruddy old codebase where 27% of the code fails basic, very conservative, quality checks. Some of the worst offending methods would make great entries in ‘the daily WTF’. But in my experience, working in a lot of corporate .NET development shops, this is not unusual; if anything it’s a little better than average.

Since I joined the team, I’ve been very keen on promoting software quality. There hadn’t been any emphasis on this before I joined, and that’s reflected by the poor quality of the codebase. I should also emphasise that these metrics are probably the least important of several things you should do to encourage quality. Certainly less important than code reviews, leading by example and periodic training sessions. Indeed, the metrics by themselves are pretty meaningless and it’s easy to game the results, but simply having some visibility on things like repeated code and overly complex methods makes the point that we care about such things.

I was worried at first that it would be negatively received, but in fact the opposite seems to be the case. Everyone wants to do a good job and I think we all value software quality; it’s just that it’s sometimes hard for developers (especially junior developers) to know the kinds of things they should be doing to achieve it. Having this kind of steer with real numbers to back it up can be very encouraging.

Lastly I take the five methods with the largest cyclomatic complexity and present them as a top 5 ‘Crap Code of the Month’. You get much kudos for refactoring one of these :)
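
Picking the top five is trivial once the cyclomatic complexity numbers are exported. A minimal Python sketch, assuming a made-up dictionary of method names to complexity scores (the real numbers come from the NDepend report):

```python
import heapq


def worst_methods(complexities, count=5):
    """complexities maps a method name to its cyclomatic complexity."""
    return heapq.nlargest(count, complexities.items(), key=lambda item: item[1])


# Hypothetical scores, purely for illustration.
scores = {
    "OrderService.Process": 87,
    "InvoiceBuilder.Build": 64,
    "Customer.Validate": 51,
    "ReportPage.Load": 45,
    "Parser.Next": 38,
    "Customer.GetName": 1,
}

for name, complexity in worst_methods(scores):
    print(complexity, name)
```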