Greetings

Welcome to the new NTCoding - now with MVC. Fully designed and developed in-house.

I created this site to share my thoughts on software development, primarily within the .NET community, but often glancing beyond the Microsoft walls at technologies such as Ruby on Rails. There are a number of open source projects I also use and you can see my favourites on the profile page, including some of those used to create this site.

Reading books about programming is also a big interest of mine and I like to review each book I read. These reviews are available on the books page and come complete with my personal rating.

If you have any questions or comments then feel free to use the contact page. Feedback on the site is always nice but I'm happy to chat about anything covered on this site.

Recent Blog Entries

Not Sure About "NO DB"

19/05/2012

Abstractions are fundamental to object-oriented programming – to computing even – to life even. One abstraction that Uncle Bob likes is the database abstraction. In his NO DB post, he makes two main points:

1. Choosing a SQL database isn’t a given – you should decide what storage is best

2. Your database should be totally abstracted in your code – you can switch between any underlying data store and your code still works the same

First point is a good one, imo. Second one is a bit more contentious. It’s a pretty popular view though – every one of my colleagues has massive belief in it. All of these people are smarter than me, so I should probably change my opinion, but…

…I’m a bit quirky, and even though my ratio of correct to incorrect is about 1:10, I’ll show you why I still have the belief that a database abstraction to that extreme is possibly a bad one.

THIS IS NOT A RANT OK??!!!


Abstraction Trade Offs

Abstractions are concepts. We take a set of lower-level features and combine them into a single conceptual unit. We then think about this single unit without caring about the lower-level features it encompasses.

Complexity / Runtime Performance Ratio
All abstractions leave open the possibility of a runtime performance penalty. You are hiding those lower-level features and potentially blocking off your path to them. Sometimes this is good….

C# is compiled to CIL which goes on to become assembler. If you were to write the assembler yourself you could write it in a more efficient way that resulted in faster execution of your code at runtime. It would take you a looooong time, though.

For this abstraction, the complexity is reduced by an insane amount, yet the performance penalty is relatively little with modern computers.

With a database abstraction the complexity being hidden is simple and it doesn’t help us create software any faster. Yet due to database querying and crossing the network it can slow down runtime performance by orders of magnitude more than other abstractions (from nanoseconds, to hundreds of milliseconds, to seconds even).
 
Switching Abstractions
Uncle Bob didn’t care much for performance. He focused on the other impact of abstractions – you can switch the lower-level features you’ve hidden behind that abstraction.

And for many an abstraction, I think that is a grand thing to do. Different payment calculations for different types of staff, different promotions for different months of the year, different validation rules for different business rules – some abstractions we switch freely and frequently, and they let us write simple code. Most of the time the performance implications of these small in-memory algorithms are nil.
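To make the "different payment calculations for different types of staff" case concrete, here is a minimal sketch. All of the names (IPayCalculator, SalariedPay, HourlyPay, Payroll) are made up for illustration and are not from any real code base.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction: one concept, several interchangeable calculations.
public interface IPayCalculator
{
    decimal MonthlyPay();
}

public class SalariedPay : IPayCalculator
{
    private readonly decimal annualSalary;

    public SalariedPay(decimal annualSalary)
    {
        this.annualSalary = annualSalary;
    }

    public decimal MonthlyPay()
    {
        return annualSalary / 12m;
    }
}

public class HourlyPay : IPayCalculator
{
    private readonly decimal hourlyRate;
    private readonly int hoursWorked;

    public HourlyPay(decimal hourlyRate, int hoursWorked)
    {
        this.hourlyRate = hourlyRate;
        this.hoursWorked = hoursWorked;
    }

    public decimal MonthlyPay()
    {
        return hourlyRate * hoursWorked;
    }
}

public static class Payroll
{
    // The caller never knows (or cares) which calculation it is running.
    public static decimal Total(IEnumerable<IPayCalculator> staff)
    {
        return staff.Sum(s => s.MonthlyPay());
    }
}
```

Swapping one calculation for another here costs a virtual method call, nothing more. That is the kind of abstraction I am happy to switch all day long.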

I don’t recall too many times I’ve used a database abstraction polymorphically. I don’t recall too many times I’ve switched databases, either. You may have different experiences to me, though.

Abstractions are Inefficient at Runtime

So far, we’ve accumulated the benefit of being able to switch the underlying database – although we probably won’t ever do it, and we’ve hidden the complexity – although there probably isn’t that much.

In return we’re expecting some performance degradation, with the potential for significant performance degradation. I’ll explore now how much this might be.
 

Different Database Abstractions Need Different Data Models

To get optimal performance from our database, we need to design our code model so that it fits with a data model that is best suited for querying the chosen type of database. In his example Uncle Bob talked about being able to switch from a relational to a document database with no change in code.

Considering relational and document databases are fundamentally different – tables vs key value stores – it already seems quite possible that a code model that fits both is going to have significant performance implications with one or both databases lying underneath it.

Let’s have a look at simple examples of how you would model for relational and document databases to get better performance.

SQL
 If using SQL via NHibernate, a typical model might look like this:

public class House
{
    public string Id { get; set; }

    public string Address { get; set; }

    public bool HasGarden { get; set; }

    public bool HasBeenPurchased { get; set; }

    public IList<Room> Rooms { get; set; }
}

public class Room
{
    public string Id { get; set; }

    public string Name { get; set; }

    public int Area { get; set; }

    public int Height { get; set; }

    public int NumberOfElectricalSockets { get; set; }
}

Sometimes we want to get a House but not the rooms, so NHibernate will lazy load them for us. If we used a micro-ORM that doesn’t lazy load, we would keep a list of the Ids of each room instead.

Document Databases
public class House
{
    public string Id { get; set; }

    public string Address { get; set; }

    public bool HasGarden { get; set; }

    public bool HasBeenPurchased { get; set; }

    public int RoomsId { get; set; }
}

public class HouseRooms
{
    public int Id { get; set; }

    public IList<Room> Rooms { get; set; }
}

public class Room
{
    public string Name { get; set; }

    public int Area { get; set; }

    public int Height { get; set; }

    public int NumberOfElectricalSockets { get; set; }
}

When using a document database, to get best performance I’d denormalise the rooms, so all of a house’s rooms live in one document (HouseRooms). This fits my usage patterns perfectly – I either want a house on its own, or a house with all of its rooms.

The alternative is a map-reduce, which I believe is not good for performance or complexity.

This is a fairly small difference, but applied to every entity in the domain it could be very significant.

More Opinions
I found the following links useful when I was learning about document databases and how to model for them.

Runtime Database Inefficiency Might Not Scale

If we accept the assumptions so far, then our database abstraction will cost us performance, and potentially quite a lot. But so far, the customers are happy, the business is happy and our application works well. We are selling lots of automated house designs.

Now the business is spreading across the land into all sorts of new territories. Customers are going insane after the last bout of television ads for the product. But the application is taking a pretty heavy load.

Poor Mr Database, he is being battered. If it weren’t for all the select N+1 in the code and the inefficient queries from the application he could easily handle this load. As it stands, though, he is the bottleneck in the system and the application has slowed down to 30 second response times.
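For anyone who hasn't met select N+1, here is a rough sketch of what it does to Mr Database. FakeHouseStore is a hypothetical in-memory stand-in that only counts round trips; it is not NHibernate, RavenDB or any real data access API.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a database. Every public call counts as one round trip.
public class FakeHouseStore
{
    public int RoundTrips { get; private set; }

    private readonly Dictionary<int, List<string>> roomsByHouse =
        new Dictionary<int, List<string>>
        {
            { 1, new List<string> { "Kitchen", "Lounge" } },
            { 2, new List<string> { "Bedroom" } },
            { 3, new List<string> { "Study", "Bathroom" } }
        };

    public List<int> GetHouseIds()
    {
        RoundTrips++;
        return roomsByHouse.Keys.ToList();
    }

    public List<string> GetRoomsFor(int houseId)
    {
        RoundTrips++;
        return roomsByHouse[houseId];
    }

    public Dictionary<int, List<string>> GetRoomsForAll(IEnumerable<int> houseIds)
    {
        RoundTrips++;
        return houseIds.ToDictionary(id => id, id => roomsByHouse[id]);
    }
}

public static class RoomCounter
{
    // Select N+1: one query for the houses, then one more query per house.
    public static int CountRoomsNPlusOne(FakeHouseStore store)
    {
        return store.GetHouseIds().Sum(id => store.GetRoomsFor(id).Count);
    }

    // Batched: one query for the houses, one query for all of their rooms.
    public static int CountRoomsBatched(FakeHouseStore store)
    {
        var ids = store.GetHouseIds();
        return store.GetRoomsForAll(ids).Values.Sum(rooms => rooms.Count);
    }
}
```

Both methods return the same answer, but with three houses the first makes four round trips and the second makes two. With thousands of houses and a network in between, that difference is where the 30 seconds goes.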

As a side-note, it was interesting to hear that Twitter had a similar problem where their database abstractions did not scale: http://www.infoq.com/presentations/Abstractions-at-Scale (go to around 27 – 33 minutes).

 Hardware is Cheaper Than Dev Time – Throw Money at it

An obvious answer to the scaling problem is to just beef up the server. And indeed, that probably works up to a point. That point will either be when a bigger server is more expensive than multiple small servers, or when there are no bigger servers. What then?

Now you add more servers. Now you need to start replicating and maybe sharding your data. Distributed transactions, retry logic, failover…..

I would personally like to put that off as long as possible. I’m sure the systems team and the DBA team would, too.

We may even come to rue the associations between our domain entities that heavily impact how the data can be split up amongst servers – if at all possible.

 

Are Performance Requirements and Load Testing the Answer?

One idea floating around is that the business provides performance requirements and a load testing environment that is able to accurately simulate that load to verify it. Personally I have never seen this and don’t know if it is possible. I would love it though.

But even if it was, and the load test – which somehow replicated live load, and live usage patterns – gave a damning report of your performance, now you have to optimise it. How do you do that?

Your domain model needs to change now for performance.

We’ve seen that to get the best out of your database you need to model your domain entities in a way that fits with your database. So we could re-model the entities? Those entities which the entire domain model is built around?

Then you need to change the queries – those queries that hide the database technology behind your database abstraction. On a lucky day we could just optimise the logic in the abstraction, but if the whole algorithm the query is a part of has to change, it might be too difficult, depending on how many abstractions you built on top.
 
Now for the caveat
I have never been in that exact position, so I’m talking semi-shit. But I’ve seen code susceptible to those problems. The data access was so far abstracted that small changes hurt and performance improvements would have been difficult.

We didn’t hit those heights of performance drama, but it felt like we were well on the way. That code base is now being completely overhauled as it happens.

 

How Much Should we Abstract?

If we have a simple system along the lines of CRUD then there’s no domain model to abstract. I showed in my last post that I feel abstractions in those cases are unnecessary.

For a domain model as complex as the one Uncle Bob has here I would probably pass around the interface I use to communicate with my database, e.g. a RavenDB/NHibernate session, or maybe something lightweight over a micro-ORM if I was using one.

I want it to be clear when I’m communicating with the database or shooting myself in the foot with N+1. When I need to optimise I can see exactly where the problems are. I don’t believe this pollutes the domain too much at all. And we can certainly refactor to methods before interfaces to reveal intention.

If the database changes, then yes, the code for the business rules will change a bit too. It’s not ideal, but it’s a trade-off. If future performance is any kind of concern you might like it.

Even Eric Evans in DDD concedes that the database will have an effect on the domain model – although that effect should be kept as limited as possible.

I certainly do not like seeing SQL data readers being passed around. Nor does my friend Ronald….


If Things are Working for You….

If your abstractions are working for you then you probably do not need to worry. You might be abstracting your database into oblivion yet your product is performing well and making money for the business.

You will possibly not be running at optimal performance, and arguably there would have been minimal sacrifice to get to that stage (by just not abstracting the database) but that's no problem right now.

Sometimes it might be the opposite case and what I've suggested in this post might be close to being correct. But there are a lot of ifs and buts in this post, and probably some dubious theory. So just remember….

I talk rubbish and I’m wrong 9 times out of 10





The Contentious Controller

09/05/2012
I laughed to myself when I realised how I conceptualise controllers. I went from a fresh-faced beginner who lumped all their logic in there to a clean code aficionado who took SRP out of context and had an abstraction for almost every method.

But as I’ve learned about keeping things simple from the likes of Greg Young and Udi Dahan, I’ve used this approach to systematically challenge the complexity of my code. Recently I came back to “controllers” and classified my thoughts.

I’m now going to suggest that abstractions are un-needed extra work in simple CRUD or Q&M (query and map) scenarios. I’m not saying this is the correct approach, just exploring the potential for increased simplicity.

Simple Systems Need Simple Code – Not Abstractions

If you’ve got a web app where all of the “controllers” share this level of simplicity (fetch some data and map it):

public class ShowEndpoint
{

     private readonly IDocumentSession session;

     public ShowEndpoint(IDocumentSession session)
     {
          this.session = session;
     }

     public CarsViewModel Get(CarsRequestModel requestModel)
     {
          var cars = GetCars(requestModel);
          var model = new CarsViewModel();
          Map(cars, model);
         
         return model;
     }
     
     private IQueryable<Car> GetCars(CarsRequestModel requestModel)
     {
          return session
                 .Query<Car>()
                   .Where(c => c.Price > requestModel.MinPrice)
                   .Where(c => c.Price < requestModel.MaxPrice)
                   .Take(requestModel.PageSize);  
     }
     
     private void Map(IQueryable<Car> cars, CarsViewModel model)
     {
          var carDtos = new List<ShowCarDto>();


           foreach (var car in cars)
           {
              var dto = new ShowCarDto
                  {
                      Model         = car.Model,
                      Manufacturer  = car.Manufacturer,
                      Price         = car.Price,
                      NumberOfDoors = car.NumberOfDoors
                  };


                  carDtos.Add(dto);
          }


          model.Cars = carDtos;
      }
}

Do you go with common wisdom and let TDD drive out a data-access abstraction, a mapper abstraction, and some inheritance (with subsequent coupling)?  You may be pressurised by what other people may think of your “SRP violation”, but you decide to be objective…..

How Does this Bloody Thing Work?
You think about the next dev who will join your team and work out what your system does. You shouldn’t really, but you admit to yourself: she will have difficulty not understanding those 2 simple methods.

Adding New Features?
You think to yourself: once I create that repository method, and other people start calling it for similar queries, the coupling will make it difficult for me to update the query in just this one place – where for this screen only I just want to exclude cars not in this user’s region.

Fixing Bugs
One last thought flows between your ears as you prepare to knock up a repository of type T, where T is some instance of god object: if things break on the live system, and the business people are going insane – I’m going to need to find out what is broken. No, it would be too simple to know that one endpoint is broken, and the few scraps of functionality all live in that one class.

When the Complexity Does Come?
Well, sir, you will be in prime position to know how the components of your system interact. Which use cases are most susceptible to change and which code is the most volatile – you can decompose your “nasty” controller logic according to how your system is being used.

It’s Easier and Quicker
Errr, what he said ^^^

SRP is About Users of the System

Me and my SRP violations, eh? According to Uncle Bob, SRP centres on the users of the system. A responsibility belongs to a single user – and these things should be grouped together so responsibilities can be modified without impacting others.

For starters, if you’ve got a user of your system who is in any way tied to how you map from one object to another, then you’ve got bigger problems than controller logic. And you provide the LOLs for my insightful readers.

Secondly, whilst database concerns might be an additional responsibility, a database abstraction can have very high costs. As Uncle Bob says, “Welcome to software engineering”. You have to choose between trade-offs.

Vertical Slices of Functionality

Importantly you’ve created a vertical slice of functionality – everything related to this one simple view of the application’s UI is in one place. You’ve grouped things together that change together. High fives!!!!!

Not All Logic Lives in the Controller

As I tripped over myself to point out, I’m talking about simple scenarios such as CRUD or query and map. I am not in any way suggesting domain logic or business rules live in the controller. I’m with Ian Cooper - http://codebetter.com/iancooper/2008/12/03/the-fat-controller/

In The End……..

I’m probably talking rubbish, but I did manage to find an example using the simple approach on a simple app: https://github.com/ayende/RaccoonBlog/blob/master/RaccoonBlog.Web/Controllers/PostsController.cs

There’s no correct answer, but if your current solution is working and you’re happy then life is good. I hope you enjoyed my alternative take on controller logic and my challenge to needless complexity.







Is "AssertWasCalled" an Anti-pattern?

28/04/2012

One style of unit testing is to verify that methods were called on passed in mocks. In Rhino Mocks the method is “AssertWasCalled”. In this blog post, I’m putting forward my reasoning why these tests are unnecessary, creating more work and indirection - with no value in return.

It started off as an internal discussion at 7digital, with probably an equal number of people taking the for and against positions. It was an interesting and good-natured debate – one of the reasons it’s rewarding and enjoyable to be a dev at 7digital (mention I referred you if you apply :) ) – but there was no clear outcome. Importantly, I couldn’t see the other side of the argument.
So here’s my side of the story, and my request for you to show me what I am missing – what value does this kind of test give me?


False Security

Here’s one of those tests that verifies some collaborator was called. It probably means that if the credentials are bad the mp3 shouldn’t be given to the user, or if they are good then vice versa.
[Test]
public void Permission_checker_verifies_credentials_when_getting_an_mp3()
{
    var checker = MockRepository.GenerateMock<IPermissionChecker>();
    var retriever = MockRepository.GenerateMock<IAssetRetriever>();
    var provider = new AssetProvider(retriever, checker);
 
    string assetCode = "blahAssetCode";
    string userCode = "blahUserCode";
 
    provider.GetMp3Asset(assetCode, userCode);
 
    checker.AssertWasCalled(c => c.CanVerifyCredentialsFor(assetCode, userCode));
}

 But how do I make it pass, doing the simplest possible thing?....
public Mp3Asset GetMp3Asset(string assetCode, string userCode)
{
    permissionChecker.CanVerifyCredentialsFor(assetCode, userCode);
 
    return assetRetriever.GetMp3(assetCode);
}

My test console is lit up with the finest shade of bright green JetBrains can provide, and my heart is warm knowing that no undeserving fiend will be listening to my digital copy of Lady Gaga. But wait… the call to the permission checker may as well not be there because it contributes in no way to this behaviour.


All we had to do to get the test to pass was call that method anywhere, in any order – and that’s it.
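If you want to see the false security without installing a mocking library, here is a minimal sketch using a hand-rolled spy instead of Rhino Mocks. RefusingChecker is a hypothetical test double, and the asset retriever is collapsed into a single line to keep the example short.

```csharp
public class Mp3Asset
{
    public string Name { get; private set; }

    public Mp3Asset(string name)
    {
        Name = name;
    }
}

public interface IPermissionChecker
{
    bool CanVerifyCredentialsFor(string assetCode, string userCode);
}

// Hypothetical hand-rolled spy: always refuses, and records that it was asked.
public class RefusingChecker : IPermissionChecker
{
    public bool WasCalled { get; private set; }

    public bool CanVerifyCredentialsFor(string assetCode, string userCode)
    {
        WasCalled = true;
        return false;
    }
}

// The "simplest possible thing": call the checker, then ignore its answer.
public class AssetProvider
{
    private readonly IPermissionChecker permissionChecker;

    public AssetProvider(IPermissionChecker permissionChecker)
    {
        this.permissionChecker = permissionChecker;
    }

    public Mp3Asset GetMp3Asset(string assetCode, string userCode)
    {
        permissionChecker.CanVerifyCredentialsFor(assetCode, userCode);

        // Stands in for assetRetriever.GetMp3(assetCode).
        return new Mp3Asset(assetCode);
    }
}
```

The checker said no, WasCalled is true, and the caller still walks away with an mp3. A test that only asserts the call was made stays green the whole time.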


We can fix that
[Test]
public void Returns_mp3_asset_from_provider_when_permission_checker_verifies_credentials()
{
    string assetCode = "blahAssetCode";
    string userCode = "blahUserCode";
 
    var asset = new Mp3Asset("Lady gaga");
 
    checker.Stub(p => p.CanVerifyCredentialsFor(assetCode, userCode)).Return(true);
    assetRetriever.Stub(a => a.GetMp3(assetCode)).Return(asset);
 
    var result = provider.GetMp3Asset(assetCode, userCode);
 
    Assert.That(result, Is.EqualTo(asset));
}
 
[Test]
public void Blows_up_with_unauthorized_exception_when_permission_checker_cannot_verify_credentials()
{
    string assetCode = "R2D2";
    string userCode = "C3P0";
 
    checker.Stub(p => p.CanVerifyCredentialsFor(assetCode, userCode)).Return(false);
 
    Assert.Throws<UnAuthorizedException>(() => provider.GetMp3Asset(assetCode, userCode));
}

Now I have two more tests that describe the real behaviours of this asset provider – users with permission get their Lady gaga fix, while the fiends are kept wanting. With the simplest possible solution, it forces my production code to take this shape:

public Mp3Asset GetMp3Asset(string assetCode, string userCode)
{
    var hasPermission = permissionChecker.CanVerifyCredentialsFor(assetCode, userCode);
 
    if (hasPermission) return assetRetriever.GetMp3(assetCode);
 
    throw new UnAuthorizedException();
}

What value is the “AssertWasCalled” test giving me now? What value did it give me in the first place? I can’t think of one reason why I needed it then or why I should keep it now.

Maintenance Costs

I’ve argued the test provides me no value. If in doubt, however, why not keep it, in case there is some hidden value visible above my level of intellect? I’m about to show you that it has costs – costs in addition to those inherent of creating and maintaining any code.
Let’s make a change
Google has now purchased the asset manager and they make all their money from ads; now they give away the mp3s for free. All they want is a log stating that you took it for free, just in case the patent trolls drain their bank account, forcing Google to recover the fee from you at a later date. Don’t worry, this is all covered in the Google terms and conditions you agreed to.
Here’s Google’s new asset business model:
public Mp3Asset GetMp3Asset(string assetCode, string userCode)
{
    var mp3 = assetRetriever.GetMp3(assetCode);
 
    Logger.LogFreeGiveaway(assetCode, userCode, mp3);
 
    return mp3;
}

 Bethany, the developer at Google changing the code, decides to be a good developer and run the tests before checking in:

She laughs out loud at the unauthorized exception, thinking there is no such thing as a free lunch yet Google is somehow giving stuff away for free. That test is soon deleted in line with the new business policy.

Cost 1: Dangerous Miscommunication
But what about the “AssertWasCalled” test? Some previous developer explicitly stated this method MUST be called – it was part of the behaviours and requirements for this feature. Does it have some unknown side effect? Is it because of a bug in the system where permission checker does some background job that allows the asset retriever to get the mp3s? She doesn’t know – but whoever wrote that test, explicitly stated – you MUST call this method, whether you do anything with the return value or not.

Because it is Google we don’t care.  Imagine this is your code, or imagine you have to change this code: can you be certain it is safe to delete the “AssertWasCalled” test? You’ll probably have to spend time digging around just to make sure it is safe. But that’s ok, the value of this test is worth it, right?

Cost 2: Refactoring Baggage

Imagine if almost every time you make a small refactoring, you have to go and fix 2 broken tests. Take the example of moving a responsibility to a new class. You have to move the behavioural test on to the new object, and cart around this “AssertWasCalled” as well.
When we’re doing lots of small successive refactorings to code, do we really want double the amount of broken tests every time? Do we want more things clogging up our minds as we try to clean up the code?


Cost 3: Slows down your test suite
With all these extra tests just to assert something was called, it will take time to execute them. For a couple it may be negligible, but as the test suite grows the time lost on these tests will become more noticeable if you’ve got them everywhere.

Sweet Spot

AssertWasCalled is not always bad…. sometimes a collaborator will make “cross-boundary” communication. I use this term to mean anything that isn’t synchronous or feeding back into the current execution of the code. Let’s take this example:

public void OrderBreakfast(IEnumerable<OrderLine> items)
{
    var address = GetCafeEmailAddress();

    var body    = BuildBreakfastOrderEmailBodyFor(items);

    emailSender.SendBreakfastOrder(address, body);
}

A behaviour of this method is that it will email the café the breakfast order. The only way we can be certain that it carries out this behaviour is to assert that the email sender sent the breakfast order email.
I am not being critical of these use cases, and encourage them on the few occasions they are needed.
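Here is a rough sketch of that kind of test, again with a hand-rolled spy rather than Rhino Mocks. RecordingEmailSender, BreakfastOrderer, the café address and the string-based order lines are all assumptions made to keep the example self-contained.

```csharp
using System.Collections.Generic;

public interface IEmailSender
{
    void SendBreakfastOrder(string address, string body);
}

// Hypothetical spy: records what it was asked to send instead of sending it.
public class RecordingEmailSender : IEmailSender
{
    public string SentTo { get; private set; }
    public string SentBody { get; private set; }

    public void SendBreakfastOrder(string address, string body)
    {
        SentTo = address;
        SentBody = body;
    }
}

public class BreakfastOrderer
{
    private readonly IEmailSender emailSender;

    public BreakfastOrderer(IEmailSender emailSender)
    {
        this.emailSender = emailSender;
    }

    public void OrderBreakfast(IEnumerable<string> items)
    {
        var address = "cafe@example.com";    // made-up address for the sketch
        var body = string.Join(", ", items); // simplified order body

        emailSender.SendBreakfastOrder(address, body);
    }
}
```

The method returns nothing and the email leaves the process, so asking the spy what it received is the only observable evidence that the behaviour happened. That is a very different situation from the permission checker, where the return value already tells the story.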

State of Play

I’ve shown you why I don’t like AssertWasCalled tests in nearly all cases – I believe they provide no value, yet require extra work to create, maintain, and assess the validity of. I’ve shown how they break your flow and concentration by having to attend to more breaking tests, more things to refactor, and a slower test suite.

However, I invite you girls and boys to show me what I am missing - show me what value they add, and bring some balance to this argument with clear examples. If there is humble pie to be eaten, I shall indulge.

Actually I was quite hungry and made a start already...



 



Recent Tweets

20/05/2012

@abdullin Ok, since I like you I'll give her some slack

20/05/2012

@abdullin Hey that's cool....not sure about Tina Turner though

20/05/2012

@abdullin Did you just lose a lot of follwers? #onlyjoking

Currently Reading

Growing Object-Oriented Software, Guided by Tests

Programming Erlang

Code