Alternative distance function for Stable Abstractions Principle

Tools such as STAN use a distance function D = |A + I – 1| to measure how well a particular package meets the stable-abstractions design principle. In my opinion this measure treats a lot of points in the design space as good that are not, in fact, good places to be. Specifically, I think that packages should fall into the upper-left or lower-right corners only. This is particularly important if we consider the distance function to be a cost function and we are trying to automatically search for good designs. Minimising D can also favour packages that are half abstract and half stable, which I would prefer to avoid.

Instead I propose to use the normalised absolute difference in the distances from the two desirable corners of the design space. Specifically, I favour designs which are closer to either the stable-abstract or unstable-concrete corner of the design space over designs which happen to be near the “main sequence” but are further from a desirable corner. I propose the distance function D′ = sqrt(2) – abs(sqrt((1 – I)² + A²) – sqrt(I² + (1 – A)²)) / sqrt(2). This function weights the desirable corners most favourably (they have the lowest cost) and provides a watershed along A = I, so that a search function can converge towards the most desirable part of the design space by cost minimisation.
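As a minimal sketch, here are both cost functions in Java (the class and method names are my own, and I have read the formula for D′ as normalising only the absolute difference):

public final class StableAbstractionsCost {
    // D = |A + I - 1| : zero anywhere on the main sequence.
    static double mainSequenceDistance(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }

    // D' = sqrt(2) - |distance to (I=1, A=0) - distance to (I=0, A=1)| / sqrt(2)
    // Lowest cost at the stable-abstract and unstable-concrete corners,
    // highest cost along the watershed A = I.
    static double cornerSeekingCost(double abstractness, double instability) {
        double toUnstableConcrete = Math.hypot(1.0 - instability, abstractness);
        double toStableAbstract = Math.hypot(instability, 1.0 - abstractness);
        return Math.sqrt(2.0) - Math.abs(toUnstableConcrete - toStableAbstract) / Math.sqrt(2.0);
    }

    public static void main(String[] args) {
        // A half-abstract, half-stable package scores perfectly on D but worst on D'.
        System.out.println(mainSequenceDistance(0.5, 0.5)); // 0.0
        System.out.println(cornerSeekingCost(0.5, 0.5));    // ~1.414 (highest cost)
        System.out.println(cornerSeekingCost(1.0, 0.0));    // ~0.414 (lowest cost)
    }
}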

Proposed stable-abstractions distance function (D’):

stable-abstractions cost function which weighs desirable corners higher than half-abstract half-stable packages

For comparison the original stable-abstractions distance function (D):

plot showing that the main sequence has the lowest cost, but an equal cost to the desirable corners

To make the difference between these two functions concrete, here is a plot of the absolute difference between them. The red areas are the areas of most agreement. You can see that the central disputed area is the part of the design space most affected by the selection of my proposed new function; positive figures are areas less favoured by the proposed D′ function.

shows which areas differ most between the two cost functions

Avoiding the Dependency Injection Facebook Problem

Last year Steve Freeman published a blog article describing how the use of dependency injection could lead to coupling between objects that should have no knowledge of each other. The essential idea is that using dependency injection uniformly for all relationships between objects within the program makes it harder to understand and control the higher level structure of the program; you turn your program into an unstructured bag of objects where every object can potentially talk to every other object.

I thought that I would write a bit about my approach to dealing with this problem in Java programs: how I try to add higher level structure to the code; and how I use the tools to enforce this structure. I am going to take a long diversion through the practices I use to organise Java code before I talk about Dependency Injection. If you only want to read that bit, then skip to the end.

The following are my versions of the diagrams from a post about the problem with DI frameworks by David Peterson. I start with the same unstructured graph of objects and use the same basic progression in order to describe how I use the Java programming language to resolve these issues without having to sacrifice the use of my DI framework (should I want to use one).

For the sake of argument, the program begins as an unstructured graph of objects dumped into a single package within a single component (or project). The objects are all compiled together in one instantiation of the Java compiler. This program is likely to be quite difficult to understand and modify – not least, where do you even start trying to understand it? We can do better.

Program showing object dependencies

Types, Packages and the Classpath are the ways in which the Java programming language offers compiler support for organising code into higher level abstractions. The benefit of using packages is that the compiler's type system will enforce the abstraction boundary. If a type definition is marked as package visible, then code outside the package that references that type (class or interface name) will not compile.

I refactor the code by identifying related sets of classes and moving each set into a package. Without breaking the program I then mark every type that I can as package visible. Now I can see some of the structure of my code: I can see which classes are implementation details of a particular abstraction; and I can see what protocol each of my abstractions provides by looking at its public classes. I cannot accidentally depend on an implementation detail of my abstraction because the Java compiler will prevent me (provided I have defined sufficient types and don’t do too much reflection).
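As a minimal sketch (all names here are invented for illustration), the package exposes a public interface and a public factory while the implementation class is package visible, so the rest of the program cannot name it:

// File com/example/parsing/Parser.java -- the public protocol of the package.
package com.example.parsing;

import java.util.List;

public interface Parser {
    List<String> parse(String input);
}

// File com/example/parsing/RecursiveDescentParser.java -- an implementation detail.
// Package visibility means code outside com.example.parsing cannot even reference it.
package com.example.parsing;

import java.util.Arrays;
import java.util.List;

class RecursiveDescentParser implements Parser {
    @Override public List<String> parse(String input) {
        return Arrays.asList(input.split("\\s+")); // toy implementation
    }
}

// File com/example/parsing/Parsers.java -- a public factory, so clients can obtain an
// instance without depending on the implementation class.
package com.example.parsing;

public final class Parsers {
    private Parsers() {}

    public static Parser newParser() {
        return new RecursiveDescentParser();
    }
}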

Program showing package dependencies

There might be a lot of packages involved in a realistic program so I would like another level of abstraction that I can use to organise my packages into groups of related packages. I do this by organising my packages into libraries. All of the packages in my library will be compiled together in one single instantiation of the compiler, but each of my libraries will be compiled separately. I want to detect any accidental dependencies between my libraries so I will only put the minimum necessary libraries onto the classpath of each invocation of the compiler.

Program showing component dependencies

I can now see the high level component organisation of my program and I can see which components are used by a given component. I may even be able to see which component is the entry point of my program – the entry-point component will transitively depend on all the other components. There are still two further properties I would like my code to have: I do not want a package in one library to accidentally depend on an implementation package of another library (each library's public protocol should be clearly defined); and I would like the entry point of the application to be immediately obvious (so a code reader can start reading with no delay).

I extract the public interface of my library into a new library, and change the users of that library to depend only on the public interface. I also replace the implementation jar with the new API-only jar on the compiler classpath anywhere it occurs in my other libraries. This is a form of dependency inversion. The new interface library cleanly describes the protocol provided by the implementation and the separate compilation ensures that users of the library only depend on the public protocol.
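A sketch of the shape this takes (library and type names are illustrative); the API library contains only public types, and the implementation jar never appears on the compile classpath of client libraries:

// store-api library, file com/example/store/api/Store.java -- the published protocol.
package com.example.store.api;

public interface Store {
    String get(String key);

    void put(String key, String value);
}

// store-impl library, file com/example/store/impl/DatabaseStore.java -- compiled against
// store-api. Client libraries only see store-api.jar, so a reference to DatabaseStore in
// their code simply will not compile.
package com.example.store.impl;

import com.example.store.api.Store;
import java.util.HashMap;
import java.util.Map;

public class DatabaseStore implements Store {
    private final Map<String, String> values = new HashMap<>(); // stand-in for a real database

    @Override public String get(String key) { return values.get(key); }

    @Override public void put(String key, String value) { values.put(key, value); }
}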

I then extract the entry point (the highest level wiring) of the program into its own component. This component contains the minimum code necessary to instantiate the top level components, wire them together and start the execution. This is a form of inversion of control.
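Continuing the illustrative names from the sketch above, the entry-point component is the only place that names the implementation classes:

// application library, file com/example/app/Application.java -- a hypothetical top-level
// component; it is compiled against the store-api library only.
package com.example.app;

import com.example.store.api.Store;

public final class Application {
    private final Store store;

    public Application(Store store) { this.store = store; }

    public void run() { System.out.println(store.get("greeting")); }
}

// main component, file com/example/main/Main.java -- the minimum code needed to instantiate
// the top-level components, wire them together and start execution.
package com.example.main;

import com.example.app.Application;
import com.example.store.api.Store;
import com.example.store.impl.DatabaseStore;

public final class Main {
    public static void main(String[] args) {
        Store store = new DatabaseStore(); // the only place a concrete type is named
        new Application(store).run();
    }
}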

The structure of my program is now not only represented in the organisation of the code, but the structure is also enforced by the Java compiler.

Program showing inversion of component dependencies

I can look at the package structure of the code.

Program showing only package structure

Or I can look at the component structure of the code.

Program showing component structure

What structure am I now able to obtain from the code?

  • Package API – by looking at the visibility of my classes
  • Package Relationships – by looking at the packages in each library
  • Library API – by looking at the fully abstract library component
  • Library Relationships – by looking at the classpath of each library
  • Program high-level organisation – by looking at the classpath of each library and the entry-point wiring

Dependency Injection and Dependency Injection Frameworks

So why haven’t I mentioned Dependency Injection and Dependency Injection frameworks? And how does all this relate to the “Facebook Dependency Injection” problem?

I can now choose to use dependency injection as much or as little as I want/need to. If I want to inject one library into another, or inject one package into another then I can just provide an appropriate factory. If I want to use new, I can do that too. The use of package or public visibility will distinguish between the peer objects and the internal implementation details. The Java compiler will prevent me from creating any unwanted dependencies by enforcing package protection. The same argument applies to dependencies between libraries: I can distinguish the public API of the library by splitting it out; by only having the public API on the classpath I can be sure that I will not accidentally use an implementation detail of a dependency.

The type system and compiler will help me prove to myself that the program will only instantiate an object graph with the structure that I intended.

I am now free to use a Dependency Injection framework anywhere within my code that I feel it is appropriate. I can use a Dependency Injection framework to wire up the implementation classes of a package, I can use it to wire up the packages of a library, or I can use it to wire together my libraries – or any combination I choose. I can even use a different Dependency Injection framework within each of my libraries, or even within each of my packages, if for some reason I need to do that! I can be sure the structure of my program doesn’t just become a bag of objects because the compiler enforces it for me – I cannot pull an object of just any type into my object; I can only pull in an object whose type is in the public API of a library (and package) that is on my classpath.
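For instance, here is a sketch of wiring the illustrative store library from earlier with Guice (the choice of framework is mine; the same applies to any container):

package com.example.main;

import com.example.store.api.Store;
import com.example.store.impl.DatabaseStore;
import com.google.inject.AbstractModule;

// A module that wires one library. The bindings can only mention types that are on this
// component's classpath, so the compiler still polices the overall structure.
public final class StoreModule extends AbstractModule {
    @Override protected void configure() {
        bind(Store.class).to(DatabaseStore.class);
    }
}

// Usage from the entry-point component:
//   com.google.inject.Injector injector = com.google.inject.Guice.createInjector(new StoreModule());
//   Store store = injector.getInstance(Store.class);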

100% branch and statement coverage does not mean the code is fully covered

Standard Disclaimer: This discussion is about high reliability software for situations where software failures can have significant impacts. The ideas discussed are not necessarily appropriate or cost effective for most software products.

When refactoring high reliability software it is often important not to introduce any observable behavioural changes – no bugs. But how do you go about it?

One answer is to have very high test coverage. However, ensuring high line and branch coverage is not enough to ensure that your refactoring is safe, because even 100% branch coverage might only test a very small subset of the possible executions of a program. A simple example:

int myFunction(int x)
{
   int result = 0;
   if(x % 2 == 0)
   {
       // A (even)
       result++;
   }
   else
   {
       // B (odd)
       result--;
   }
 
   if(x % 3 == 0)
   {
       // C (divisible by 3)
       result++;
   }
   else
   {
       // D (indivisible by 3)
       result--;
   }
 
   return result;
}

So, we can cover all the branches with the test values 2, 3 and 7.

  • 2 – branches A,D
  • 3 – branches B,C
  • 7 – branches B,D

So, even with this simple code, tests giving 100% branch and statement coverage do not cover the path “A,C”. An additional test case is required; for example, the value x == 6 would cover “A,C”.
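A sketch of the corresponding tests (JUnit 4 style; myFunction is the method defined above and is assumed to be visible to the test):

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class MyFunctionTest {
    // These three values already give 100% branch and statement coverage...
    @Test public void evenAndNotDivisibleByThree() { assertEquals(0, myFunction(2)); }  // path A,D
    @Test public void oddAndDivisibleByThree()     { assertEquals(0, myFunction(3)); }  // path B,C
    @Test public void oddAndNotDivisibleByThree()  { assertEquals(-2, myFunction(7)); } // path B,D

    // ...but only this fourth case exercises the remaining path A,C.
    @Test public void evenAndDivisibleByThree()    { assertEquals(2, myFunction(6)); }  // path A,C
}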

In practice, there can be a very large number of paths through a program, so exhaustively testing them can be either very expensive or completely impractical.

Recompilation Time and Layering

When following an iterative development methodology, compilation time is a concern. In order to safely make a rapid series of small transformations to a program you need to regularly recompile and test the changes you are making. Using an IDE which can do incremental compilation is very helpful, but more complex programs with code and resource generation phases, languages without that kind of incremental compilation support, and the need to verify that the CI build will work all still require a fast, fully automated build and test cycle.

In a previous article I discussed a design principle that can be used to try to minimise the amount of time required to do a full compile of all the code in an application.

It is also possible to select design principles which will reduce the amount of recompilation required by any given code change. Conveniently, it turns out that the same abstract layering approach can be used here too.

We return to the simple example application:

application depending on library 1 depending on library 2 depending on library 3 depending on library 4

How long does a rebuild take after a change to each component, if each component takes time m to compile?

Component   | Compilation Order                                   | Compilation Time
library4    | library4, library3, library2, library1, application | 5m
library3    | library3, library2, library1, application           | 4m
library2    | library2, library1, application                     | 3m
library1    | library1, application                               | 2m
application | application                                         | 1m

If we calculate the instability of each module, taking transitive dependencies into account and abusing the definitions slightly, we get:

instability (I) = Ce / (Ca + Ce)

Component   | Ce | Ca | I
library4    | 0  | 4  | 0
library3    | 1  | 3  | 0.25
library2    | 2  | 2  | 0.5
library1    | 3  | 1  | 0.75
application | 4  | 0  | 1
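For example, library2 has two transitive efferent dependencies (library3 and library4) and two transitive afferent dependants (library1 and application), so its instability is I = 2 / (2 + 2) = 0.5.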

Comparing the instability with the compilation time is interesting. Less stable packages (those with a higher instability value) should have lower compilation times, which is what we see here:

Component   | Time | I
library4    | 5m   | 0
library3    | 4m   | 0.25
library2    | 3m   | 0.5
library1    | 2m   | 0.75
application | 1m   | 1

Reducing Recompilation Time

We can restructure our application using the dependency inversion principle. We split each layer into an abstract (interface) layer and a concrete implementation layer.

The application module becomes responsible for selecting the concrete implementation of each layer and configuring the layers of the application. This design pattern is known as Inversion of Control.

application depends on libraries, but libraries depend on abstract layer interfaces

Taking into account the ability of a multi-core machine to compile modules in parallel (modules shown in braces can be compiled simultaneously), we get the following minimum compilation times:

Component Changed | Compilation Order                               | Compilation Time
library4          | library4, application                           | 2m
<<library4>>      | <<library4>>, {library4, library3}, application | 3m
library3          | library3, application                           | 2m
<<library3>>      | <<library3>>, {library3, library2}, application | 3m
library2          | library2, application                           | 2m
<<library2>>      | <<library2>>, {library2, library1}, application | 3m
library1          | library1, application                           | 2m
application       | application                                     | 1m

The compilation time for the abstract modules is uniformly 3m, and the compilation time for the concrete modules is uniformly 2m. A change to the application itself still only requires the application to be recompiled, so that remains 1m as before.

How has the design change affected the stability of the modules?

Component    | Ce | Ca | I
library4     | 1  | 1  | 0.5
<<library4>> | 0  | 3  | 0
library3     | 2  | 1  | 0.66
<<library3>> | 0  | 3  | 0
library2     | 2  | 1  | 0.66
<<library2>> | 0  | 3  | 0
library1     | 1  | 1  | 0.5
application  | 4  | 0  | 1

Again we see that the less stable (more unstable) a module is, the lower its recompilation time:

Component    | Time | I
library4     | 2m   | 0.5
<<library4>> | 3m   | 0
library3     | 2m   | 0.66
<<library3>> | 3m   | 0
library2     | 2m   | 0.66
<<library2>> | 3m   | 0
library1     | 2m   | 0.5
application  | 1m   | 1

This example is quite simple, so the instability metric of the modules is dominated by the application. It does, however, illustrate how we can use dependency inversion to help establish an upper bound on the recompilation time for any given change to a concrete module. I will examine a more realistic example in a future article.

Compilation Time and Layering

On a modern computer with multiple cores the total compile time of an application is related to the longest path through the dependency graph of the application.

In a statically type-checked language that supports separate compilation of application components (into, for example, jars or assemblies), it is generally necessary to compile the dependencies of an application or library before compiling the application or library itself.

A typical layered application might look like this:

application depending on library 1 depending on library 2 depending on library 3 depending on library 4

The compile order for such an application would be library4 then library3 then library2 then library1 then application. The longest path through the dependency graph is 5.

Assuming, for simplicity, that the compile time for each module is approximately the same (m), the total compile time for the application (t) is 5m.

t = 5m

Taking Advantage of Parallel Compilation

We can restructure our application using the dependency inversion principle. We split each layer into an abstract (interface) layer and a concrete implementation layer.

The application module becomes responsible for selecting the concrete implementation of each layer and configuring the layers of the application. This design pattern is known as Inversion of Control.

application depends on libraries, but libraries depend on abstract layer interfaces

The compile order for such an application is much more flexible than that of the earlier design allowing some modules to be compiled in parallel.

The longest path through the dependency graph is 3.

Assuming again that the compile time for each module is approximately the same (m), the minimum compile time for the application (t) is 3m.

t = 3m

Dependency Inversion

What is a Dependency?

For the purposes of this discussion a class A has a dependency on another class B iff you cannot compile class A without class B.

Example

class A {B b;}
class B { }

Box A with an arrow pointing to box B

We have to compile class B before (or at the same time as) class A.

Other kinds of Dependency

Anywhere the name “B” appears in class A creates a dependency. Some other examples of dependencies are:

class A extends B { }
class A implements B { }
class A { void method(B b) { } }

Transitive Dependencies

If a class A depends on another class B, which itself has dependencies, then the dependencies of class B are effectively dependencies of class A.

Example:

class A { B b; }
class B { C c; } 
class C { }

Box A with an arrow to Box B with an Arrow to Box C, dotted arrow from Box A to Box C

We have to compile class C before (or at the same time as) class B and class A.

The Problem

When class C changes, we have to recompile and retest both class A and class B. In a large system this can take a very long time. It also means that you have to know about class C in advance; you cannot decide on class C after deciding on class A.

If class C is a more concrete class then it might change more frequently than class A. This will cause class A to be recompiled/tested much more frequently than it otherwise would need to be.

The Solution

Inverting Dependencies

Have class A and class B both depend on an abstraction I. This inverts the direction of the dependency arrow on class B.

interface I { }
class A { I i; }
class B implements I { }

Box A with arrow to Box I, Box B with arrow to Box I

Breaks Transitive Dependency

The really helpful effect of this inversion is that it also breaks the transitive dependency from class A onto class C: class A no longer depends on class B at all, so it no longer inherits class B's dependencies.

interface I{ }
class A { I i; }
class B implements I{ C c; }
class C { }

Box A arrow to Box I, Box B arrow to Box I, Box B arrow to box C. Box A dotted arrow to Box C deleted

Dependency Inversion Principle

This is an application of the Dependency Inversion Principle:

  • High level modules should not depend on low level modules. Both should depend on abstractions.
  • Abstractions should not depend upon details. Details should depend upon abstractions.

Compositional Patterns for Test Driven Development

This article is going to look at how to implement a parameterisable algorithm so that it can both conform to the open/closed principle and be tested most effectively.

A common pattern for code reuse, implementation selection, or extension, is to use class inheritance and the template method pattern. This is an example of an implementation of the template method pattern with two different variations B and C:

abstract class A {
    public void doSomething() {
        doP();
        doQ();
        doR();
    }
 
    protected abstract void doP();
    protected abstract void doQ();
    protected abstract void doR();
}
 
class B extends A {
    @Override protected void doP() { /* do P the B way */}
    @Override protected void doQ() { /* do Q the B way */}
    @Override protected void doR() { /* do R the B way */}
}
 
class C extends A {
    @Override protected void doP() { /* do P the C way */}
    @Override protected void doQ() { /* do Q the C way */}
    @Override protected void doR() { /* do R the C way */}
}

Refactoring template method pattern to Strategy Pattern

We can always convert this template method pattern to a compositional pattern by performing a refactoring in the following steps (if you have member variable access there are a couple more steps, but I’ll cover them in a follow-up article):

Step 1: encapsulate the construction of the B and C strategies:

abstract class A {
    public void doSomething() {
        doP();
        doQ();
        doR();
    }
 
    protected abstract void doP();
    protected abstract void doQ();
    protected abstract void doR();
}
 
class B extends A {
    public static A createB() {
        return new B();
    }
 
    @Override protected void doP() { /* do P the B way */}
    @Override protected void doQ() { /* do Q the B way */}
    @Override protected void doR() { /* do R the B way */}
}
 
class C extends A {
    public static A createC() {
        return new C();
    }
 
    @Override protected void doP() { /* do P the C way */}
    @Override protected void doQ() { /* do Q the C way */}
    @Override protected void doR() { /* do R the C way */}
}

Step 2: extract an interface for the strategy methods:

interface S {
    void doP();
    void doQ();
    void doR();
}
 
abstract class A implements S {
    private final S s = this;
 
    public void doSomething() {
        s.doP();
        s.doQ();
        s.doR();
    }
}
 
class B extends A {
    public static A createB() {
        return new B();
    }
 
    @Override public void doP() { /* do P the B way */}
    @Override public void doQ() { /* do Q the B way */}
    @Override public void doR() { /* do R the B way */}
}
 
class C extends A {
    public static A createC() {
        return new C();
    }
 
    @Override public void doP() { /* do P the C way */}
    @Override public void doQ() { /* do Q the C way */}
    @Override public void doR() { /* do R the C way */}
}

Step 3: pass the strategy into A through its constructor instead of using this:

interface S {
    void doP();
    void doQ();
    void doR();
}
 
final class A {
    private final S s;
 
    public A(final S s) {
        this.s = s;
    }
 
    public void doSomething() {
        s.doP();
        s.doQ();
        s.doR();
    }
}
 
class B implements S {
    public static A createB() {
        return new A(new B());
    }
 
    public void doP() { /* do P the B way */}
    public void doQ() { /* do Q the B way */}
    public void doR() { /* do R the B way */}
}
 
class C implements S {
    public static A createC() {
        return new A(new C());
    }
 
    public void doP() { /* do P the C way */}
    public void doQ() { /* do Q the C way */}
    public void doR() { /* do R the C way */}
}

Advantages of the compositional style

Less fragile

Changes to A, such as new methods, are much less likely to break the strategies in the compositional style.

Easier to Test

In the compositional style, class A can be tested by itself using mocks, as can class B and class C. In the inheritance style, class B and class C cannot be tested without also testing class A, which leads to duplication in the tests, as features of class A are re-tested for every subclass.
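For example, a sketch of a test for class A in the compositional style (I am using Mockito here, but any mocking approach works):

import static org.mockito.Mockito.inOrder;
import static org.mockito.Mockito.mock;

import org.junit.Test;
import org.mockito.InOrder;

public class ATest {
    @Test
    public void doSomethingPerformsTheStrategyStepsInOrder() {
        S s = mock(S.class);        // no concrete B or C is needed
        A a = new A(s);

        a.doSomething();

        InOrder steps = inOrder(s); // verify the template sequence P, Q, R
        steps.verify(s).doP();
        steps.verify(s).doQ();
        steps.verify(s).doR();
    }
}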

Easier to Reuse

In the compositional style class B and class C can be reused in other contexts where A is not relevant. In the inheritance style this type of reuse is not possible.

Emergent Model/Domain concepts

If class B or class C is reused in a different context, it may turn out that during subsequent refactoring they develop a new role within the system. I may even discover an important new domain concept.