Avoiding the Dependency Injection Facebook Problem

Last year Steve Freeman published a blog article describing how the use of dependency injection could lead to coupling between objects that should have no knowledge of each other. The essential idea is that using dependency injection uniformly for all relationships between objects within the program makes it harder to understand and control the higher level structure of the program; you turn your program into an unstructured bag of objects where every object can potentially talk to every other object.

I thought that I would write a bit about my approach to dealing with this problem in Java programs: how I try to add higher level structure to the code, and how I use the tools to enforce this structure. I am going to take a long diversion through the practices I use to organise Java code before I talk about Dependency Injection. If you only want to read that bit, skip to the end.

The following are my versions of the diagrams from a post about the problem with DI frameworks by David Peterson. I start with the same unstructured graph of objects and use the same basic progression in order to describe how I use the Java programming language to resolve these issues without having to sacrifice the use of my DI framework (should I want to use one).

For the sake of argument, the program begins as an unstructured graph of objects dumped into a single package within a single component (or project). The objects are all compiled together in one instantiation of the Java compiler. This program is likely to be quite difficult to understand and modify – not least, where do you even start trying to understand it? We can do better.

Program showing object dependencies

Types, Packages and the Classpath are the ways that the Java programming language offers compiler support for organising code into higher level abstractions. The benefit of using packages is that the compiler type system will enforce the abstraction boundary: if a type definition is marked as package visible then no code outside the package that references that type (class or interface name) will compile.

I refactor the code by identifying related sets of classes and moving each set into its own package. Without breaking the program I then mark every type that I can as package visible. Now I can see some of the structure of my code: I can see which classes are implementation details of a particular abstraction; and I can see what protocol each of my abstractions provides by looking at its public classes. I cannot accidentally depend on an implementation detail of an abstraction because the Java compiler will prevent me (provided I have defined sufficient types and don’t do too much reflection).
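
As a minimal sketch (the package and class names here are hypothetical, and the types are shown in one listing for brevity although each public type would live in its own file), a package might expose a single public interface plus a factory, keeping the implementation package-visible:

package com.example.pricing;
 
// The public protocol of the abstraction.
public interface PriceCalculator {
    long priceInPence(String productCode);
}
 
// Package-visible implementation detail: code outside com.example.pricing
// that references FixedPriceCalculator will not compile.
class FixedPriceCalculator implements PriceCalculator {
    @Override public long priceInPence(String productCode) { return 100; }
}
 
// The only way other packages can obtain the implementation.
public final class Pricing {
    public static PriceCalculator priceCalculator() { return new FixedPriceCalculator(); }
}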

Program showing package dependencies

There might be a lot of packages involved in a realistic program so I would like another level of abstraction that I can use to organise my packages into groups of related packages. I do this by organising my packages into libraries. All of the packages in my library will be compiled together in one single instantiation of the compiler, but each of my libraries will be compiled separately. I want to detect any accidental dependencies between my libraries so I will only put the minimum necessary libraries onto the classpath of each invocation of the compiler.

Program showing component dependencies

I can now see the high level component organisation of my program and I can see which components are used by a given component. I may even be able to see which component is the entry point of my program – the entry point component will transitively depend on all the other components. There are still two further properties I would like my code to have: I do not want to accidentally create a dependency between a package in one library and a package in another library (its public protocol should be clearly defined); and I would like the entry point of the application to be immediately obvious (so a code reader can start reading with no delay).

I extract the public interface of my library into a new library, and change the users of that library to depend only on the public interface. I also replace the implementation jar with the new API-only jar on the compiler classpath anywhere it occurs in my other libraries. This is a form of dependency inversion. The new interface library cleanly describes the protocol provided by the implementation and the separate compilation ensures that users of the library only depend on the public protocol.
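
Continuing the hypothetical example from earlier, the API library contains only the public protocol, and the implementation library depends on it:

// In the pricing-api library: interfaces only.
package com.example.pricing.api;
 
public interface PriceCalculator {
    long priceInPence(String productCode);
}
 
// In the pricing-impl library, compiled with pricing-api on its classpath.
package com.example.pricing.impl;
 
public class DatabasePriceCalculator implements com.example.pricing.api.PriceCalculator {
    @Override public long priceInPence(String productCode) { /* look the price up */ return 100; }
}

Every other library is compiled with only pricing-api on its classpath, so it cannot reference DatabasePriceCalculator even by accident.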

I then extract the entry point (the highest level wiring) of the program into its own component. This component contains the minimum code necessary to instantiate the top level components, wire them together and start the execution. This is a form of inversion of control.
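
A sketch of such an entry-point component, continuing the same hypothetical names:

// The entry-point component is the only place that sees both the API and
// the implementation libraries: it instantiates the top level components,
// wires them together and starts execution.
package com.example.main;
 
public final class Main {
    public static void main(String[] args) {
        com.example.pricing.api.PriceCalculator calculator =
                new com.example.pricing.impl.DatabasePriceCalculator();
        new Shop(calculator).run(); // Shop is a hypothetical top level component
    }
}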

The structure of my program is now not only represented in the organisation of the code, but the structure is also enforced by the Java compiler.

Program showing inversion of component dependencies

I can look at the package structure of the code.

Program showing only package structure

Or I can look at the component structure of the code.

Program showing component structure

What structure am I now able to obtain from the code?

  • Package API – by looking at the visibility of my classes
  • Package Relationships – by looking at the packages in each library
  • Library API – by looking at the fully abstract library component
  • Library Relationships – by looking at the classpath of each library
  • Program high-level organisation – by looking at the classpath of each library and the entry-point wiring

Dependency Injection and Dependency Injection Frameworks

So why haven’t I mentioned Dependency Injection and Dependency Injection frameworks? And how does all this relate to the “Facebook Dependency Injection” problem?

I can now choose to use dependency injection as much or as little as I want/need to. If I want to inject one library into another, or inject one package into another then I can just provide an appropriate factory. If I want to use new, I can do that too. The use of package or public visibility will distinguish between the peer objects and the internal implementation details. The Java compiler will prevent me from creating any unwanted dependencies by enforcing package protection. The same argument applies to dependencies between libraries: I can distinguish the public API of the library by splitting it out; by only having the public API on the classpath I can be sure that I will not accidentally use an implementation detail of a dependency.

The type system and compiler will help me prove to myself that the program will only instantiate an object graph with the structure that I intended.

I am now free to use a Dependency Injection framework anywhere within my code that I feel it is appropriate. I can use a Dependency Injection framework to wire up the implementation classes of a package, I can use it to wire up the packages of a library or I can use it to wire together my libraries – or any combination I choose. I can even use a different Dependency Injection framework within each of my libraries or even within each of my packages if for some reason I need to do that! I can be sure the structure of my program doesn’t just become a bag of objects because the compiler enforces it for me – I cannot just pull in an object of any type into my object, I can only pull in an object that has a type that is in the public API of a library (and package) that is on my classpath.

What is the Java type system for? – Case coverage

So what might a programming language with strong static named types be useful for? I recently asked an interview question about how the observer pattern could be implemented in a situation involving a number of different types of events. The following answer was suggested:

interface EventListener {
    void eventOccurred(Event event);
}
 
class MyEventHandler implements EventListener {
    @Override public void eventOccurred(Event event) {
        if (event.getType() == EventType.subtypeOne) {
            doTypeOneThing((EventSubtypeOne) event);
        } else if (event.getType() == EventType.subtypeTwo) {
            doTypeTwoThing((EventSubtypeTwo) event);
        } else {
            throw new RuntimeException();
        }
    }
}

There are a number of issues with this design; however, the main issue that occurs in the kind of large programs I often work on is: when a new event type is added, how do you make sure that all of the types of events are handled in every implementation of EventListener?

The suggested solutions are often along the lines of writing test cases for each of your EventListener implementations and taking the absence of an exception as success. But then how do you make sure you have updated all of your tests correctly?

A simple, easy to understand, and Object Oriented way to resolve this problem is:

interface EventListener {
    void eventOneOccurred(EventOne event);
    void eventTwoOccurred(EventTwo event);
}
 
class MyEventHandler implements EventListener {
    @Override public void eventOneOccurred(EventOne event) {
        doTypeOneThing(event);
    }
 
    @Override public void eventTwoOccurred(EventTwo event) {
        doTypeTwoThing(event);
    }
}

Important points to notice about this solution:

  1. No casting. Since we never represent the event as a generalised abstraction, we never need to downcast the event. We don’t even need a base type for event unless we have some use for that abstraction in our system.
  2. No runtime error. As long as all the code is compiled together (or at least has binary compatibility) then no runtime error will occur because our type system proves that all the types of events are handled.
  3. Updates are easy. When we add a new event type, we add the new method to the interface and the compiler will tell us where all the implementations are (as sketched below)! This is very fast if you have an IDE with incremental compilation and all the code. But it will work safely even in situations where the people implementing your interfaces are distributed and you have no possibility of updating their code. When they try to compile against the new version of your library they will immediately find out about the new feature (whether they like it or not)!
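
For example, adding a third event type is nothing more than a new method on the interface; every implementation, including MyEventHandler, now fails to compile until it handles the new event (EventThree is a hypothetical addition):

interface EventListener {
    void eventOneOccurred(EventOne event);
    void eventTwoOccurred(EventTwo event);
    void eventThreeOccurred(EventThree event); // new – MyEventHandler no longer compiles until it overrides this
}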

Of course, you don’t always want these properties in every design. But when you do want your design to have these properties, this is how you can do it.

EventCast Whiteboard Pattern (Listeners Considered Harmful)

The first time I encountered the whiteboard (or blackboard) pattern was when reading The Pragmatic Programmer. Since then I have seen it a few times. The major mainstream use of the whiteboard I have seen is in Dependency Injection frameworks like Google Guice and Spring.

In my opinion these DI frameworks might better be described as implementations of the whiteboard pattern for service (dependency) location and contribution that also do dependency injection. They can provide you with objects from the whiteboard – one way they do that is by DI, but they can do it in other ways too: the Guice Injector can be used as a factory directly (using the getInstance method), or it can supply factory implementations (Providers or my favourite, AssistedInject) that draw objects from the whiteboard.
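
For example, with Guice this looks something like the following sketch (FooModule and FooService are hypothetical stand-ins for your own types):

// Guice, Injector and Provider come from com.google.inject.
Injector injector = Guice.createInjector(new FooModule());
 
// Use the Injector directly as a factory: pull an object off the whiteboard.
FooService service = injector.getInstance(FooService.class);
 
// Or obtain a Provider – a factory that draws from the whiteboard on each call.
Provider<FooService> provider = injector.getProvider(FooService.class);
FooService another = provider.get();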

In particular, this is very helpful when an application is structured for testability in a ports and adapters architecture. I am aware that Nat Pryce has an issue with DI frameworks not distinguishing between internals and peers, but I don’t think I have really run into that problem in practice. I think he is probably also making the point that DI libraries can hide important information about how your objects are put together. Both issues probably depend on how you use the DI library and which objects you choose to put into its whiteboard – in general, avoid hiding the rules about relationships between your domain objects in your DI usage. By the way, I don’t think EventCast hides relationships in general (rather the opposite), but I can see how it could be used to do that.

So, onto the main point of this post.

The OSGi framework uses the whiteboard pattern to manage dependencies between bundles, in particular when one bundle is driven by events from another (supporting IoC). There is a provocatively named paper, Listeners Considered Harmful: The “Whiteboard” pattern, on how they used the whiteboard pattern to replace the listener pattern.

I have recently been working on a library called EventCast, based on my experiences using Guava EventBus. It uses Guice to implement the whiteboard pattern for OSGi-style inter-object communication, with interfaces defining the listeners (a very light, stripped down version of some of what you get with OSGi).

EventCast tries to be as unintrusive to the application code as it can. A single instance of the listener interface is injected into (hopefully the constructor(!) of) each object that wants to produce those events. Registering and unregistering of listeners (listener lifecycle), the threading model of the listeners and any listener error handling are managed separately from the producer. So your consumer just implements the interface it wants messages for, and your producer just calls methods on the injected instance of the listener interface. EventCast handles distributing the message to all of the currently registered listeners.
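
The shape of this is roughly as follows (an illustrative sketch of the pattern only, with invented names – not EventCast’s actual API):

// The listener interface defines the events.
interface OrderEvents {
    void orderPlaced(String orderId);
}
 
// The producer has a single OrderEvents instance injected into its
// constructor and simply calls methods on it; the framework-provided
// instance fans each call out to all currently registered listeners.
class OrderProcessor {
    private final OrderEvents events;
    OrderProcessor(OrderEvents events) { this.events = events; }
    void place(String orderId) {
        // ... do the work, then announce it:
        events.orderPlaced(orderId);
    }
}
 
// A consumer just implements the interface; registration, threading and
// error handling are managed elsewhere.
class AuditLog implements OrderEvents {
    @Override public void orderPlaced(String orderId) {
        System.out.println("order placed: " + orderId);
    }
}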

It is tightly integrated with Guice. It uses the Guice Whiteboard to store the EventCast listener implementation and make it available to the rest of the application. It detects additions of new listeners to the Guice Whiteboard and registers them for any produced events.

Recompilation Time and Layering

When following an iterative development methodology, compilation time is a concern. In order to safely make a rapid series of small transformations to a program you need to regularly recompile and test the changes you are making. Using an IDE which can do incremental compilation is very helpful, but a fast full automated build and test cycle is still required for more complex programs with code and resource generation phases, for languages without that kind of compilation support, and for verifying that the CI build will work.

In a previous article I discussed a design principle that can be used to try to minimise the amount of time required to do a full compile of all the code in an application.

It is also possible to select design principles which will reduce the amount of recompilation required by any given code change. Conveniently, it turns out that the same abstract layering approach can be used here too.

We return to the simple example application:

application depending on library 1 depending on library 2 depending on library 3 depending on library 4

How long does a full rebuild take after a change in each component, if each component takes time m to compile?

Component   | Compilation Order                                    | Compilation Time
library4    | library4, library3, library2, library1, application  | 5m
library3    | library3, library2, library1, application            | 4m
library2    | library2, library1, application                      | 3m
library1    | library1, application                                | 2m
application | application                                          | 1m

If we calculate the instability of each module, taking into account transitive dependencies and abusing the definitions slightly, we get:

instability: I = Ce / (Ca + Ce)

where Ce (efferent coupling) is the number of modules this module depends on, and Ca (afferent coupling) is the number of modules that depend on it.

Component   | Ce | Ca | I
library4    | 0  | 4  | 0
library3    | 1  | 3  | 0.25
library2    | 2  | 2  | 0.5
library1    | 3  | 1  | 0.75
application | 4  | 0  | 1
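
For example, library2 depends transitively on library3 and library4, so Ce = 2; library1 and application depend on it, so Ca = 2, giving I = 2 / (2 + 2) = 0.5.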

Comparing the instability with the compilation time is interesting. Less stable modules should have lower compilation times, which is what we see here:

Component   | Time | I
library4    | 5m   | 0
library3    | 4m   | 0.25
library2    | 3m   | 0.5
library1    | 2m   | 0.75
application | 1m   | 1

Reducing Recompilation Time

We can restructure our application using the dependency inversion principle. We split each layer into an abstract (interface) layer and a concrete implementation layer.

The application module becomes responsible for selecting the concrete implementation of each layer and configuring the layers of the application. This design pattern is known as Inversion of Control.

application depends on libraries, but libraries depend on abstract layer interfaces

Taking into account the ability of a multi-core machine to compile modules in parallel, we get the following minimum compilation times:

Component Changed | Compilation Order                               | Compilation Time
library4          | library4, application                           | 2m
<<library4>>      | <<library4>>, {library4, library3}, application | 3m
library3          | library3, application                           | 2m
<<library3>>      | <<library3>>, {library3, library2}, application | 3m
library2          | library2, application                           | 2m
<<library2>>      | <<library2>>, {library2, library1}, application | 3m
library1          | library1, application                           | 2m
application       | application                                     | 1m

The compilation time for the abstract modules is uniformly 3m, and the compilation time for the concrete modules is uniformly 2m. The application itself is always the last to be compiled so is 1m as before.

How has the design change affected the stability of the modules?

Component    | Ce | Ca | I
library4     | 1  | 1  | 0.5
<<library4>> | 0  | 3  | 0
library3     | 2  | 1  | 0.66
<<library3>> | 0  | 3  | 0
library2     | 2  | 1  | 0.66
<<library2>> | 0  | 3  | 0
library1     | 1  | 1  | 0.5
application  | 4  | 0  | 1

Again we see that the less stable a module is the lower its compilation time:

Component    | Time | I
library4     | 2m   | 0.5
<<library4>> | 3m   | 0
library3     | 2m   | 0.66
<<library3>> | 3m   | 0
library2     | 2m   | 0.66
<<library2>> | 3m   | 0
library1     | 2m   | 0.5
application  | 1m   | 1

This example is quite simple, so we see the instability metric of the modules being dominated by the application. It does illustrate however how we can use dependency inversion to help establish an upper bound on the recompilation time for any given change in a concrete module. I will examine a more realistic example in a future article.

Compilation Time and Layering

On a modern computer with multiple cores, the total compile time of an application is related to the longest path through the dependency graph of the application.

In a statically type checked language that supports separate compilation of application components (into, for example jars or assemblies), it is generally necessary to compile the dependencies of an application or library before compiling the application or library itself.

A typical layered application might look like this:

application depending on library 1 depending on library 2 depending on library 3 depending on library 4

The compile order for such an application would be library4 then library3 then library2 then library1 then application. The longest path through the dependency graph is 5.

Assuming, for simplicity, that the compile time for each module is approximately the same (m), the total compile time for the application (t) is 5m.

t = 5m

Taking Advantage of Parallel Compilation

We can restructure our application using the dependency inversion principle. We split each layer into an abstract (interface) layer and a concrete implementation layer.

The application module becomes responsible for selecting the concrete implementation of each layer and configuring the layers of the application. This design pattern is known as Inversion of Control.

application depends on libraries, but libraries depend on abstract layer interfaces
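
A sketch of the split for a single layer (hypothetical names; the abstract module <<library4>> contains only interfaces):

// In the abstract module <<library4>>: the interface of the layer.
public interface Storage {
    String read(String key);
}
 
// In the concrete module library4, compiled against <<library4>> only.
public class DiskStorage implements Storage {
    @Override public String read(String key) { /* read from disk */ return ""; }
}
 
// library3 is written against Storage, never DiskStorage; the application
// module constructs DiskStorage and passes it in at start-up.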

The compile order for such an application is much more flexible than that of the earlier design allowing some modules to be compiled in parallel.

The longest path through the dependency graph is 3.

Assuming again that the compile time for each module is approximately the same (m), the minimum compile time for the application (t) is 3m.

t = 3m

Dependency Inversion

What is a Dependency

For the purposes of this discussion a class A has a dependency on another class B, iff you cannot compile class A without class B.

Example

class A { B b; }
class B { }

Box A with an arrow pointing to box B

We have to compile class B before (or at the same time as) class A.

Other kinds of Dependency

Any appearance of the name “B” in class A creates a dependency. Some other examples of dependencies are:

class A extends B { }
class A implements B { }
class A { void method(B b) { } }

Transitive Dependencies

If a class A depends on another class B which itself has dependencies, then the dependencies of class B are effectively dependencies of class A.

Example:

class A { B b; }
class B { C c; } 
class C { }

Box A with an arrow to Box B with an Arrow to Box C, dotted arrow from Box A to Box C

We have to compile class C before (or at the same time as) class B and class A.

The Problem

When class C changes, we have to recompile and retest both class A and class B. In a large system this can take a very long time. It also means that you have to know about class C in advance; you cannot decide on class C after deciding on class A.

If class C is a more concrete class then it might change more frequently than class A. This will cause class A to be recompiled/tested much more frequently than it otherwise would need to be.

The Solution

Inverting Dependencies

Have class A and class B both depend on an abstraction I. This inverts the direction of the dependency arrow on class B.

interface I { }
class A { I i; }
class B implements I { }

Box A with arrow to Box I, Box B with arrow to Box I

Breaks Transitive Dependency

The really helpful effect of this inversion is that it also breaks the transitive dependency from class A onto class C.

interface I { }
class A { I i; }
class B implements I { C c; }
class C { }

Box A arrow to Box I, Box B arrow to Box I, Box B arrow to box C. Box A dotted arrow to Box C deleted

Dependency Inversion Principle

This is an application of the Dependency Inversion Principle:

  • High level modules should not depend on low level modules. Both should depend on abstractions.
  • Abstractions should not depend upon details. Details should depend upon abstractions.

Compositional Patterns for Test Driven Development

This article is going to look at how to implement a parameterizable algorithm so that it both conforms to the open/closed principle and can be tested most effectively.

A common pattern for code reuse, implementation selection, or extension, is to use class inheritance and the template method pattern. This is an example of an implementation of the template method pattern with two different variations B and C:

abstract class A {
    public void doSomething() {
        doP();
        doQ();
        doR();
    }
 
    protected abstract void doP();
    protected abstract void doQ();
    protected abstract void doR();
}
 
class B extends A {
    @Override protected void doP() { /* do P the B way */}
    @Override protected void doQ() { /* do Q the B way */}
    @Override protected void doR() { /* do R the B way */}
}
 
class C extends A {
    @Override protected void doP() { /* do P the C way */}
    @Override protected void doQ() { /* do Q the C way */}
    @Override protected void doR() { /* do R the C way */}
}

Refactoring template method pattern to Strategy Pattern

We can always convert this template method pattern to a compositional pattern by performing a refactoring in the following steps (if you have member variable access there are a couple more steps, but I’ll cover them in a follow-up article):

Step 1; encapsulate the construction of the B and C strategies:

abstract class A {
    public void doSomething() {
        doP();
        doQ();
        doR();
    }
 
    protected abstract void doP();
    protected abstract void doQ();
    protected abstract void doR();
}
 
class B extends A {
    public static A createB() {
        return new B();
    }
 
    @Override protected void doP() { /* do P the B way */}
    @Override protected void doQ() { /* do Q the B way */}
    @Override protected void doR() { /* do R the B way */}
}
 
class C extends A {
    public static A createC() {
        return new C();
    }
 
    @Override protected void doP() { /* do P the C way */}
    @Override protected void doQ() { /* do Q the C way */}
    @Override protected void doR() { /* do R the C way */}
}

Step 2; extract an interface for the strategy methods:

interface S {
    void doP();
    void doQ();
    void doR();
}
 
abstract class A implements S {
    private final S s = this;
 
    public void doSomething() {
        s.doP();
        s.doQ();
        s.doR();
    }
}
 
class B extends A {
    public static A createB() {
        return new B();
    }
 
    @Override public void doP() { /* do P the B way */}
    @Override public void doQ() { /* do Q the B way */}
    @Override public void doR() { /* do R the B way */}
}
 
class C extends A {
    public static A createC() {
        return new C();
    }
 
    @Override public void doP() { /* do P the C way */}
    @Override public void doQ() { /* do Q the C way */}
    @Override public void doR() { /* do R the C way */}
}

Step 3; pass the strategy into A’s constructor instead of using this:

interface S {
    void doP();
    void doQ();
    void doR();
}
 
final class A {
    private final S s;
 
    public A(final S s) {
        this.s = s;
    }
 
    public void doSomething() {
        s.doP();
        s.doQ();
        s.doR();
    }
}
 
class B implements S {
    public static A createB() {
        return new A(new B());
    }
 
    public void doP() { /* do P the B way */}
    public void doQ() { /* do Q the B way */}
    public void doR() { /* do R the B way */}
}
 
class C implements S {
    public static A createC() {
        return new A(new C());
    }
 
    public void doP() { /* do P the C way */}
    public void doQ() { /* do Q the C way */}
    public void doR() { /* do R the C way */}
}

Advantages of the compositional style

Less fragile

Changes to A such as new methods are much less likely to break the strategies in the compositional style.

Easier to Test

In the compositional style, class A can be tested by itself using mocks, as can class B and class C. In the inheritance style, class B and class C cannot be tested without also testing class A. This leads to duplication in the tests, as features of class A are re-tested for every subclass.
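
For example, a sketch of such a test using JUnit 4 and Mockito (assuming the final versions of A and S from step 3 above):

import static org.mockito.Mockito.inOrder;
import static org.mockito.Mockito.mock;
 
import org.junit.Test;
import org.mockito.InOrder;
 
public class ATest {
    @Test public void doSomethingRunsTheStrategyStepsInOrder() {
        S s = mock(S.class);
 
        new A(s).doSomething();
 
        // A is verified in isolation – no real B or C is involved.
        InOrder order = inOrder(s);
        order.verify(s).doP();
        order.verify(s).doQ();
        order.verify(s).doR();
    }
}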

Easier to Reuse

In the compositional style class B and class C can be reused in other contexts where A is not relevant. In the inheritance style this type of reuse is not possible.

Emergent Model/Domain concepts

If class B or class C is reused in a different context, it may turn out during subsequent refactoring that it develops a new role within the system. I may even discover an important new domain concept.

Fail Fast – Fail During Compile

One of the fastest ways for faulty software to fail is at compile time.

Here is an example of slow-failing code: it will fail at runtime if it comes across an implementation of A that it did not expect. This can happen if a developer adds a new implementation of A but forgets to update this code (or is not aware of it, or it is located in some other dependent library).

interface A { }
class B implements A { }
class C implements A { }
 
void doSomething(A a) {
   // not recommended
   if (a instanceof B) {
      doMyBThings((B) a);
   } else if (a instanceof C) {
      doMyCThings((C) a);
   } else {
      throw new IllegalArgumentException("unexpected kind of A");
   }
}

One way to move these failures to compile time would be to move the responsibility for doing my B things and doing my C things on to the classes B and C respectively.

interface A { void doMyThings(); }
class B implements A { public void doMyThings() { /* do my B things */ } }
class C implements A { public void doMyThings() { /* do my C things */ } }
 
void doSomething(A a) {
   a.doMyThings();
}

If I add a new type D, the compiler forces me to implement doMyThings():

class D implements A { public void doMyThings() { /* do my D things */ } }

Sometimes it is not appropriate to put the responsibility to doMyThings() onto A. It might create undesirable coupling (a dependency) or have some other undesirable property. I can maintain the fail fast property in other ways.

interface A { void doThings(Things things); }
class B implements A { public void doThings(Things things) { things.doBThings(this); } }
class C implements A { public void doThings(Things things) { things.doCThings(this); } }
 
interface Things {
      void doBThings(B b);
      void doCThings(C c);
}
 
void doSomething(A a) {
   a.doThings(new Things() {
      public void doBThings(B b) { /* do my B things */ }
      public void doCThings(C c) { /* do my C things */ }
   });
}

Adding a new type D, the compiler forces me to implement doThings on D, which leads me to add a new method onto the Things interface, which forces any implementers of Things to handle D types.

class D implements A { public void doThings(Things things) { things.doDThings(this); } }
 
interface Things {
      void doBThings(B b);
      void doCThings(C c);
      void doDThings(D d);
}
 
void doSomething(A a) {
   a.doThings(new Things() {
      public void doBThings(B b) { /* do my B things */ }
      public void doCThings(C c) { /* do my C things */ }
      public void doDThings(D d) { /* do my D things */ }
   });
}

However, a user might not want to handle every new implementation of A, so I can provide default implementations of the methods:

class BaseThings implements Things {
      public void doBThings(B b) { }
      public void doCThings(C c) { }
}
 
void doSomething(A a) {
   a.doThings(new BaseThings() {
      public void doBThings(B b) { /* do my B things */ }
      public void doCThings(C c) { /* do my C things */ }
   });
}

When I add D I also add a default implementation, so I do not have to add any new handling code:

class BaseThings implements Things {
      public void doBThings(B b) { }
      public void doCThings(C c) { }
      public void doDThings(D d) { }
}
 
void doSomething(A a) {
   a.doThings(new BaseThings() {
      public void doBThings(B b) { /* do my B things */ }
      public void doCThings(C c) { /* do my C things */ }
   });
}

I can choose either to have my code break when new implementations of A are added (by implementing Things directly) or not to break (by extending BaseThings).