Separating Concerns – Part 2: Services

In the previous article on Separation of Concerns, libraries were explained as a way to decompose an application into separate sets of functions, resulting in code that is easier to maintain and has higher cohesion. This article continues the subject, explaining how applications can benefit from using services and what differentiates them from libraries.

Services

A service is a set of related functions usable by multiple applications and accessible through one or more public interfaces. What differentiates a service from a library is the way an application uses it. In the case of a library, functions run in the execution context of the calling application, inheriting the application’s environment, which includes the process, thread, and user context. A service, however, runs in its own execution context and has its own set of policies that govern how it can be used.

Due to framework limitations and/or the complexity of asynchronous programming, many applications invoke services synchronously — leaving the developer with the impression that the service has “taken control” of the execution context. In reality, the calling application is only blocked while waiting for the service to respond. And since the application is blocked, it behaves like a library function call, at least from the perspective of the debugger. Service invocation, however, involves a complex exchange of information between the calling application’s process and the service’s process. This exchange is further complicated by the fact that the application and service processes may exist on separate machines, be written in different languages, and be running on different operating systems. It is also important to recognize that at any point during service invocation, the operating system, the process, the service, or the network can (and eventually will) fail.

Service Interface

The blocking scenario above pertains to applications that invoke services using a remote procedure call (or, RPC) form of inter-process communication. Examples include a web browser using a REST implementation over HTTP, a mobile client using a SOAP implementation (also over HTTP), and an intranet application using a binary implementation over TCP. A single service can support one or more of these implementations simultaneously, making the service available to a broader range of applications.

Another form of inter-process communication is message passing, where the application sends a message to the service. Message passing is inherently asynchronous: the application can send the message and continue processing without waiting for the service to complete. There are many advantages to using asynchronous message passing instead of synchronous RPC, but an asynchronous programming model is also more complex for developers. Messages can also be passed through a durable message store, making it easier to recover from failures without losing service requests and/or commands. Message passing also eliminates temporal coupling, so the application is no longer dependent upon the availability of the service.

And the advantage is… what?

Given the complexity of a service, particularly in comparison to a library, why would anyone want to deal with the complexity of a service?

Quite simply, with a service, the implementation details of the public interface are encapsulated within the service itself and do not become dependencies of the calling application. And since the application is only calling the public interface, the only dependency added to the application is the service interface. The application does not inherit the dependencies of the service, as those dependencies are private implementation details. For this reason alone, when a function has a dependency on another service or when a function depends upon dynamic data, it is better to create a service, encapsulating those dependencies and enabling the service to be managed separately from the applications using it.

For example, a domain name validation function requires a list of valid domain names. However, the current list of valid domain names is constantly changing. If domain name validation were implemented as a library, the application would also be responsible for maintaining the list of valid domain names. Rather than adding these additional requirements to the application, a service is used to validate the domain name instead.
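
To make the contrast concrete, here is a minimal sketch of what the service boundary might look like. The interface and type names are hypothetical and only illustrate that the ever-changing domain list stays behind the service rather than becoming a dependency of the calling application.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical public interface of the domain validation service; the
// calling application depends only on this contract.
public interface IDomainValidationService
{
    Task<bool> IsValidDomainAsync(string domainName);
}

// Hypothetical store of valid domains, refreshed by the service itself.
public interface IValidDomainStore
{
    Task<IReadOnlyCollection<string>> GetCurrentAsync();
}

// Inside the service process, the implementation owns the domain list and
// its refresh policy; none of these dependencies leak into the caller.
public class DomainValidationService : IDomainValidationService
{
    readonly IValidDomainStore _domains;

    public DomainValidationService(IValidDomainStore domains)
    {
        _domains = domains;
    }

    public async Task<bool> IsValidDomainAsync(string domainName)
    {
        IReadOnlyCollection<string> validDomains = await _domains.GetCurrentAsync();
        return validDomains.Contains(domainName, StringComparer.OrdinalIgnoreCase);
    }
}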

So, the advantage of encapsulating dependencies while retaining the ability to reuse functionality is the key benefit of a service. The benefits of a library are also benefits of a service, including high cohesion, as long as the service focuses on a single concern or responsibility.

In the next installment, frameworks will be explained.

CRUD is Not a Service

Introduction

Many systems implement CRUD (create, read, update, and delete) using a repository pattern. An entity is loaded using a Get method, some business layer logic makes changes to the entity, and the entity is ultimately saved using a Put method. This exact pattern is replicated under as many names as there are examples, including EntityManager, EntityDataContext, EntityStorage, etc. In fact, the pattern has been completely generalized by libraries such as NHibernate, which provides an ISession interface for performing simple CRUD operations (yes, there are many additional advanced features that make NHibernate much more useful than just a simple CRUD library, but that’s not the point).

A significant weakness of the repository pattern is ensuring that an entity’s state is valid. When an entity is saved, it is important that the contents are valid. If the contents are invalid, such as the order total not equaling the sum of the order items, the resulting inconsistency can spread throughout the application causing additional, and perhaps greater inconsistencies.

Most of the time, developers using the repository pattern define classes for entities with properties for the attributes of the entity. And in most cases, the properties allow both reads and writes – making it possible for an entity to become invalid. The repository pattern also does not allow intent to be either expressed or captured as changes are made.

In order to properly validate an entity, the validation logic may need access to additional resources. For example, adding an activity to an itinerary may need to verify that there are seats available for a dining activity. Beyond simple validation, adding the activity to an itinerary may need to allocate an available seat to the itinerary’s owner. Subsequently, removing the activity would require that the previously allocated seat be released so that it is available to others.

As systems evolve, this type of behavior gets added to the repository class, checking for property changes in the Put method and invoking services when changes are found. The more property changes that require external systems to be notified, the more complex the Put method becomes, resulting in a greater number of dependencies in the repository class.

Don’t even ask what happens when the entity is saved but the property changes that triggered those external service calls are not persisted due to a transaction timeout or deadlock in the storage system. And don’t simply suggest that invoking the services after the save is complete is the right answer, because then what happens when the services cannot be invoked due to a network or availability issue?

Command Query Separation

Command Query Separation (or CQS) is a pattern that separates commands (typically write operations) from queries (including reads and operations that do not change data). By separating the concerns of reading and writing, it is possible to tune a system to meet specific scaling requirements without generalizing for both operations.

The following provides an example of how CQS can be used to implement CRUD operations. With each operation, an approach is presented that allows for separation of concerns as well as an implementation that can scale reads and writes separately.

Create

Consider, for example, a dining reservation system for a local restaurant. The Reservation service exposes an API to schedule a reservation where the client specifies the date, number of guests, and the desired seating time. When called, the service checks availability and either adds the reservation to the calendar, or fails and notifies the caller that the requested time is not available. If the reservation is added, a table is reserved for the guests and the table is removed from the available seating. Once all available seating is reserved, the service will no longer accept reservations.

The scheduling API above is an example of a command. A command tells a service to perform an operation. The service is responsible for implementing the command’s behavior, and is also the ultimate authority as to whether the command can be completed.

From the perspective of the command’s initiator, the contract is well defined. Submit the required arguments (date, time, and the number of guests), and observe one of the two possible outcomes (scheduled, or unavailable). As long as there are available seats at the requested time, the reservation should succeed. If the command fails due to lack of availability, the initiator can choose to adjust the arguments (such as requesting a later time, or selecting a different date) and resubmit the command, or it can decide instead to try another time to check if an opening becomes available.
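
As a sketch, the command and its two possible outcomes might be expressed as simple message contracts. The type and property names below are assumptions for illustration, not an API defined by the service.

using System;

// Hypothetical command sent to the reservation service.
public class ScheduleReservation
{
    public DateTime Date { get; set; }        // requested date
    public TimeSpan SeatingTime { get; set; } // desired seating time
    public int Guests { get; set; }           // number of guests
}

// The two outcomes the initiator can observe.
public class ReservationScheduled
{
    public Guid ReservationNumber { get; set; }
}

public class ReservationUnavailable
{
    public DateTime Date { get; set; }
    public TimeSpan SeatingTime { get; set; }
}

The service remains the ultimate authority: it checks availability and decides which of the two outcomes to return.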

Read

In order to give the initiator a chance of successfully scheduling a reservation, it’s important that the reservation system’s constraints are available so that initiators are able to submit reservations that will be accepted. This can be done many ways, but one way would be to expose the availability through a separate service.

For example, an application may display the restaurant’s availability to the user so that the user can select a time. At the minimum, having access to the restaurant’s days and hours of operation would allow the user to know when the restaurant is open. However, the restaurant may only take reservations in the evening and on weekends. To convey this information to the application and the user, the availability service may supply more detailed availability including ranges of time and the seating availability for each range.

The additional information provided by the availability service enables the application to determine in advance if a reservation will be accepted. If there is no seating available at a particular date and time, the application can assume that submitting a reservation for the date and time will fail. The application is not prevented from submitting the reservation, but it is likely that the reservation will fail.
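
A minimal sketch of such an availability query, with hypothetical names, might look like the following; the point is that it is a pure read and changes nothing.

using System;
using System.Collections.Generic;

// Hypothetical read-side contract exposing the restaurant's availability.
public interface IAvailabilityService
{
    IEnumerable<SeatingAvailability> GetAvailability(DateTime date);
}

public class SeatingAvailability
{
    public TimeSpan Start { get; set; }     // beginning of the seating range
    public TimeSpan End { get; set; }       // end of the seating range
    public int SeatsRemaining { get; set; } // seats still available in the range
}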

Update

Plans change, and likewise the reservation service needs to be able to make changes to reservations. Depending upon the type of change, however, the service needs to follow different behaviors.

For example, if the reservation time changes, the service would need to determine if there was sufficient capacity at the new time for the reservation. On the other hand, if the number of guests increased, the service would need to ensure there was either sufficient seating at the already assigned table or if a larger table was available at the same time. A simple change, such as updating the name on the reservation, might not require any checks at all – unless the new name is identified as a VIP, in which case a check for upgraded tables or perhaps special menu selections would be performed to ensure that the VIP is treated to the best possible service.

As the above examples clearly show, an update is not just a save operation. An update is a general term given to a wide variety of changes that may be applied to a reservation. Since the behavior of each change is different, each change should also be handled differently. A command should be created to define the contract for each change, and each command should be explicitly named to describe the change (UpdateReservationName, ChangeReservationGuests, ChangeReservationTime).

While the update has changed from a single “write my object” operation to three separate commands, it is now easier to reason about the behavior of each command. If a new reservation time is requested, the initiator could check the published availability information and predetermine if the time slot is available. The initiator is not prevented from sending the command based on this information, but the likelihood of success is greater.
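
A sketch of those three explicitly named commands, using the names from above and assumed properties, might look like this:

using System;

// Each change is expressed as its own command, named for its intent.
public class UpdateReservationName
{
    public Guid ReservationNumber { get; set; }
    public string Name { get; set; }
}

public class ChangeReservationGuests
{
    public Guid ReservationNumber { get; set; }
    public int Guests { get; set; }
}

public class ChangeReservationTime
{
    public Guid ReservationNumber { get; set; }
    public DateTime Date { get; set; }
    public TimeSpan SeatingTime { get; set; }
}

Each command can then carry exactly the validation and side effects its change requires, rather than funneling every change through a single Put method.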

Aggregate Roots and Scoping

An aggregate root is a form of transactional boundary (defined in Domain-Driven Design by Eric Evans) that defines the scope of an operation and its related data. For example, if the reservation service managed a list of guests with each reservation, the reservation would be the aggregate root and the list of guests would be contained within the reservation. This means that the addition or removal of a guest would be performed by or with the aggregate root. In practice, such as with a relational database, adding a guest to the reservation would not involve simply inserting into a ReservationGuest table, but actually loading the reservation and adding a guest. The reservation is the root entity, and the guests are related or child entities.

The reason for this is that a reservation should be treated as a whole and not as a set of related entities. If the system has a rule that a reservation cannot exceed eight guests, and guests are arbitrarily added outside of the reservation, the logic to validate the number of guests ends up in multiple places (just read this as cut-and-paste, which makes it quite obvious why it is a bad thing). Keeping the validation logic as part of the reservation makes the rules easier to discover and understand compared to having validation logic spread across the service.

Delete

Continuing with the example, it’s likely that a guest may cancel a reservation. Plans change, and the service should be able to remove a reservation that is no longer required.

To support canceling a reservation, the service may provide an additional API to cancel a reservation using the reservation number. When the command is received, the service would look up the reservation. If the reservation exists, the reservation would be marked as canceled and removed from the schedule – making the table available for scheduling by other patrons. If the reservation does not exist, the command would fail but the failure does not have any other effects. If the reservation existed but was already canceled, the command could be acknowledged as already canceled (there is no need to cancel a canceled reservation, but not failing ensures that the command is processed idempotently).
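
A minimal sketch of how the cancel command might be handled idempotently is shown below; the names and the in-memory dictionary are assumptions used purely for illustration.

using System;
using System.Collections.Generic;

// Hypothetical cancel command and handler, illustrating the idempotent behavior
// described above: canceling an already-canceled reservation is acknowledged
// rather than treated as a failure.
public class CancelReservation
{
    public Guid ReservationNumber { get; set; }
}

public class CancelReservationHandler
{
    readonly IDictionary<Guid, Reservation> _reservations; // stand-in for real storage

    public CancelReservationHandler(IDictionary<Guid, Reservation> reservations)
    {
        _reservations = reservations;
    }

    public CancelResult Handle(CancelReservation command)
    {
        Reservation reservation;
        if (!_reservations.TryGetValue(command.ReservationNumber, out reservation))
            return CancelResult.NotFound; // the command fails, with no other effects

        if (reservation.IsCanceled)
            return CancelResult.AlreadyCanceled; // acknowledged, processed idempotently

        reservation.IsCanceled = true; // marked as canceled, not deleted
        return CancelResult.Canceled;
    }
}

public class Reservation
{
    public Guid ReservationNumber { get; set; }
    public bool IsCanceled { get; set; }
}

public enum CancelResult
{
    Canceled,
    AlreadyCanceled,
    NotFound
}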

The fact that the reservation existed does not change, so it is important that the history of the reservation is retained. While the service could simply delete the reservation from storage, the stakeholders may want to keep a history of reservations for future use, such as marketing or promotional events, or to follow up to solicit feedback as to why the reservation was canceled.

Auditing

When a command is executed, such as adding an activity to an itinerary, it is important to retain an audit trail of changes. This audit trail is important in case the contents of the itinerary are disputed. For example, a customer may argue that they did not add a particular activity or that an activity is missing. Without an audit trail, it would be impossible to determine the contents of an itinerary at a previous point in time or who made any changes to the itinerary.

Retaining a history of commands executed on the itinerary along with preventing itinerary changes outside of the available commands can provide a reliable audit trail should the need arise. Additionally, ensuring that each command includes the user who initiated the command along with timestamps indicating when the command was initiated and executed can provide a chronological view of the changes made to the entity.
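
One way to capture that information is to wrap each retained command in an envelope recording the initiating user and the relevant timestamps. The sketch below is illustrative only; the names are assumptions.

using System;

// Hypothetical audit entry capturing who initiated a command and when it was
// initiated and executed, providing a chronological view of changes.
public class CommandAuditEntry<TCommand>
{
    public Guid CommandId { get; set; }
    public string InitiatedBy { get; set; }   // the user who initiated the command
    public DateTime InitiatedAt { get; set; } // when the command was initiated (UTC)
    public DateTime ExecutedAt { get; set; }  // when the command was executed (UTC)
    public TCommand Command { get; set; }     // the command itself, retained as history
}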

To summarize a statement commonly made by Greg Young, “So you have an audit trail, how do you know it’s right?”

By retaining every successful command, it is possible to rebuild the state of a reservation. In an event-sourced model, the events produced by those commands are used to determine the current state. There are use cases for each approach, so if you have a highly event-based model, event sourcing may be worth considering.

Bonus: Transferring Data Between Systems

In many organizations, separate test and production systems are used so that integrators and developers can test software or configuration changes prior to deploying them on production. For example, an integrator may configure a new customer on the test system prior to moving that configuration into production. More often than not, this transfer is performed using simple CRUD operations – typically behind the facade of an “import/export” link.

A disadvantage of using bulk CRUD operations when transferring configuration between systems is that the system itself is not a participant in the data import process.

Using Commands to Transfer Data

Rather than transfer data at the entity level, the data in the source system should be used to generate a sequence of commands that can be executed on the target system. Those commands could include references to the original commands executed on the source system, along with the time those commands were originally executed and the initiating user details. Retaining this information may be crucial as changes are deployed, ensuring that the user performing the transfer is not made responsible for the actual changes performed by another user.

Conclusion

The use of commands to perform the creation, updating, and deleting of data has clear advantages over simple data access layer operations. Change tracking, auditing, and validation are critical to ensuring that data is valid. As with most technical choices, whether or not this level of validation is required depends upon your requirements. In my experience, more often than not, this level of detail is required, as auditing and change tracking eventually make their way into the backlog.

Implementing Routing Slip with MassTransit

This article introduces MassTransit.Courier, a new project that implements the routing slip pattern on top of MassTransit, a free, open-source, and lightweight message bus for the .NET platform.

Introduction

When sagas were originally conceived in MassTransit, they were inspired by an excerpt from Chapter 5 of the book SOA Patterns by Arnon Rotem-Gal-Oz. Over the past few months, the community has discussed how the use of the word saga has led to confusion, and how the early implementations included in both NServiceBus and MassTransit do not actually align with the original paper in which the term was coined, published in 1987 by Princeton University and written by Hector Garcia-Molina and Kenneth Salem.

With MassTransit Courier, the intent is to provide a mechanism for creating and executing distributed transactions with fault compensation that can be used alongside the existing MassTransit sagas for monitoring and recovery.

Background

Over the past few years building distributed systems using MassTransit, a pattern I consistently see repeated is the orchestration of multiple services into a single business transaction. Using the existing MassTransit saga support to manage the state of the transaction, the actual processing steps are created as autonomous services that are invoked by the saga using command messages. Command completion is observed by the saga using an event or response message, at which point the next processing step is invoked. When the saga has invoked the final service, the business transaction is complete.

As the processing required within a business transaction changes with evolving business requirements, a new version of the saga is required that includes the newly created processing steps. Knowledge of the new services becomes part of the saga, as well as the logic to identify which services need to be invoked for each transaction. The saga becomes rich with knowledge, and with great knowledge comes great responsibility (after all, knowledge is power, right?). Now, instead of only orchestrating the transaction, the saga is responsible for identifying which services to invoke based on the content of the transaction. Another concern was the level of database contention on the saga tables. With every service invocation being initiated by the saga, combined with the saga observing service events and responses, the saga tables get very busy.

Beyond the complexity of increasing saga responsibilities, more recently the business has requested the ability to selectively route a message through a series of services based on the content of the message. In addition to being able to dynamically route messages, the business needs to allow new services to be created and added to the inventory of available services. And this should be possible without modifying a central control point that dispatches messages to each service.

Like most things in computer science, this problem has already been solved.

The Routing Slip Pattern

A routing slip specifies a sequence of processing steps for a message. As each processing step completes, the routing slip is forwarded to the next step. When all the processing steps have completed, the routing slip is complete.

A key advantage to using a routing slip is it allows the processing steps to vary for each message. Depending upon the content of the message, the routing slip creator can selectively add processing steps to the routing slip. This dynamic behavior is in contrast to a more explicit behavior defined by a state machine or sequential workflow that is statically defined (either through the use of code, a DSL, or something like Windows Workflow).

MassTransit Courier

MassTransit Courier is a framework that implements the routing slip pattern. Leveraging a durable messaging transport and the advanced saga features of MassTransit, MassTransit Courier provides a powerful set of components to simplify the use of routing slips in distributed applications. Combining the routing slip pattern with a state machine such as Automatonymous results in a reliable, recoverable, and supportable approach for coordinating and monitoring message processing across multiple services.

In addition to the basic routing slip pattern, MassTransit Courier also supports compensations which allow processing steps to store process-related data so that reversible operations can be undone, using either a traditional rollback mechanism or by applying an offsetting operation. For example, a processing step that holds a seat for a patron could release the held seat when compensated.

MassTransit Courier is free software and is covered by the same open source license as MassTransit (Apache 2.0). You can install MassTransit.Courier into your existing solution using NuGet.

Activities

In MassTransit Courier, an Activity refers to a processing step that can be added to a routing slip. To create an activity, create a class that implements the Activity interface.

public class DownloadImageActivity :
    Activity<DownloadImageArguments, DownloadImageLog>
{
}

The Activity interface is generic with two arguments. The first argument specifies the activity’s input type and the second argument specifies the activity’s log type. In the example shown above, DownloadImageArguments is the input type and DownloadImageLog is the log type. Both arguments must be interface types so that the implementations can be dynamically created.
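
For reference, the two interfaces used by this example might be declared as follows; the property names correspond to those referenced in the Execute and Compensate methods shown below, while the declarations themselves are a sketch.

using System;

public interface DownloadImageArguments
{
    Uri ImageUri { get; }    // the image to download
    string WorkPath { get; } // the working directory for downloaded files
}

public interface DownloadImageLog
{
    string ImageSavePath { get; } // where the downloaded image was saved
}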

Implementing an Activity

An activity must implement two interface methods, Execute and Compensate. The Execute method is called while the routing slip is executing activities and the Compensate method is called when a routing slip faults and needs to be compensated.

When the Execute method is called, an execution argument is passed containing the activity arguments, the routing slip TrackingNumber, and methods to mark the activity as completed or faulted. The actual routing slip message, as well as any details of the underlying infrastructure, are excluded from the execution argument to prevent coupling between the activity and the implementation. An example Execute method is shown below.

ExecutionResult Execute(Execution<DownloadImageArguments> execution)
{
    DownloadImageArguments args = execution.Arguments;
    string imageSavePath = Path.Combine(args.WorkPath, 
        execution.TrackingNumber.ToString());

    _httpClient.GetAndSave(args.ImageUri, imageSavePath);

    return execution.Completed(new DownloadImageLogImpl(imageSavePath));
}

Once activity processing is complete, the activity returns an ExecutionResult to the host. If the activity executes successfully, the activity can elect to store compensation data in an activity log which is passed to the Completed method on the execution argument. If the activity chooses not to store any compensation data, the activity log argument is not required. In addition to compensation data, the activity can add or modify variables stored in the routing slip for use by subsequent activities.

In the example above, the activity creates an instance of a private class that implements the DownloadImageLog interface and stores the log information in the object properties. The object is then passed to the Completed method for storage in the routing slip before sending the routing slip to the next activity.
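
A minimal sketch of that private class (the class name appears in the example above; the rest is an assumption):

// Private implementation of the DownloadImageLog interface, used to store
// the compensation data before handing it to execution.Completed().
class DownloadImageLogImpl :
    DownloadImageLog
{
    public DownloadImageLogImpl(string imageSavePath)
    {
        ImageSavePath = imageSavePath;
    }

    public string ImageSavePath { get; private set; }
}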

When an activity fails, the Compensate method is called for previously executed activities in the routing slip that stored compensation data. If an activity does not store any compensation data, the Compensate method is never called. The compensation method for the example above is shown below.

CompensationResult Compensate(Compensation<DownloadImageLog> compensation)
{
    DownloadImageLog log = compensation.Log;
    File.Delete(log.ImageSavePath);

    return compensation.Compensated();
}

Using the activity log data, the activity compensates by removing the downloaded image from the work directory. Once the activity has compensated the previous execution, it returns a CompensationResult by calling the Compensated method. If the compensating actions could not be performed (either via logic or an exception) and the inability to compensate results in a failure state, the Failed method can be used instead, optionally specifying an Exception.

Building a Routing Slip

Developers are discouraged from directly implementing the RoutingSlip message type and should instead use a RoutingSlipBuilder to create a routing slip. The RoutingSlipBuilder encapsulates the creation of the routing slip and includes methods to add activities, activity logs, and variables to the routing slip. For example, to create a routing slip with two activities and an additional variable, a developer would write:

var builder = new RoutingSlipBuilder(NewId.NextGuid());
builder.AddActivity("DownloadImage", "rabbitmq://localhost/execute_downloadimage", new
    {
        ImageUri = new Uri("http://images.google.com/someImage.jpg")
    });
builder.AddActivity("FilterImage", "rabbitmq://localhost/execute_filterimage");
builder.AddVariable("WorkPath", @"\\dfs\work");

var routingSlip = builder.Build();

Each activity requires a name for display purposes and a URI specifying the execution address. The execution address is where the routing slip is sent to execute the activity. For each activity, arguments can be specified that are stored and presented to the activity via the activity arguments interface type specified by the first type argument of the Activity interface. The activities added to the routing slip are combined into an Itinerary, which is the list of activities to be executed, and stored in the routing slip.

Managing the inventory of available activities, as well as their names and execution addresses, is the responsibility of the application and is not part of MassTransit Courier. Since activities are application-specific, and the business logic that determines which activities to execute and in what order is part of the application domain, the details are left to the application developer.

Once built, the routing slip is executed, which sends it to the first activity’s execute URI. To make it easy and to ensure that source information is included, an extension method to IServiceBus is available, the usage of which is shown below.

bus.Execute(routingSlip); // pretty exciting, eh?

It should be pointed out that if the URI for the first activity is invalid or cannot be reached, an exception will be thrown by the Execute method.

Hosting Activities in MassTransit

To host an activity in a MassTransit service bus instance, the configuration namespace has been extended to include two additional subscription methods (thanks to the power of extension methods and a flexible configuration syntax, no changes to MassTransit were required). Shown below is the configuration used to host an activity.

var executeUri = new Uri("rabbitmq://localhost/execute_example");
var compensateUri = new Uri("rabbitmq://localhost/compensate_example");

IServiceBus compensateBus = ServiceBusFactory.New(x =>
    {
        x.ReceiveFrom(compensateUri);
        x.Subscribe(s => s.CompensateActivityHost<ExampleActivity, ExampleLog>(
            _ => new ExampleActivity()));
    });

IServiceBus executeBus = ServiceBusFactory.New(x =>
    {
        x.ReceiveFrom(executeUri);
        x.Subscribe(s => s.ExecuteActivityHost<ExampleActivity, ExampleArguments>(
            compensateUri,
            _ => new ExampleActivity()));
    });

In the above example, two service bus instances are created, each with its own input queue. For execution, the routing slip is sent to the execution URI, and for compensation, the routing slip is sent to the compensation URI. The actual URIs used are up to the application developer; the example merely shows the recommended approach so that the two addresses are easily distinguished. The URIs must be different!

Monitoring Routing Slips

During routing slip execution, events are published when the routing slip completes or faults. Every event message includes the TrackingNumber as well as a Timestamp (in UTC, of course) indicating when the event occurred:

  • RoutingSlipCompleted
  • RoutingSlipFaulted
  • RoutingSlipCompensationFailed

Additional events are published for each activity, including:

  • RoutingSlipActivityCompleted
  • RoutingSlipActivityFaulted
  • RoutingSlipActivityCompensated
  • RoutingSlipActivityCompensationFailed

By observing these events, an application can monitor and track the state of a routing slip. To maintain the current state, an Automatonymous state machine could be created. To maintain history, events could be stored in a database and then queried using the TrackingNumber of the RoutingSlip.
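
As a sketch, a simple completion observer might look like the following (using directives omitted), assuming the classic MassTransit Consumes<T>.All consumer contract and the TrackingNumber and Timestamp properties mentioned above; the history store is a hypothetical stand-in for whatever persistence the application uses.

// Hypothetical consumer that records routing slip completion events.
public class RoutingSlipEventConsumer :
    Consumes<RoutingSlipCompleted>.All
{
    readonly IRoutingSlipHistory _history; // hypothetical history store

    public RoutingSlipEventConsumer(IRoutingSlipHistory history)
    {
        _history = history;
    }

    public void Consume(RoutingSlipCompleted message)
    {
        // store the completion so it can be queried later by TrackingNumber
        _history.RecordCompleted(message.TrackingNumber, message.Timestamp);
    }
}

public interface IRoutingSlipHistory
{
    void RecordCompleted(Guid trackingNumber, DateTime timestamp);
}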

Wrapping Up

MassTransit Courier is a great way to compose dynamic processing steps into a routing slip that can be executed, monitored, and compensated in the event of a fault. When used in combination with the existing saga features of MassTransit, it is possible to coordinate a distributed set of services into a reliable and supportable system.

IDisposable, Done Right

IDisposable is a standard interface in the .NET Framework that facilitates the deterministic release of unmanaged resources. Since the Common Language Runtime (CLR) uses garbage collection (GC) to manage the lifecycle of objects created on the heap, it is not possible to control the release and recovery of heap objects. While there are methods to force the GC to collect unreferenced objects, it is not guaranteed to clear all objects, and it is highly inefficient for an application to force garbage collection as part of its normal control flow.

Implementing IDisposable

Despite IDisposable having only a single method named Dispose to implement, it is commonly implemented incorrectly. After reading this blog post it should be clear how and when to implement IDisposable, as well as how to ensure that resources are properly disposed when bad things happen (also known as exceptions).

First, the IDisposable interface definition:

public interface IDisposable
{
    void Dispose();
}

Next, the proper way to implement IDisposable every single time it is implemented:

public class DisposableClass :
    IDisposable
{
    bool _disposed;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    ~DisposableClass()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed)
            return;

        if (disposing)
        {
            // free other managed objects that implement
            // IDisposable only
        }

        // release any unmanaged objects
        // set large object references to null

        _disposed = true;
    }
}

The pattern above for implementing IDisposable ensures that all references are properly disposed and released. Using the finalizer, along with the associated dispose methods, will ensure that in every case references will be properly released. There are some subtle things going on in the code, however, as described below.

Dispose()

The implementation of the Dispose method calls the Dispose(bool disposing) method, passing true, which indicates that the object is being disposed. This method is never automatically called by the CLR, it is only called explicitly by the owner of the object (which in some cases may be another framework, such as ASP.NET or MassTransit, or an object container, such as Autofac or StructureMap).

~DisposableClass

Immediately before the GC releases an object instance, it calls the object’s finalizer. Since an object’s finalizer is only called by the GC, and the GC only calls an object’s finalizer when there are no other references to the object, it is clear that the Dispose method will never be called on the object. In this case, the object should release any managed or unmanaged references, allowing the GC to release those objects as well. Since the same object references are being released as those released when Dispose is called, this method calls the Dispose(bool disposing) method passing false, indicating that the referenced objects’ Dispose methods should not be called.

Dispose(bool)

All object references and unmanaged resources are released in this method. However, the argument indicates whether or not the Dispose method should be called on any managed object references. If the argument is false, the references to managed objects that implement IDisposable should be set to null; however, the Dispose method on those objects should not be called. The reason is that the owning object’s Dispose method was not called (Dispose(false) is only called by the finalizer, not by the Dispose method).

Overriding Dispose

In the example above, the Dispose(bool disposing) method is declared as protected virtual. This allows classes that inherit from this class to participate in the disposal of the object without impacting the behavior of the base class. In this case, a subclass should override the method as shown below.

public class SubDisposableClass : 
    DisposableClass
{
    private bool _disposed;

    // a finalizer is not necessary, as it is inherited from
    // the base class

    protected override void Dispose(bool disposing)
    {
        if (!_disposed)
        {
            if (disposing)
            {
                // free other managed objects that implement
                // IDisposable only
            }

            // release any unmanaged objects
            // set large object references to null

            _disposed = true;
        }

        base.Dispose(disposing);
    }
}

The subclass overrides the method, releasing (and optionally disposing) object references first, and then calling the base method. This ensures that objects are released in the proper order (at least between the subclass and the base class, the proper order of releasing/disposing objects within the subclass itself is the responsibility of the developer).

Exceptions, Happen

Prior to .NET 2.0, if an object’s finalizer threw an exception, that exception was swallowed by the runtime. Since .NET 2.0, however, throwing an exception from a finalizer will cause the application to crash, and that’s bad. Therefore, it is important that a finalizer never throw an exception.

But what about the Dispose method, should it be allowed to throw an exception? The short answer is no. Except when the answer is yes, which is almost never. Therefore, it is important to wrap any areas of the Dispose(bool disposing) method that could throw an exception in a try/catch block, as shown below.

protected virtual void Dispose(bool disposing)
{
    if (_disposed)
        return;

    if (disposing)
    {
        try
        {
            _session.Dispose();
        }
        catch (Exception ex)
        {
            _log.Warn(ex);
        }
    }

    try
    {
        _channelFactory.Close();
    }
    catch (Exception ex)
    {
        _log.Warn(ex);

        try
        {
            _channelFactory.Abort();
        }
        catch (Exception cex)
        {
            _log.Warn(cex);
        }
    }

    _session = null;
    _channelFactory = null;

    _disposed = true;
}

In the example, _session is a reference to an NHibernate ISession and _channelFactory is a reference to a WCF IChannelFactory. An NHibernate ISession implements IDisposable, so the owner must call Dispose on it when the object is no longer needed. In the case of the IChannelFactory reference, there is no Dispose method; however, the object must be closed (and subsequently aborted in case of an exception). Because either of these methods can throw an exception, it is important to catch the exception (and, as shown above, log it for troubleshooting or perhaps just ignore it) so that it doesn’t cause either the Dispose method or the object’s finalizer to propagate the exception.

Constructor Exceptions

On a related topic, when an object’s constructor throws an exception, the runtime considers the object to have never existed. And while the GC will release any object allocated by the constructor, it will not call the Dispose method on any disposable objects. Therefore, if an object is creating references to managed objects in the constructor (or even more importantly, unmanaged objects that consume limited system resources, such as file handles, socket handles, or threads), it should be sure to dispose of those resources in the case of a constructor exception by using a try/catch block.

While one might be tempted to call Dispose from the constructor to handle an exception, don’t do it. When the constructor throws an exception, technically the object does not exist. Calling methods, particularly virtual methods, should be avoided.
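
A minimal sketch of that kind of constructor cleanup is shown below; the types are placeholders and the full dispose pattern from earlier is omitted for brevity. The key point is that the already-acquired resource is released directly in the catch block, without calling Dispose.

using System;
using System.IO;

// Hypothetical example: the constructor acquires two resources; if the second
// acquisition fails, the first is released directly before rethrowing, since
// Dispose will never be called on an object whose constructor threw.
public class ResourceOwner :
    IDisposable
{
    readonly FileStream _data;
    readonly FileStream _index;

    public ResourceOwner(string dataPath, string indexPath)
    {
        _data = new FileStream(dataPath, FileMode.OpenOrCreate);
        try
        {
            _index = new FileStream(indexPath, FileMode.OpenOrCreate);
        }
        catch
        {
            // the constructor is failing, so Dispose will never be called;
            // release the handle that was already acquired before rethrowing
            _data.Dispose();
            throw;
        }
    }

    public void Dispose()
    {
        _index.Dispose();
        _data.Dispose();
    }
}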

Of course, in the case of managed objects such as an ISession, it is better to take the object as a dependency on the constructor and have it passed into the object by an object factory (such as a dependency injection container like Autofac) and let the object factory manage the lifecycle of the dependency.

Container Lifecycle Management

Dependency injection containers are powerful tools, handling object creation and lifecycle management on behalf of the developer. However, it is important to have a clear understanding of how to use the container in the context of an application framework.

For example, ASP.NET has a request lifecycle for every HTTP request received by the server. To support this lifecycle, containers typically have integration libraries that hook into the framework to ensure proper object disposal. For instance, Autofac has a number of integration libraries for ASP.NET, ASP.NET MVC, ASP.NET Web API, and various other application frameworks. These libraries, when configured into the stack as HttpModules, ensure that objects are properly disposed when each request completes.

Conclusion

The reason for IDisposable is the deterministic release of references held by an object (something that used to be done manually in unmanaged languages by calling delete on an object). Implementing it both properly and consistently helps create applications that have predictable resource usage and are easier to troubleshoot. Therefore, consider the example above as a reference point for how objects should be disposed.


Separating Concerns – Part 1: Libraries

Introduction

In large applications, particularly in enterprise applications, separation of concerns is critical to ease maintainability. Without proper separation of concerns, applications become too large and too complex, which in turn makes maintenance and enhancement extremely difficult. Separating application concerns leads to high cohesion, allowing developers to better understand code behavior which leads to easier code maintenance.

History

In the previous decade, architects designed applications using an n-tier approach, separating the application into horizontal layers such as user interface, business logic, and data access. This approach is incomplete, however, as it fails to address partitioning applications vertically. Unrelated concerns are commingled, resulting in a confusing architecture which lacks clearly defined boundaries and has low cohesion.

The other problem with an n-tier architecture is how it is organized from top to bottom, with the topmost layer being the presentation layer or user interface and the bottommost layer representing the persistence layer or database. Instead of thinking of the architecture as horizontal layers, think of it as rings, as in the Onion Architecture described by Jeffrey Palermo. (While Jeffrey proposed the pattern name, the architectural patterns had been defined previously by others.)

Separating Concerns

Given that separation of concerns and increased cohesion are the goals, there are several mechanisms for achieving them. The solutions that follow include the use of libraries, services, and frameworks as ways to reach these goals.

The Library

A library is a set of functions used to build software applications. Rather than requiring an application to be a single project containing every source file, most programming languages provide a means to segregate functionality into libraries. While the name of the facility varies (a partial list includes package, module, gem, jar, and assembly), the result is that developers can physically separate functions from the main application project, improving both cohesion and maintainability.

Core, the new Manager

A library should not be a collection of unrelated functions; it should contain related functions so that it is highly cohesive. An application developer should be able to select a library for use based on its name and purpose, rather than having to pore through the source code to find the function or functions needed. A library should have a descriptive name and contain a cohesive set of functions serving a singular purpose or responsibility.

Creating a library named Core containing a large set of unrelated functions is separation for the sake of separation, and that library should not be treated as a library but as part of the application — it should not be reused by other applications.

Coupling (aka, the Path of Pain)

When industry analysts share their observations about code reuse in the enterprise, the findings indicate that actual code reuse is very low. A main reason code reuse is so low is tight coupling. Coupling refers to how two libraries (or functions) rely on each other. When a library relies upon another library, the library relied upon is referred to as a dependency. When an application relies on a library, it implicitly relies on the library’s dependencies as well. In many larger applications, this can lead straight to dependency hell.

Since tight coupling can lead to serious maintenance issues during an application’s lifecycle, limiting dependencies should be first and foremost in application and library design. If a function is to be moved from an application to a library, and that function must bring with it a dependency that was not previously required by the target library, the cost of adding the new dependency to the library must be considered. Too often, particularly in the enterprise where code is only reviewed internally by a single development team, poor choices are made when creating libraries. Functions are routinely moved out of the main project and placed into arbitrary libraries with little thought given to the additional dependencies of the library.

An Example

As an example, a web application has a set of functions for validating email addresses. The simplest validation methods may only depend upon regular expression functions, which are part of every modern language runtime used today. A more complete validation of an email address may check that the domain is actually valid and has a properly registered MX record in DNS. However, validating the domain involves sending a request to a service and waiting for the response indicating a valid domain before the email address is determined to be valid.

There are many things wrong in this example. First, the email validation function has a dependency on a domain validation function. Because the set of valid domains is continuously changing, the domain validation function itself has a dependency on a domain name service. Of course, the domain name service depends upon a network domain name service, which may subsequently depend upon an internet service as well. By calling one library function, the application has managed to send a request to another machine and block a thread waiting for a response.
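
To make the dependency chain visible, here is a hedged sketch of what such a library function might look like; the names are hypothetical and intentionally show how much infrastructure hides behind a single call.

using System;
using System.Text.RegularExpressions;

// Hypothetical email validation "library" illustrating the hidden dependency
// chain: regex -> domain validation -> DNS lookup -> network.
public class EmailValidator
{
    static readonly Regex EmailPattern =
        new Regex(@"^[^@\s]+@[^@\s]+\.[^@\s]+$", RegexOptions.Compiled);

    readonly IDnsLookup _dns; // hypothetical DNS client dependency

    public EmailValidator(IDnsLookup dns)
    {
        _dns = dns;
    }

    public bool IsValid(string emailAddress)
    {
        if (!EmailPattern.IsMatch(emailAddress))
            return false;

        string domain = emailAddress.Substring(emailAddress.IndexOf('@') + 1);

        // this innocent-looking call sends a request to another machine and
        // blocks the calling thread until a response (or a failure) arrives
        return _dns.HasMxRecord(domain);
    }
}

public interface IDnsLookup
{
    bool HasMxRecord(string domain);
}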

In the case of an error, the disposition of the email address is then unknown. Is it a valid email address that could not be validated due to a network error? Or is it a valid email address but flagged as invalid because the domain name could not be validated due to an internal DNS server not allowing external domains to be returned?

The coupling in the email validation library is clearly a problem, but what happens as the business requirements evolve over the life of the application? Consider the situation where new accounts are being created by spammers from other countries. To combat the spam accounts, email addresses must now be validated to ensure that the IP address originates from within the United States. The email validation function now has a new dependency, a geolocation service that returns the physical address of a domain. However, the service requires the use of separate endpoints for testing and production. The email address validation function is now dependent upon two services and configuration data to determine which service endpoint to use.

At this point, it is obvious that the complexity of validating an email address is not something that can be accomplished in a library function.

This article will continue with Part 2 on services.

Tulsa TechFest 2012 Code

Here is the code from my talk at Tulsa TechFest on SignalR. Thanks to those of you who came to the talk, I hope you learned enough about SignalR to determine if it’s the right technology for you. Be sure you have enabled NuGet to restore packages on build so the required references all get downloaded and installed from the NuGet site.

TulsaTechFest2012.zip

 

StrangeLoop 2012

This past weekend I attended my 2nd StrangeLoop conference. StrangeLoop is an annual conference held in St. Louis, MO and for the last four years it has managed to draw some impressive talent. Unlike other events I attend, StrangeLoop is an independent conference and is not dominated by a single platform, technology, or language. The quality and level of content is also high, making StrangeLoop a place where introductory sessions are frowned upon — attendees want deep, intriguing sessions where experienced practitioners can learn new things. Attendees at StrangeLoop are commonly pushing the leading edge, and the session topics are state of the art, sometimes on the edge of redefining software development in the coming years.

So how was it?

Day 1

Opening Keynote: VoltDB, Michael Stonebraker

In the first thirty minutes, I had a strong sense that the conference was off to a rough start. In what was clearly a product-focused talk, the VoltDB CTO made a weak case for ACID, eliciting frequent groans from the audience. Make no mistake, Stonebraker is a really smart guy, but too much of his time was spent bashing other databases (if you can technically call eventually consistent storage systems without a query language databases). As an opening keynote for the conference, this was the worst possible choice. Now, I have followed VoltDB since the early bits, and was impressed with the lock-free approach that serializes all operations, but this talk didn’t spend enough time on the benefits of VoltDB.

Get a Leg Up with Twitter Bootstrap

For the first actual session of the day, Howard Lewis Ship took the audience on a tour of Twitter Bootstrap, which is rapidly becoming the File, New Web Site project template. In fact, I was glad to see the entire gallery of customized Bootstrap templates — hopefully now Bootstrap-based sites won’t all look the same. I’m a fan of Bootstrap, and this was a solid introduction, but I (and I’m sure the rest of the audience) was hoping for a bit more depth.

Software Architecture using ZeroMQ

My expectations were high for this session; I was really hoping to get some insight into 0MQ and how to build systems using it as the authors intended. While Pieter Hintjens provided some high-level coverage of ZeroMQ, I felt this session should have been called “Software Architecture 101” and could apply to any technology stack. I gained zero insight into ZeroMQ beyond what the executive summary already covered.

I was really starting to doubt my remaining session choices at this point; the first two were boring, following a bad keynote. So I reached out to some friends to hear their experiences, which altered my schedule for the rest of the day.

A Whole New World by Gary Bernhardt

This session was a short, light-hearted lunch session with a total Rick-roll ending. At least I ate my lunch, took a break, and made a couple of phone calls. I got blocked out of the Twitter Zipkin session due to space constraints, but I heard it was nothing special, so I’m glad I didn’t miss anything.

Building an Impenetrable Zookeeper

Finally, an in-depth session given by a member of the team providing commercial support — if only I used Zookeeper. I understand what Zookeeper does, and the subject matter dealt with the type of issues organizations encounter trying to run it. I found this very interesting, particularly since I have a good understanding of distributed consensus and configuration — and this is not an easy nut to crack. I came away with some interesting notes that I’ll keep in mind when I create systems that either interact with Zookeeper, or perhaps when I create yet-another-open-source-project (Topshelf Bartender perhaps!). 

Graph: composable production systems in Clojure

What Jason Wolfe (of Prismatic, the news aggregator) offered up was a refreshing approach to building a functional container. Graph is comparable to Guice or Dagger, and provides a declarative approach to system composition. While at the lowest level it seems to offer the same features as an IOC container, the way it was presented and explained was really nice. I enjoyed this session, and took away a few notes for my own use. I also gained a greater fondness for Clojure, which was a recurring theme as the sessions continued.

The Database as a Value by Rich Hickey

So having done well with a Clojure talk, I decided to take in another one from the man himself, the author of Clojure. The talk on Datomic was a nice realization that we are reaching a point where immutable databases are available and usable. Datomic is sweet, and how it handles IO and manages to spread the Live Index to multiple nodes for fast access is clever. I enjoyed this talk and look forward to seeing the ideas in Datomic shape a new wave of immutable storage systems (I’m not sure it’s a database, despite the intense conversation at the pre-party on that very subject). And again, an increasing appreciation for Clojure.

That’s how Day 1 ended for me, on a good note. So we went to Pappy’s BBQ and managed to snag one of the last remaining racks of ribs (apparently they sell out fairly early, while in line the chicken, turkey, and chopped brisket sold out). After dinner, we returned to the hotel to continue working through some code that I’d been toying with throughout the day (FeatherVane-related, if you were curious).

Day 2

Computing Like the Brain by Jeff Hawkins

This talk was almost scary. The depth of knowledge on the human brain is staggering. At one point, I saw a tweet suggesting that the Terminator himself was about to pop onto the stage and tell Hawkins to abandon his research for the sake of humanity. Yes, it was that scary. The way his company has built out models that match the human brain is impressive, and the results of some of their predictive systems were very close to reality. However, predicting the future is hard, and it’s easy to get it wrong. While many systems have promised to give us brain-like capabilities, most if not all of them have been limited in applicability or flat out failed when generalized. I suppose that is actually good for us (mankind).

Y Not? Adventures in Functional Programming

Jim Weirich is pretty well known (well, apparently I don’t know anybody — a joke that never ended during the conference) and he took the audience on a ride using Clojure to explain the Y-Combinator. When the talk started, he promised a fun ride that would likely be inapplicable to anything any of us does in our daily jobs. And he was right, it was fun! Live coding works when the presenter can do it and do it well, and this was a great session. Very enjoyable, the day was off to a great start!

Runaway complexity in Big Data and a plan to stop it

Last year, Nathan Marz open-sourced Twitter Storm during his session at StrangeLoop 2011, and it was an impressive system (written in Clojure, big shock). The real-time analytics capabilities of Storm are slick, and it sounds like it’s only gotten better over the past year. I was hoping for great things again this year, however, what I found was a bit of a reminder of a talk in 2008. At QCon San Francisco in 2008, Greg Young gave a talk about Unleashing your Domain Model, covering how insert-only data stores, event sourcing, and real-time projection of data into views can benefit real-time applications. It seems like even today these ideas are flowing through the minds of the real-time web properties.

Eventually Consistent Data Structures by Sean Cribbs

This was an eye-opening talk about newly defined data structures that enable concurrent updates that are eventually consistent. As more distributed systems are being built, the ability to perform concurrent updates on records, with conflicts resolved easily, is needed. As a big fan of algorithms, I found the way these data structures were assembled very interesting — despite their very specific purpose. I had originally planned on attending Oleg Kiselyov’s talk on Guessing Lazily, but the presenter spent too much time flipping through random snippets of code that were very hard to follow, making it difficult for me to grasp what was being done. Which is a bummer, because I saw quick segments of parser combinator code, which I rely on heavily in my parsers.

Taking Off the Blindfold

This talk was awesome, and Bret Victor had people cheering. The flow of his presentation where he shared with us his vision for a dynamic, interactive IDE had some developers just screaming for more. The high point for me was one of my favorite childhood memories — taking the entire bin of Legos and dumping it out on the floor. By getting everything out in front of you, you needn’t think about what you’re going to build in a vacuum, you can see, touch, and draw items from the random chaos laid out before you. Some of the ideas here seemed to redefine what should be expected of an IDE.

The State of JavaScript

Yes, Brendan Eich, the inventor of JavaScript, laid out the awesome coming in ECMAScript 6. Some of the proposed features are awesome (and strangely enough available in the nightly FireFox builds), while some features have me concerned. CoffeeScript has clearly influenced some features, and I sensed a subtle Microsoft influence in some of the language and keyword choices. I was glad to see byte code clearly off the table, but disappointed to see macros up for possible inclusion. Brendan is an incredible presenter, and you could hear the passion in his voice.

With that, the conference was wrapped. I had a great time, had some great conversations, and really enjoyed some of the sessions. It’s great to be able to take the time to attend, listen to, and appreciate content once in a while without worrying about my own presentation. If you can make the time next year, and the content looks good, I highly recommend StrangeLoop!