Author Archives: Chris

Implementing Routing Slip with MassTransit

This article introduces MassTransit.Courier, a new project that implements the routing slip pattern on top of MassTransit, a free, open-source, and lightweight message bus for the .NET platform.

Introduction

When sagas were originally conceived in MassTransit, they were inspired by an excerpt from Chapter 5 of the book SOA Patterns by Arnon Rotem-Gal-Oz. Over the past few months, the community has discussed how the use of the word saga has led to confusion, and how the early implementations in both NServiceBus and MassTransit do not actually align with the original 1987 paper by Hector Garcia-Molina and Kenneth Salem of Princeton University, in which the term was coined.

With MassTransit Courier, the intent is to provide a mechanism for creating and executing distributed transactions with fault compensation that can be used alongside the existing MassTransit sagas for monitoring and recovery.

Background

Over the past few years building distributed systems using MassTransit, a pattern I consistently see repeated is the orchestration of multiple services into a single business transaction. Using the existing MassTransit saga support to manage the state of the transaction, the actual processing steps are created as autonomous services that are invoked by the saga using command messages. Command completion is observed using an event or response message by the saga, at which point the next processing step is invoked. When the saga has invoked the final service the business transaction is complete.

As the processing required within a business transaction changes with evolving business requirements, a new version of the saga is required that includes the newly created processing steps. Knowledge of the new services becomes part of the saga, as well as the logic to identify which services need to be invoked for each transaction. The saga becomes rich with knowledge, and with great knowledge comes great responsibility (after all, knowledge is power, right?). Now, instead of only orchestrating the transaction, the saga is responsible for identifying which services to invoke based on the content of the transaction. Another concern is the level of database contention on the saga tables. With every service invocation being initiated by the saga, combined with the saga observing service events and responses, the saga tables get very busy.

Beyond the complexity of increasing saga responsibilities, more recently the business has requested the ability to selectively route a message through a series of services based on the content of the message. In addition to being able to dynamically route messages, the business needs to allow new services to be created and added to the inventory of available services. And this should be possible without modifying a central control point that dispatches messages to each service.

Like most things in computer science, this problem has already been solved.

The Routing Slip Pattern

A routing slip specifies a sequence of processing steps for a message. As each processing step completes, the routing slip is forwarded to the next step. When all the processing steps have completed, the routing slip is complete.

A key advantage of using a routing slip is that it allows the processing steps to vary for each message. Depending upon the content of the message, the routing slip creator can selectively add processing steps to the routing slip. This dynamic behavior is in contrast to the more explicit behavior of a state machine or sequential workflow that is statically defined (either through the use of code, a DSL, or something like Windows Workflow).
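As a rough illustration of the pattern only, the sketch below models a routing slip in memory, with no messaging and no MassTransit types. SimpleRoutingSlip, SimpleRoutingSlipRunner, and the step names are all hypothetical; the point is that the creator chooses the steps per message and the slip itself carries the remaining itinerary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model; none of these types come from MassTransit.
public sealed class SimpleRoutingSlip
{
    readonly Queue<string> _itinerary = new Queue<string>();

    // Names of the steps that have already run, in order.
    public List<string> Completed { get; } = new List<string>();

    public bool IsComplete
    {
        get { return _itinerary.Count == 0; }
    }

    public void AddStep(string name)
    {
        _itinerary.Enqueue(name);
    }

    public string TakeNextStep()
    {
        return _itinerary.Dequeue();
    }
}

public static class SimpleRoutingSlipRunner
{
    // The creator decides which steps to add based on the message content.
    public static SimpleRoutingSlip CreateFor(string fileName)
    {
        var slip = new SimpleRoutingSlip();
        slip.AddStep("Download");
        if (fileName.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase))
            slip.AddStep("Filter");
        slip.AddStep("Store");
        return slip;
    }

    // Forwards the slip from step to step until the itinerary is empty.
    public static void Run(SimpleRoutingSlip slip, IDictionary<string, Action> steps)
    {
        while (!slip.IsComplete)
        {
            string name = slip.TakeNextStep();
            steps[name]();
            slip.Completed.Add(name);
        }
    }
}
```

In a real system each step would live in its own service and the slip would travel between them as a message; the loop in Run stands in for that forwarding.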

MassTransit Courier

MassTransit Courier is a framework that implements the routing slip pattern. Leveraging a durable messaging transport and the advanced saga features of MassTransit, MassTransit Courier provides a powerful set of components to simplify the use of routing slips in distributed applications. Combining the routing slip pattern with a state machine such as Automatonymous results in a reliable, recoverable, and supportable approach for coordinating and monitoring message processing across multiple services.

In addition to the basic routing slip pattern, MassTransit Courier also supports compensations which allow processing steps to store process-related data so that reversible operations can be undone, using either a traditional rollback mechanism or by applying an offsetting operation. For example, a processing step that holds a seat for a patron could release the held seat when compensated.

MassTransit Courier is free software and is covered by the same open source license as MassTransit (Apache 2.0). You can install MassTransit.Courier into your existing solution using NuGet.

Activities

In MassTransit Courier, an Activity refers to a processing step that can be added to a routing slip. To create an activity, create a class that implements the Activity interface.

public class DownloadImageActivity :
    Activity<DownloadImageArguments, DownloadImageLog>
{
}

The Activity interface is generic with two arguments. The first argument specifies the activity’s input type and the second argument specifies the activity’s log type. In the example shown above, DownloadImageArguments is the input type and DownloadImageLog is the log type. Both arguments must be interface types so that the implementations can be dynamically created.
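For illustration, the two interfaces might look like the following. The property names are inferred from how the Execute and Compensate examples later in this article use them; the exact shapes, and the DownloadImageLogImpl class, are assumptions rather than code taken from the project.

```csharp
using System;

// Assumed shape of the activity's input, based on the properties
// read in the Execute example (ImageUri and WorkPath).
public interface DownloadImageArguments
{
    Uri ImageUri { get; }
    string WorkPath { get; }
}

// Assumed shape of the activity's log, based on the property
// read in the Compensate example (ImageSavePath).
public interface DownloadImageLog
{
    string ImageSavePath { get; }
}

// A simple implementation of the log interface. The article describes
// this as a private class of the activity; it is public here only so
// that the sketch stands alone.
public class DownloadImageLogImpl :
    DownloadImageLog
{
    public DownloadImageLogImpl(string imageSavePath)
    {
        ImageSavePath = imageSavePath;
    }

    public string ImageSavePath { get; private set; }
}
```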

Implementing an Activity

An activity must implement two interface methods, Execute and Compensate. The Execute method is called while the routing slip is executing activities and the Compensate method is called when a routing slip faults and needs to be compensated.

When the Execute method is called, an execution argument is passed containing the activity arguments, the routing slip TrackingNumber, and methods to mark the activity as completed or faulted. The actual routing slip message, as well as any details of the underlying infrastructure, are excluded from the execution argument to prevent coupling between the activity and the implementation. An example Execute method is shown below.

public ExecutionResult Execute(Execution<DownloadImageArguments> execution)
{
    DownloadImageArguments args = execution.Arguments;

    // Name the saved file after the routing slip's tracking number
    string imageSavePath = Path.Combine(args.WorkPath,
        execution.TrackingNumber.ToString());

    _httpClient.GetAndSave(args.ImageUri, imageSavePath);

    // Store the save path in the activity log so Compensate can undo the download
    return execution.Completed(new DownloadImageLogImpl(imageSavePath));
}

Once activity processing is complete, the activity returns an ExecutionResult to the host. If the activity executes successfully, the activity can elect to store compensation data in an activity log which is passed to the Completed method on the execution argument. If the activity chooses not to store any compensation data, the activity log argument is not required. In addition to compensation data, the activity can add or modify variables stored in the routing slip for use by subsequent activities.

In the example above, the activity creates an instance of a private class that implements the DownloadImageLog interface and stores the log information in the object properties. The object is then passed to the Completed method for storage in the routing slip before sending the routing slip to the next activity.

When an activity fails, the Compensate method is called for previously executed activities in the routing slip that stored compensation data. If an activity does not store any compensation data, the Compensate method is never called. The compensation method for the example above is shown below.

public CompensationResult Compensate(Compensation<DownloadImageLog> compensation)
{
    // The log holds the data stored by Execute when the activity completed
    DownloadImageLog log = compensation.Log;
    File.Delete(log.ImageSavePath);

    return compensation.Compensated();
}

Using the activity log data, the activity compensates by removing the downloaded image from the work directory. Once the activity has compensated the previous execution, it returns a CompensationResult by calling the Compensated method. If the compensating actions could not be performed (either via logic or an exception) and the inability to compensate results in a failure state, the Failed method can be used instead, optionally specifying an Exception.

Building a Routing Slip

Developers are discouraged from directly implementing the RoutingSlip message type and should instead use a RoutingSlipBuilder to create a routing slip. The RoutingSlipBuilder encapsulates the creation of the routing slip and includes methods to add activities, activity logs, and variables to the routing slip. For example, to create a routing slip with two activities and an additional variable, a developer would write:

var builder = new RoutingSlipBuilder(NewId.NextGuid());
builder.AddActivity("DownloadImage", "rabbitmq://localhost/execute_downloadimage", new
    {
        ImageUri = new Uri("http://images.google.com/someImage.jpg")
    });
builder.AddActivity("FilterImage", "rabbitmq://localhost/execute_filterimage");
builder.AddVariable("WorkPath", @"\\dfs\work");

var routingSlip = builder.Build();

Each activity requires a name for display purposes and a URI specifying the execution address. The execution address is where the routing slip should be sent to execute the activity. For each activity, arguments can be specified that are stored and presented to the activity via the activity arguments interface type specified by the first argument of the Activity interface. The activities added to the routing slip are combined into an Itinerary, which is the list of activities to be executed, and stored in the routing slip.

Managing the inventory of available activities, as well as their names and execution addresses, is the responsibility of the application and is not part of the MassTransit Courier. Since activities are application specific, and the business logic to determine which activities to execute and in what order is part of the application domain, the details are left to the application developer.

Once built, the routing slip is executed, which sends it to the first activity’s execute URI. To make it easy and to ensure that source information is included, an extension method to IServiceBus is available, the usage of which is shown below.

bus.Execute(routingSlip); // pretty exciting, eh?

It should be pointed out that if the URI for the first activity is invalid or cannot be reached, an exception will be thrown by the Execute method.

Hosting Activities in MassTransit

To host an activity in a MassTransit service bus instance, the configuration namespace has been extended to include two additional subscription methods (thanks to the power of extension methods and a flexible configuration syntax, no changes to MassTransit were required). Shown below is the configuration used to host an activity.

var executeUri = new Uri("rabbitmq://localhost/execute_example");
var compensateUri = new Uri("rabbitmq://localhost/compensate_example");

IServiceBus compensateBus = ServiceBusFactory.New(x =>
    {
        x.ReceiveFrom(compensateUri);
        x.Subscribe(s => s.CompensateActivityHost<ExampleActivity, ExampleLog>(
            _ => new ExampleActivity()));
    });

IServiceBus executeBus = ServiceBusFactory.New(x =>
    {
        x.ReceiveFrom(executeUri);
        x.Subscribe(s => s.ExecuteActivityHost<ExampleActivity, ExampleArguments>(
            compensateUri,
            _ => new ExampleActivity()));
    });

In the above example, two service bus instances are created, each with its own input queue. For execution, the routing slip is sent to the execution URI, and for compensation the routing slip is sent to the compensation URI. The actual URIs used are up to the application developer; the example merely shows the recommended approach so that the two addresses are easily distinguished. The URIs must be different!

Monitoring Routing Slips

During routing slip execution, events are published when the routing slip completes or faults. Every event message includes the TrackingNumber as well as a Timestamp (in UTC, of course) indicating when the event occurred:

  • RoutingSlipCompleted
  • RoutingSlipFaulted
  • RoutingSlipCompensationFailed

Additional events are published for each activity, including:

  • RoutingSlipActivityCompleted
  • RoutingSlipActivityFaulted
  • RoutingSlipActivityCompensated
  • RoutingSlipActivityCompensationFailed

By observing these events, an application can monitor and track the state of a routing slip. To maintain the current state, an Automatonymous state machine could be created. To maintain history, events could be stored in a database and then queried using the TrackingNumber of the RoutingSlip.
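The history idea can be sketched with an in-memory store standing in for the database. RoutingSlipEventRecord and RoutingSlipHistory are hypothetical types for illustration; they are not part of MassTransit Courier, and a real consumer would populate the record from the published event message.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of a published event; not a MassTransit Courier type.
public sealed class RoutingSlipEventRecord
{
    public RoutingSlipEventRecord(Guid trackingNumber, DateTime timestamp, string eventName)
    {
        TrackingNumber = trackingNumber;
        Timestamp = timestamp;
        EventName = eventName;
    }

    public Guid TrackingNumber { get; private set; }
    public DateTime Timestamp { get; private set; }
    public string EventName { get; private set; }
}

// In-memory stand-in for a database table keyed by TrackingNumber.
public sealed class RoutingSlipHistory
{
    readonly Dictionary<Guid, List<RoutingSlipEventRecord>> _events =
        new Dictionary<Guid, List<RoutingSlipEventRecord>>();

    public void Record(RoutingSlipEventRecord record)
    {
        List<RoutingSlipEventRecord> list;
        if (!_events.TryGetValue(record.TrackingNumber, out list))
        {
            list = new List<RoutingSlipEventRecord>();
            _events.Add(record.TrackingNumber, list);
        }
        list.Add(record);
    }

    // Returns the events for one routing slip, oldest first.
    public List<RoutingSlipEventRecord> GetHistory(Guid trackingNumber)
    {
        List<RoutingSlipEventRecord> list;
        if (!_events.TryGetValue(trackingNumber, out list))
            return new List<RoutingSlipEventRecord>();

        return list.OrderBy(x => x.Timestamp).ToList();
    }
}
```

Sorting by Timestamp on read matters because events may not be recorded in the order they occurred.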

Wrapping Up

MassTransit Courier is a great way to compose dynamic processing steps into a routing slip that can be executed, monitored, and compensated in the event of a fault. When used in combination with the existing saga features of MassTransit, it is possible to coordinate a distributed set of services into a reliable and supportable system.

IDisposable, Done Right

IDisposable is a standard interface in the .NET framework that facilitates the deterministic release of unmanaged resources. Since the Common Language Runtime (CLR) uses Garbage Collection (GC) to manage the lifecycle of objects created on the heap, it is not possible to control the release and recovery of heap objects. While there are methods to force the GC to collect unreferenced objects, it is not guaranteed to clear all objects, and it is highly inefficient for an application to force garbage collection as part of the service control flow.

Implementing IDisposable

Despite IDisposable having only a single method named Dispose to implement, it is commonly implemented incorrectly. After reading this blog post it should be clear how and when to implement IDisposable, as well as how to ensure that resources are properly disposed when bad things happen (also known as exceptions).

First, the IDisposable interface definition:

public interface IDisposable
{
    void Dispose();
}

Next, the proper way to implement IDisposable every single time it is implemented:

public class DisposableClass :
    IDisposable
{
    bool _disposed;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    ~DisposableClass()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed)
            return;

        if (disposing)
        {
            // free other managed objects that implement
            // IDisposable only
        }

        // release any unmanaged objects
        // set thick object references to null

        _disposed = true;
    }
}

The pattern above for implementing IDisposable ensures that all references are properly disposed and released. Using the finalizer, along with the associated dispose methods, will ensure that in every case references will be properly released. There are some subtle things going on in the code, however, as described below.

Dispose()

The implementation of the Dispose method calls the Dispose(bool disposing) method, passing true, which indicates that the object is being disposed. This method is never automatically called by the CLR; it is only called explicitly by the owner of the object (which in some cases may be another framework, such as ASP.NET or MassTransit, or an object container, such as Autofac or StructureMap).
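Because the CLR never calls Dispose on its own, the owner typically wraps the object in a using statement so that Dispose runs even when an exception is thrown. The sketch below uses a hypothetical TrackedResource class that is kept deliberately minimal: it is sealed and holds no resources, so it leaves out the finalizer and Dispose(bool) method shown above in order to keep the focus on the caller.

```csharp
using System;

// Hypothetical disposable type that only records whether Dispose was called.
public sealed class TrackedResource :
    IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public static class UsingExample
{
    // Throws inside the using block and returns the resource afterward
    // so that the caller can inspect whether it was disposed.
    public static TrackedResource RunAndFail()
    {
        TrackedResource captured = null;
        try
        {
            using (var resource = new TrackedResource())
            {
                captured = resource;
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // the using statement has already called Dispose at this point
        }

        return captured;
    }
}
```

The using statement compiles to a try/finally block, which is why Dispose is called before the exception reaches the catch handler.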

~DisposableClass

Immediately before the GC releases an object instance, it calls the object’s finalizer. Since an object’s finalizer is only called by the GC, and the GC only calls an object’s finalizer when there are no other references to the object, it is clear that the Dispose method will never be called on the object. In this case, the object should release any managed or unmanaged references, allowing the GC to release those objects as well. Since the same object references are being released as those that are released when Dispose is called, this method calls the Dispose(bool disposing) method passing false, indicating that the Dispose methods of the referenced objects should not be called.

Dispose(bool)

All object references and unmanaged resources are released in this method. However, the argument indicates whether or not the Dispose method should be called on any managed object references. If the argument is false, the references to managed objects that implement IDisposable should be set to null, but the Dispose method on those objects should not be called. The reason is that the owning object’s Dispose method was not called (Dispose(false) is only called by the finalizer, never by the Dispose method), and because finalization order is not guaranteed, those managed objects may already have been finalized.

Overriding Dispose

In the example above, the Dispose(bool disposing) method is declared as protected virtual. This is to allow classes that inherit from this class to participate in the disposal of the object without impacting the behavior of the base class. In this case, a subclass should override the method as shown below.

public class SubDisposableClass : 
    DisposableClass
{
    private bool _disposed;

    // a finalizer is not necessary, as it is inherited from
    // the base class

    protected override void Dispose(bool disposing)
    {
        if (!_disposed)
        {
            if (disposing)
            {
                // free other managed objects that implement
                // IDisposable only
            }

            // release any unmanaged objects
            // set thick object references to null

            _disposed = true;
        }

        base.Dispose(disposing);
    }
}

The subclass overrides the method, releasing (and optionally disposing) object references first, and then calling the base method. This ensures that objects are released in the proper order (at least between the subclass and the base class, the proper order of releasing/disposing objects within the subclass itself is the responsibility of the developer).

Exceptions, Happen

Prior to .NET 2.0, if an object’s finalizer threw an exception, that exception was swallowed by the runtime. Since .NET 2.0, however, throwing an exception from a finalizer will cause the application to crash, and that’s bad. Therefore, it is important that a finalizer never throw an exception.

But what about the Dispose method, should it be allowed to throw an exception? The short answer is no. Except when the answer is yes, which is almost never. Therefore, it is important to wrap any areas of the Dispose(bool disposing) method that could throw an exception in a try/catch block as shown below.

protected virtual void Dispose(bool disposing)
{
    if (_disposed)
        return;

    if (disposing)
    {
        _session.Dispose();
    }

    try
    {
        _channelFactory.Close();
    }
    catch (Exception ex)
    {
        _log.Warn(ex);

        try
        {
            _channelFactory.Abort();
        }
        catch (Exception cex)
        {
            _log.Warn(cex);
        }
    }

    _session = null;
    _channelFactory = null;

    _disposed = true;
}

In the example, session is a reference to an NHibernate ISession and channelFactory is a reference to a WCF IChannelFactory. An NHibernate ISession implements IDisposable, so the owner must call Dispose on it when the object is no longer needed. In the case of the IChannelFactory reference, there is no Dispose method; however, the object must be closed (and subsequently aborted in case of an exception). Because either of these methods can throw an exception, it is important to catch the exception (and, as shown above, log it for troubleshooting or perhaps just ignore it) so that neither the Dispose method nor the object’s finalizer propagates the exception.

Constructor Exceptions

On a related topic, when an object’s constructor throws an exception, the runtime considers the object to have never existed. And while the GC will release any object allocated by the constructor, it will not call the Dispose method on any disposable objects. Therefore, if an object is creating references to managed objects in the constructor (or even more importantly, unmanaged objects that consume limited system resources, such as file handles, socket handles, or threads), it should be sure to dispose of those resources in the case of a constructor exception by using a try/catch block.
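A minimal sketch of that try/catch approach follows. FakeHandle and TwoHandleOwner are hypothetical names, and the factory delegates exist only so that the second allocation can be made to fail; the first resource is cleaned up directly in the catch block before the exception is rethrown.

```csharp
using System;

// Hypothetical stand-in for a limited resource such as a file handle.
public sealed class FakeHandle :
    IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public sealed class TwoHandleOwner :
    IDisposable
{
    readonly FakeHandle _first;
    readonly FakeHandle _second;

    public TwoHandleOwner(Func<FakeHandle> createFirst, Func<FakeHandle> createSecond)
    {
        try
        {
            _first = createFirst();
            _second = createSecond(); // may throw after _first was created
        }
        catch
        {
            // Release what was already acquired, then rethrow. The field is
            // disposed directly rather than by calling this.Dispose().
            if (_first != null)
                _first.Dispose();

            throw;
        }
    }

    public void Dispose()
    {
        _second.Dispose();
        _first.Dispose();
    }
}
```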

While one might be tempted to call Dispose from the constructor to handle an exception, don’t do it. When the constructor throws an exception, technically the object does not exist. Calling methods, particularly virtual methods, should be avoided.

Of course, in the case of managed objects such as an ISession, it is better to take the object as a dependency on the constructor and have it passed into the object by an object factory (such as a dependency injection container like Autofac) and let the object factory manage the lifecycle of the dependency.

Container Lifecycle Management

Dependency injection containers are powerful tools, handling object creation and lifecycle management on behalf of the developer. However, it is important to have a clear understanding of how to use the container in the context of an application framework.

For example, ASP.NET has a request lifecycle for every HTTP request received by the server. To support this lifecycle, containers typically have integration libraries that hook into the framework to ensure proper object disposal. For instance, Autofac has a number of integration libraries for ASP.NET, ASP.NET MVC, ASP.NET Web API, and various other application frameworks. These libraries, when configured into the stack as HttpModules, ensure that objects are properly disposed when each request completes.

Conclusion

The reason for IDisposable is the deterministic release of references by an object (something that used to happen manually with unmanaged languages by calling delete on an object). Implementing it both properly and consistently helps create applications that have predictable resource usage and are easier to troubleshoot. Therefore, consider the example above as a reference point for how objects should be disposed.


Separating Concerns – Part 1: Libraries

Introduction

In large applications, particularly in enterprise applications, separation of concerns is critical to ease maintainability. Without proper separation of concerns, applications become too large and too complex, which in turn makes maintenance and enhancement extremely difficult. Separating application concerns leads to high cohesion, allowing developers to better understand code behavior which leads to easier code maintenance.

History

In the previous decade, architects designed applications using an n-tier approach, separating the application into horizontal layers such as user interface, business logic, and data access. This approach is incomplete, however, as it fails to address partitioning applications vertically. Unrelated concerns are commingled, resulting in a confusing architecture which lacks clearly defined boundaries and has low cohesion.

The other problem with an n-tier architecture is how it is organized from top to bottom, with the topmost layer being the presentation layer or user interface, and the bottommost layer representing the persistence layer or database. Instead of thinking of the architecture as horizontal layers, think of them as rings, as in the Onion Architecture described by Jeffrey Palermo. (While Jeffrey proposed the pattern name, the architectural patterns have been defined previously by others.)

Separating Concerns

Given that a separation of concerns and increasing cohesion are the goals, there are several mechanisms towards achieving them. The solutions that follow include the use of libraries, services, and frameworks as ways to reach these goals.

The Library

A library is a set of functions used to build software applications. Rather than requiring an application to be a single project containing every source file, most programming languages provide a means to segregate functionality into libraries. While the facility name varies, a partial list of which includes package, module, gem, jar, and assembly, the result is enabling developers to separate functions physically from the main application project, improving both cohesion and maintainability.

Core, the new Manager

A library should not be a collection of unrelated functions; it should contain related functions so that it is highly cohesive. An application developer should be able to select a library for use based on its name and purpose, rather than having to pore through the source code to find the function or functions needed. A library should have a descriptive name and contain a cohesive set of functions towards a singular purpose or responsibility.

Creating a library named Core containing a large set of unrelated functions is separation for the sake of separation, and that library should not be treated as a library but as part of the application; it should not be reused by other applications.

Coupling (aka, the Path of Pain)

When an industry analyst shares their observations about code reuse in the enterprise, the findings indicate that actual code reuse is very low. A main reason that code reuse is so low is tight coupling. Coupling refers to how two libraries (or functions) rely on each other. When a library relies upon another library, the library relied on is referred to as a dependency. When an application relies on a library, it implicitly relies on the library’s dependencies as well. In many larger applications, this can lead straight to dependency hell.

Since tight coupling can lead to serious maintenance issues during an application’s lifecycle, limiting dependencies should be first and foremost in application and library design. If a function is to be moved from an application to a library, and that function must bring with it a dependency that was not previously required by the target library, the cost of adding the new dependency to the library must be considered. Too often, particularly in the enterprise where code is only reviewed internally by a single development team, poor choices are made when creating libraries. Functions are routinely moved out of the main project and placed into arbitrary libraries with little thought given to the additional dependencies of the library.

An Example

As an example, a web application has a set of functions for validating email addresses. The simplest validation methods may only depend upon regular expression functions, which are part of every modern language runtime used today. A more complete validation of an email address may check that the domain is actually valid and has a properly registered MX record in DNS. However, validating the domain involves sending a request to a service and waiting for the response indicating a valid domain before the email address is determined to be valid.

There are many things wrong in this example. First, the email validation function has a dependency on a domain validation function. Due to the fact that the set of valid domains is continuously changing, the domain validation function itself has a dependency on a domain name service. Of course, the domain name service depends upon a network domain name service, which may subsequently depend upon an internet service as well. By calling one library function, the application has managed to send a request to another machine and block a thread waiting for a response.

In the case of an error, the disposition of the email address is then unknown. Is it a valid email address that could not be validated due to a network error? Or is it a valid email address but flagged as invalid because the domain name could not be validated due to an internal DNS server not allowing external domains to be returned?

The coupling in the email validation library is clearly a problem, but what happens as the business requirements evolve over the life of the application? Consider the situation where new accounts are being created by spammers from other countries. To combat the spam accounts, email addresses must now be validated to ensure that the IP address originates from within the United States. The email validation function now has a new dependency, a geolocation service that returns the physical address of a domain. However, the service requires the use of separate endpoints for testing and production. The email address validation function is now dependent upon two services and configuration data to determine which service endpoint to use.

At this point, it is obvious that the complexity of validating an email address is not something that can be accomplished in a library function.
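By contrast, the purely syntactic check mentioned at the start of the example depends on nothing beyond the runtime’s regular expression support. A minimal sketch, assuming a deliberately loose pattern (EmailSyntax and LooksValid are hypothetical names, and the pattern is illustrative rather than a complete address grammar):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper with no dependencies beyond the runtime library.
public static class EmailSyntax
{
    // Loose shape check: something, an @, something, a dot, something,
    // with no whitespace and no additional @ characters.
    static readonly Regex Pattern = new Regex(
        @"\A[^@\s]+@[^@\s]+\.[^@\s]+\z",
        RegexOptions.CultureInvariant);

    // Returns true when the text has the general shape of an email
    // address. It makes no network calls and says nothing about
    // whether the domain or mailbox actually exists.
    public static bool LooksValid(string address)
    {
        return !string.IsNullOrWhiteSpace(address) && Pattern.IsMatch(address);
    }
}
```

A function like this never blocks on another machine and has a single, unambiguous outcome for any input, which is what distinguishes it from the domain and geolocation checks described above.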

This article will continue with Part 2 on services.

Tulsa TechFest 2012 Code

Here is the code from my talk at Tulsa TechFest on SignalR. Thanks to those of you who came to the talk, I hope you learned enough about SignalR to determine if it’s the right technology for you. Be sure you have enabled NuGet to restore packages on build so the required references all get downloaded and installed from the NuGet site.

TulsaTechFest2012.zip

 

StrangeLoop 2012

This past weekend I attended my 2nd StrangeLoop conference. StrangeLoop is an annual conference held in St. Louis, MO and for the last four years it has managed to draw some impressive talent. Unlike other events I attend, StrangeLoop is an independent conference and is not dominated by a single platform, technology, or language. The quality and level of content is also high, making StrangeLoop a place where introductory sessions are frowned upon — attendees want deep, intriguing sessions where experienced practitioners can learn new things. Attendees at StrangeLoop are commonly pushing the leading edge, and the session topics are state of the art, sometimes on the edge of redefining software development in the coming years.

So how was it?

Day 1

Opening Keynote: VoltDB, Michael Stonebraker

In the first thirty minutes, I had a strong sense that the conference was off to a rough start. In what was clearly a product-focused talk, the VoltDB CTO made a weak case for ACID, eliciting frequent groans from the audience. Make no mistake, Stonebraker is a really smart guy, but too much of his time was spent bashing other databases (if you can technically call eventually consistent storage systems without a query language databases). As an opening keynote for the conference, this was the worst possible choice. Now, I have followed VoltDB since the early bits, and was impressed with the lock-free approach that serializes all operations, but this talk didn’t spend enough time on the benefits of VoltDB.

Get a Leg Up with Twitter Bootstrap

For the first actual session of the day, Howard Lewis Ship took the audience on a tour of Twitter Bootstrap, which is rapidly becoming the File, New Web Site project template. In fact, I was glad to see an entire gallery of customized Bootstrap templates; hopefully Bootstrap-originated sites won’t all look the same now. I’m a fan of Bootstrap, and this was a solid introduction, but I (and the rest of the audience, I’m sure) was hoping for a bit more depth.

Software Architecture using ZeroMQ

My expectations were high on this session, and I was really hoping to get some insight into 0MQ, and how to build systems using it as the authors intended. While Pieter Hintjens provided some high-level coverage of ZeroMQ, I felt this session should have been called “Software Architecture 101” and could apply to using any technology stack. I gained zero insight into ZeroMQ beyond what the executive summary already covered.

I was really starting to doubt my remaining session choices at this point; the first two sessions were boring, and they followed a bad keynote. So I reached out to some friends to hear their experiences. This altered my schedule for the rest of the day.

A Whole New World by Gary Bernhardt

This session was a short, light-hearted lunch session with a total Rick-roll ending. At least I ate my lunch, took a break, and made a couple of phone calls. I got blocked out of the Twitter Zipkin session due to space constraints, but I heard it was nothing special, so I’m glad I didn’t miss anything.

Building an Impenetrable Zookeeper

Finally, an in-depth session given by a member of the team providing commercial support — if only I used Zookeeper. I understand what Zookeeper does, and the subject matter dealt with the type of issues organizations encounter trying to run it. I found this very interesting, particularly since I have a good understanding of distributed consensus and configuration — and this is not an easy nut to crack. I came away with some interesting notes that I’ll keep in mind when I create systems that either interact with Zookeeper, or perhaps when I create yet-another-open-source-project (Topshelf Bartender perhaps!). 

Graph: composable production systems in Clojure

What Jason Wolfe (of Prismatic, the news aggregator) offered up was a refreshing approach to building a functional container. Graph is comparable to Guice or Dagger, and provides a declarative approach to system composition. While at the lowest level it seems to offer the same features as an IOC container, the way it was presented and explained was really nice. I enjoyed this session, and took away a few notes for my own use. I also gained a greater fondness for Clojure, which was a recurring theme as the sessions continued.

The Database as a Value by Rich Hickey

So having done well with a Clojure talk, I decided to take in another one from the man himself, the author of Clojure. The talk on Datomic was a nice realization that we are reaching a level where immutable databases are available and usable. Datomic is sweet, and how it handles IO and manages to spread the Live Index to multiple nodes for fast access is clever. I enjoyed this talk and look forward to seeing the ideas in Datomic shape a new wave of immutable storage systems (I’m not sure it’s a database, despite the intense conversation at the pre-party on that very subject). And again, an increasing appreciation for Clojure.

That’s how Day 1 ended for me, on a good note. So we went to Pappy’s BBQ and managed to snag one of the last remaining racks of ribs (apparently they sell out fairly early; while we were in line, the chicken, turkey, and chopped brisket sold out). After dinner, we returned to the hotel to continue working through some code that I’d been toying with throughout the day (FeatherVane-related, if you were curious).

Day 2

Computing Like the Brain by Jeff Hawkins

This talk was almost scary. The depth of knowledge on the human brain is staggering. At one point, I saw a tweet suggesting that the Terminator himself was about to pop onto the stage and tell Hawkins to abandon his research for the sake of humanity. Yes, it was that scary. The way his company has built out models that match the human brain is impressive, and the results of some of their predictive systems were very close to reality. However, predicting the future is hard, and it’s easy to get it wrong. While many systems have promised to give us brain-like capabilities, most if not all of them have been limited in applicability or flat out failed when generalized. I suppose that is actually good for us (mankind).

Y Not? Adventures in Functional Programming

Jim Weirich is pretty well known (well, apparently I don’t know anybody — a joke that never ended during the conference) and he took the audience on a ride using Clojure to explain the Y-Combinator. When the talk started, he promised a fun ride that would likely be inapplicable to anything any of us does in our daily jobs. And he was right, it was fun! Live coding works when the presenter can do it and do it well, and this was a great session. Very enjoyable, the day was off to a great start!

Runaway complexity in Big Data and a plan to stop it.

Last year, Nathan Marz open-sourced Twitter Storm during his session at StrangeLoop 2011, and it was an impressive system (written in Clojure, big shock). The real-time analytics capabilities of Storm are slick, and it sounds like it’s only gotten better over the past year. I was hoping for great things again this year, however, what I found was a bit of a reminder of a talk in 2008. At QCon San Francisco in 2008, Greg Young gave a talk about Unleashing your Domain Model, covering how insert-only data stores, event sourcing, and real-time projection of data into views can benefit real-time applications. It seems like even today these ideas are flowing through the minds of the real-time web properties.

Eventually Consistent Data Structures by Sean Cribbs

This was an eye-opening talk about newly defined data structures that enable concurrent updates that are eventually consistent. As more distributed systems are being built, the ability to perform concurrent updates on records that resolve conflicts easily is needed. As a big fan of algorithms, I found the way these data structures were assembled very interesting, despite their very specific purpose. I had originally planned on attending Oleg Kiselyov’s talk on Guessing Lazily, but the presenter was spending too much time flipping through random snippets of code that were very hard to follow, which made it difficult to grasp what was being done. Which is a bummer, because I saw quick segments of parser combinator code, which I rely on heavily in my parsers.

Taking Off the Blindfold

This talk was awesome, and Bret Victor had people cheering. The flow of his presentation where he shared with us his vision for a dynamic, interactive IDE had some developers just screaming for more. The high point for me was one of my favorite childhood memories — taking the entire bin of Legos and dumping it out on the floor. By getting everything out in front of you, you needn’t think about what you’re going to build in a vacuum, you can see, touch, and draw items from the random chaos laid out before you. Some of the ideas here seemed to redefine what should be expected of an IDE.

The State of JavaScript

Yes, Brendan Eich, the inventor of JavaScript, laid out the awesome coming in ECMAScript 6. Some of the proposed features are awesome (and strangely enough available in the nightly Firefox builds), while some features have me concerned. CoffeeScript has clearly influenced some features, and I sensed a subtle Microsoft influence in some of the language and keyword choices. I was glad to see byte code clearly off the table, but disappointed to see macros up for possible inclusion. Brendan is an incredible presenter, and you could hear the passion in his voice.

With that, the conference was wrapped. I had a great time, had some great conversations, and really enjoyed some of the sessions. It’s great to be able to take the time to attend, listen to, and appreciate content once in a while without worrying about my own presentation. If you can make the time next year, and the content looks good, I highly recommend StrangeLoop!

 

 

Rebooting Topshelf for Version 3

When we created Topshelf, one of the prime directives was ease of use. It had to be easy for the developer to add a reference and create a service. To keep it easy, we had another prime directive: the developer should only be required to reference a single assembly to get Topshelf to work. And that assembly should have no dependencies.

Why?

Before NuGet, using open source was difficult for .NET developers. With so many different versions of assemblies and no single point of distribution, it was a continuous effort to get a solution with multiple open source dependencies to build properly. Fast forward to today, the NuGet world, and developers can simply add a reference using the NuGet package manager and all the dependencies come along for the ride. The migration of the community towards NuGet has made the directive of one assembly significantly less important.

This evolution of the open source community requires authors to re-imagine their products to fit properly in this new world. In order to keep Topshelf the best and easiest way to create Windows services, we are planning to do just that — re-imagine the model for Topshelf going forward.

With the release of Topshelf 3.0, the main NuGet package will contain only the functionality necessary to create, install, and control your own service. By focusing on this single goal, the highest level of safety and stability can be attained. This allows us to keep the footprint of Topshelf as small as possible, reducing the surface area around your mission-critical services that are running on it.

Once we have the main Topshelf assembly stable and production tested, we will revisit the other features of Topshelf and look at how they fit into the new direction. Some features may be discarded while others may be changed to be more operationally sustainable. These features, however, will not be included in the main package. Instead, they will sit on top of the stable and proven Topshelf assembly, ensuring that the core functionality remains solid.

When?

I’ll be posting a prerelease version of 3.0 on the main NuGet feed in the next few days. This version will continue to support both .NET 3.5 and .NET 4.0, as well as .NET 4.5 once it is generally available (the 4.x version should work with 4.5 until then). The previous v2.x code branches (develop/master) will be renamed for retention (v2_develop/v2_master).

Migration from previous versions should be fairly painless as the API is nearly identical. There are a few minor tweaks and some additional options for using the new features, such as the ability to control the host from the service, including the ability to stop the service (a frequently requested feature). Most settings, such as the service name, are now entirely optional, with the default using the namespace of the hosting assembly for the service name.

That’s the current roadmap for Topshelf. Hopefully you’ll agree that this reboot makes sense, as the current codebase has completely outgrown what is needed to host a simple service. Using this additive approach should make it easier to build features on top of the solid core Topshelf service, without compromising the integrity of the base service host functionality.

Benchmarque – Comparative Benchmarking for .NET

Last night, I announced that the first release of my benchmarking library Benchmarque was available on NuGet. This morning, I’d like to share with you what the library is, and how to use it.

What is Benchmarque?

Benchmarque (pronounced bench-mar-key) allows you to create comparative benchmarks using .NET. An example of a comparative benchmark would be evaluating two or more approaches to performing an operation, such as whether for(), foreach(), or LINQ is faster at enumerating an array of items. While this example often falls into the over-optimization category, there are many related algorithms that may warrant comparison when cycles matter.

How do I use it?

To understand how to use Benchmarque, let’s work through an example. First, start Visual Studio 2010 Service Pack 1 with NuGet 2.0 installed and create a new class library project using the .NET 4.0 runtime. Once created, we’re going to define an interface for our benchmark.

In this benchmark, we are going to compare the performance of the different ways to append text into a single string. Now that we have the interface defining the behavior we want to benchmark, we need to create a few implementations that perform the operation.

First, the good old concatenation operator.

Next, we’ll use a StringBuilder to handle the work.

And last, we’ll try to use string.Join with an empty separator.
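Taken together, the interface and the three implementations described above could be sketched roughly as follows. The interface name `IAppendText`, its `Append` signature, and the class names are assumptions made for illustration; only the three techniques (the concatenation operator, StringBuilder, and string.Join with an empty separator) come from the text.

```csharp
using System.Text;

// Hypothetical interface describing the operation being compared.
// The name and signature are illustrative assumptions.
public interface IAppendText
{
    string Append(params string[] values);
}

// First, the concatenation operator.
public class ConcatAppendText : IAppendText
{
    public string Append(params string[] values)
    {
        string result = "";
        foreach (string value in values)
            result += value; // allocates a new string on every pass
        return result;
    }
}

// Next, a StringBuilder that accumulates into a single buffer.
public class StringBuilderAppendText : IAppendText
{
    public string Append(params string[] values)
    {
        var builder = new StringBuilder();
        foreach (string value in values)
            builder.Append(value);
        return builder.ToString();
    }
}

// Last, string.Join with an empty separator.
public class JoinAppendText : IAppendText
{
    public string Append(params string[] values)
    {
        return string.Join("", values);
    }
}
```

Because all three classes share one interface, a benchmark can call each of them the same way and compare only the time spent inside `Append`.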

With our three implementations ready to benchmark, we now need to create an actual benchmark. We’ll take a list of names, and call the interface with those names. Before we can do that, however, it’s time to add Benchmarque to the project. Using the NuGet package manager, install Benchmarque to your class library project.

Installing from Package Manager

Once installed, we can create our benchmark class as shown below.

A benchmark includes three methods that control the execution of the benchmark, along with a property that returns the iteration counts for each run. WarmUp is called with the implementation to allow any one-time initialization of the implementation to be established. This should include a few runs through the test to allow the runtime to JIT any code, ensuring the benchmark only includes actual execution time (versus assembly load and JIT time). The Run method is then called with each of the iteration counts to actually run the benchmark. Once complete, the Shutdown method is called to dispose of any resources used by the implementation.
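To make the warm-up, run, and shutdown sequence concrete, here is a small stand-alone sketch that times an operation with System.Diagnostics.Stopwatch. It does not use Benchmarque's types; the `LifecycleSketch` class, the `Measure` method, the iteration counts, and the sample names are all hypothetical and only mirror the sequence described above.

```csharp
using System;
using System.Diagnostics;

public static class LifecycleSketch
{
    // Hypothetical iteration counts; one timed run is made per entry.
    static readonly int[] Iterations = { 1000, 10000, 100000 };

    // Hypothetical input data for the operation under test.
    static readonly string[] Names = { "Adam", "Betty", "Charles", "David" };

    public static void Measure(string label, Func<string[], string> append)
    {
        // Warm-up: run the operation a few times so assembly load and
        // JIT compilation happen before any timing starts.
        for (int i = 0; i < 10; i++)
            append(Names);

        // Run: time the operation once for each iteration count.
        foreach (int count in Iterations)
        {
            Stopwatch timer = Stopwatch.StartNew();
            for (int i = 0; i < count; i++)
                append(Names);
            timer.Stop();

            Console.WriteLine("{0}: {1} iterations in {2} ms",
                label, count, timer.ElapsedMilliseconds);
        }

        // Shutdown: release any resources held by the implementation.
        // This sketch holds none, so there is nothing to dispose.
    }

    public static void Main()
    {
        Measure("string.Join", names => string.Join("", names));
    }
}
```

Keeping the warm-up outside the timed region is the point of the exercise: without it, the first and smallest run would absorb the one-time startup cost and look slower than it really is.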

The benchmark runner (Benchmarque.Console, which is installed in the tools folder by NuGet) will run the benchmark with each implementation and measure the time taken. To run the benchmark, we need to open the NuGet Package Manager Console, change to the assembly folder, and start the benchmark using Start-Benchmark as shown below.

Open the package manager console

Once open, change to the folder for the assembly to benchmark.

Change to the output folder

And now, we’re going to run the actual benchmark and view the results.

Results of benchmark

First, Start-Benchmark is a PowerShell function that is added by the init.ps1 that’s included in the NuGet package. It handles the execution of the benchmark using the console runner. Once complete, the output of the benchmark is displayed in the console window.

As shown above, the results of the test execution are ordered with the fastest implementation first, followed by the remaining implementations, each displayed with its difference from the fastest and how many times slower it is. The output is pretty basic at this point, without a lot of other calculations displayed. Additional items may be added at some point. For now, it’s enough to give me the answers I need when trying different approaches to the same problem.

The library is open source (I’ll put the Apache 2 documents in place at some point), so feel free to use, abuse, modify, and enhance as needed! 

One request: if anyone is a PowerShell megastar and can modify Benchmarque.psm1 so that, when no argument is specified, it looks through the solution for the projects that reference Benchmarque and automatically runs the benchmarks in those assemblies (so they don’t have to be specified explicitly), I’d love the help.