Nov 26

The actor programming model is a software development method that encourages the decomposition of applications into autonomous components which are self-contained and operate asynchronously and independently from one another. This model is well aligned with the nondeterministic nature of distributed systems, including mobile systems, interactive systems, and the internet.

As I mentioned previously, I didn’t invent it. I’m merely leveraging the information obtained from a number of sources and applying it in a way that I think makes it easier to build certain types of applications. Applications that can benefit from a highly concurrent actor-based programming model include reactive systems — ones that respond to nondeterministic external events. Since many applications can be described as “a program that responds to external events” it only makes sense that the actor programming model can be applied to many domains.
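To make the idea a little more concrete, here is a minimal sketch of an actor in Python. It is not Stact's API; the `Actor` and `Counter` classes and the `send`, `stop`, and `receive` names are invented for illustration. Each actor owns its state and a private mailbox, and it handles one message at a time on its own thread.

```python
import queue
import threading


class Actor:
    """A self-contained component that owns its state and a private mailbox."""

    _STOP = object()  # sentinel used to shut the actor down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Asynchronous: the caller enqueues the message and returns immediately.
        self._mailbox.put(message)

    def stop(self):
        # Queue the stop signal behind earlier messages, then wait for the thread.
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        # Messages are handled one at a time, so the actor's state needs no locks.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Counter(Actor):
    """Hypothetical actor that counts the 'increment' messages it receives."""

    def __init__(self):
        self.count = 0  # set state before the mailbox thread starts
        super().__init__()

    def receive(self, message):
        if message == "increment":
            self.count += 1


counter = Counter()
for _ in range(1000):
    counter.send("increment")
counter.stop()
print(counter.count)
```

The sender never touches `count` directly while the actor is running; the only way to affect the actor is to send it a message, which is what keeps the components independent.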

Here are some papers that I’ve read on the actor model, some of which have influenced me in how I think about concurrent programming and others that have merely provided background information or depicted ways in which concurrent programming should not be approached.

Actors, Rajesh K. Karmani, Gul Agha

Actors: A Model of Concurrent Computation In Distributed Systems, Gul A. Agha (out of print)

Actor Languages for Specification of Parallel Computations, Gul Agha, Wooyoung Kim, Rajendra Panwar

An Actor-Based Framework for Heterogeneous Computing Systems, Gul Agha, Rajendra Panwar

Actors that Unify Threads and Events, Philipp Haller, Martin Odersky

Lightweight Language Support for Type-Based, Concurrent Event Processing, Philipp Haller

Compilation of a Highly Parallel Actor-Based Language, WooYoung Kim, Gul Agha

These are some of the more involved works from which I’ve found many useful bits of information. I’ve got them permanently stored in GoodReader so I can keep looking back to them (and my associated annotations as well). Hopefully anyone looking to build systems using the actor model (and hopefully, using Stact if you’re on the .NET platform) can get a better understanding of the model by reviewing these papers.

 

Nov 16

At the end of October, we released MassTransit v2.0.1 to GitHub and NuGet. This release only included a few fixes that didn’t make it into the v2.0 release. Since I never made an official announcement of v2.0 on the blog, some links to the project, documentation, and mailing list are included below.

For those using the 1.x lineage of MassTransit, v2.0 includes several breaking changes in the API. This was necessary to reduce the complexity of getting new users up-to-speed, as well as eliminating some common areas of confusion. The API for v2.x should remain consistent from this point forward (well, until we start working on v3.x, which is a long ways off honestly).

Project Site (hosted on GitHub)
NuGet Project
Documentation
Mailing List
Ohloh Metrics

It’s worth noting that the MassTransit organization on GitHub is the ‘official’ repository. Please file any issues on that repository so that all of the MassTransit team members can help with any issues. However, you are encouraged to check the mailing list first as many first-time issues are discussed there.

This release was a long road and involved a lot of internal code cleanup, API grooming, and support for a new transport (RabbitMQ). We welcome your feedback, questions, and suggestions.

Enjoy!

 

Nov 15

Last week, I had the pleasure of attending Øredev in Malmö, Sweden. While at the conference, I presented two sessions — including a new talk on Actor Model Programming in C#. This was the first official presentation I’ve given on the subject, having done an ad-hoc version of the session at Pablo’s Fiesta this year (which went fairly well, likely due to the awesome Chicken and Waffles at 24 Diner the night before). Early feedback from the Øredev session was positive, which is encouraging since I will be giving an updated version of the talk at CodeMash 2.0.1.2 in January.

First, I wanted to share a few links to the content discussed in the session, including the GitHub Project, the NuGet package, and the TeamCity build. I will update the post with the video link once the presentation video is available, along with the slide deck.

Second, I plan to post a series of blog posts explaining how actor model programming is a great model for building concurrent applications, despite the difficulties that the actor model has had in becoming more mainstream (some of those difficulties are explained in this article by Paul Mackay).

In the meantime, I’m going to take a hard look at how different languages have implemented the actor model (many of which have influenced the current syntax used in Stact). I’m also taking a step back and identifying other ways the model can be implemented to minimize many of the difficulties and bring some modern programming style to the model. Concurrency is certainly difficult, but I’m convinced that many aspects can be made more approachable by applying some existing idioms to the problem.

If you do take a look at Stact, please offer any feedback you have via Twitter (I’m @PhatBoyG) or GitHub (using issues, whatever). If the traffic grows, we’ll set up a Google group to keep things manageable.

Until next time…

 

May 03

After what seems like a long slumber, along with work being done on other projects such as Topshelf and Stact, it is our great pleasure to announce the first beta release of MassTransit v2.0. What originally started out as a minor “1.3” update has turned into a full-out cleanup of the codebase, including a refinement of the configuration API. Since there were some breaking changes to the configuration, we felt a 2.0 moniker was better to ensure users of the framework understood the depth of the changes made.

And what a list of changes it is (TL;DR = We filled it with awesomeness):

  1. Configuration
    MassTransit v2.0 now includes a streamlined configuration model built around an extensible fluent interface (inspired by Stact and Topshelf and sharing a common, consistent design). As a result, getting started with MassTransit is now easier than ever. In version 2.0, all configuration starts with the ServiceBusFactory and Intellisense guides you from there forward. The result is a clean, understandable API and a quicker out-of-the-box experience.

  2. Container-Free Support
    With the release of MassTransit 2.0, using a dependency injection container is now optional. When we started MassTransit, we leveraged the container extensively to assemble the internal workings of the bus. As we added support for other containers, required features that were not supported by a particular container led to some creative solutions (read: hacks) that were less than optimal. By moving away from a “container-first” approach, we have increased the reliability of the software and now provide container-specific extensions to subscribe consumers from the container in one simple step. We also threw in support for Autofac!

  3. Quick-Start
    By simplifying the configuration, and dropping the need for a container, it is now fast and easy to get started using our new QuickStart:
    http://docs.masstransit-project.com/en/latest/configuration/quickstart.html

  4. #NuGet
    NuGet packages have been added for the base MassTransit project, with any external dependencies (log4net and Magnum) resolved using the proper NuGet packages. Any additional references are downstream in additional NuGet packages, such as support for persisting sagas using NHibernate (MassTransit.NHibernate), and the various dependency injection containers supported.

  5. Multiple Subscription Service Options
    In addition to the existing RuntimeServices included with MassTransit, an all-new peer-to-peer subscription service has been added. By leveraging the reliable multi-cast support in MSMQ, services can now exchange subscription information without the need for a centralized subscription service. To ensure everything is set up correctly, a VerifyMsmqConfiguration method has been added that will check the installation of MSMQ and install any missing components. This is the first iteration of multi-cast support, and we need to get some mileage on it. In the meantime, the original run-time services continue to work as expected.

  6. Documentation
    Which brings us to the next big update. DOCS! They’re not perfect, and they’re far from complete, but we have focused on the configuration story to help get you up and running. As we see a need for more documentation in a given area, we will continue to flesh out the docs appropriately. The docs are located at http://docs.masstransit-project.com/ and are being hosted by the fine people at http://readthedocs.org. [Thanks Eric!]

  7. Support for .NET 4.0 and .NET 3.5
    The project files and solution have all been updated to Visual Studio 2010 SP1. By default, all projects are now built in the IDE targeting .NET 4.0. The command-line build (which has been revamped to use Rake and Albacore) builds both .NET 3.5 and .NET 4.0 assemblies, including the run-time services and System View. The NuGet packages also include the proper bindings for the target project run-time version (you must use the full .NET 4.0 profile with MassTransit, the client profile is not supported).

  8. Transport Support
    Internally, the transports and endpoints have been redesigned to improve the support for new transports like RabbitMQ (and improve our ActiveMQ support). For example, transports are now inbound, outbound, or both, allowing us to properly leverage fan-out exchanges on RabbitMQ for publishing and subscribing to messages. There is more to come in this area as we take greater advantage of these advanced transport features. If you’re a RabbitMQ or ActiveMQ user and don’t mind getting your hands dirty, now is a great time to jump in and help improve transport support.

  9. Distributor Consumer And Saga Support
    The MassTransit distributor subsystem continues to improve. Testing on a multi-master system has been completed, which will allow it to serve multiple distributors to improve load balancing efficiency. Support for all sagas (previously only state machine sagas were supported) has been added as well.

  10. Swinging the Feature Axe
    Some previously troublesome and poorly supported features (Batching and Message Grouping) were removed from the 2.0 release to reduce code complexity. Also, in light of the new Parallel Tasks work in the framework, the Parallel namespace has been removed.

In the next few days, I’ll be posting an annotated walkthrough of the new configuration API. In the meantime, fire up Visual Studio 2010, create ConsoleApplication69, switch to the full .NET 4.0 framework, and Add a Library Package Reference to MassTransit using NuGet. Paste the code from the Quick Start into your program.cs and check it out!

Mar 24

Over the past few months I have been reviewing many of the products I was involved in creating, both as a developer and an architect, and have assembled an inventory of the technology and architecture used. With a catalog of products spanning more than eighteen years, a diverse set of architectural styles is represented. On one end of the spectrum are client/server systems deployed on-premise and on the opposite end are software-as-a-service (SAAS) browser-based products. Most of these products are line-of-business systems and include both heavy user interaction and background data processing. In fact, two separate products offer a similar feature set targeted at the same market but sit on opposite ends of the architectural spectrum. The first product was built in the 90s and is a client/server system; the second was built more recently during the SAAS era, targeting the web.

What follows are a few of the common design choices that I encountered, with my take on how appropriate that same choice would be today.

Data Storage

As I looked into each product, I examined how various requirements were addressed given the tools available at the time. For example, I didn’t question the use of a flat file to store reference data in the early client/server products since flat files were perfectly acceptable at the time. However, this led me to question some design choices when looking at SAAS products — including some choices that you might not expect. For instance, why is a flat file not an acceptable design choice for a system developed today? The data is still the same reference data, yet current guidance would suggest this reference data be stored in a database, most likely a relational database.

Is this because developers have become too lazy to write the component to read the file? Surely not, since a component will have to be written to import the reference data into the database. While it can be done fairly easily using database tools, the process still has to be scripted out and repeatable in case the import needs to be repeated on a new database.

Let’s enhance the problem and add a time dimension to the reference data, making updates available every thirty days. Now, not only is an initial import needed, but the import component will also need to support updating the database with the new content. Again, this could be done using database tools — a simple truncate table and repeat the import process. But what if developers have created relationships between the reference data table and other tables in the system? What if those relationships were created using the row id instead of the appropriate business identifier? At that point, the table cannot be simply truncated and the update process must now perform a complete delta of the existing and updated data sets and merge the changes into the database. That certainly doesn’t sound lazy — if anything, it sounds downright painful.
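The delta step described above can be sketched roughly as follows. This is only an illustration: it assumes each reference record carries a stable business identifier, and the `compute_delta` function and the sample data are hypothetical.

```python
def compute_delta(existing, updated):
    """Compare two reference data sets keyed by business identifier.

    Both arguments map a business identifier to that record's data.
    Returns the records to insert, the records to update, and the keys to delete.
    """
    inserts = {k: v for k, v in updated.items() if k not in existing}
    updates = {k: v for k, v in updated.items()
               if k in existing and existing[k] != v}
    deletes = [k for k in existing if k not in updated]
    return inserts, updates, deletes


# Hypothetical reference data: last month's set and this month's set.
existing = {"TX": "Texas", "OK": "Oklahoma", "ZZ": "Obsolete"}
updated = {"TX": "Texas", "OK": "Oklahoma (revised)", "NE": "Nebraska"}

inserts, updates, deletes = compute_delta(existing, updated)
print(inserts, updates, deletes)
```

Finding the changes is the easy half. Applying them without disturbing the row ids that other tables already reference is where the pain comes in.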

Another question that came to mind when using a relational database to store reference data was “which database?” Now, if the first answer that popped into your head when you read that was “SQL Server,” or even worse “the database,” therein lies the real problem.

A product is not just an application, it is a system composed of one or more applications, multiple components, multiple services, and multiple databases. Consider the earlier example that used a flat file to store reference data. The flat file itself is a separate database. In a system of any complexity there are many different sets of reference data, all of which are stored in their own separate flat files. Therefore, the system has multiple databases, each using the appropriate technology based on how that database is used.

If the reference data had remained in a flat file, an update would be trivial: the original file is simply replaced with the new one and the system continues. No special import or update process is required.

Nested Object Graphs

Another common design I saw, particularly in products that manage a revolving set of accounts, was the use of a deeply nested object graph that is persisted in a relational database. As accounts were accessed, the entire object graph would be loaded from the database and presented to the user. Once the user made whatever changes were necessary at the time, the account was then saved to the database. In order to save the object graph, the nodes at each level in the graph were compared with the database, and deltas were generated to update the database tables.

In early examples of this design, a pessimistic locking system was implemented to track user activity and prevent multiple users from working on the same account at the same time. This was common in the client/server products, since even at that time record locking using ISAM files (or even network file locking) was fairly problematic.

As products moved to the web, a more optimistic locking strategy was used. I found two different conflict resolution methods, the first of which used a timestamp to track modifications to an account. If an update was received and the timestamp didn’t match, the later update was rejected. The second method was “last write wins,” updating the account to whatever was in the later update — possibly and quite commonly losing previous updates from other users. This got real interesting when two updates were performed at the same time.
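The first conflict resolution method can be sketched like this. As an assumption for the sake of a short example, the sketch swaps the timestamp for an integer version number, which plays the same role; the `AccountStore` class and the sample data are hypothetical.

```python
class StaleUpdateError(Exception):
    """Raised when an update is based on an out-of-date copy of the account."""


class AccountStore:
    def __init__(self):
        self._rows = {}  # account id -> (version, data)

    def create(self, account_id, data):
        self._rows[account_id] = (1, dict(data))

    def load(self, account_id):
        version, data = self._rows[account_id]
        return version, dict(data)  # hand out a copy, as a form would hold

    def save(self, account_id, expected_version, data):
        current_version, _ = self._rows[account_id]
        if current_version != expected_version:
            # Someone else saved since this copy was loaded: reject the update.
            raise StaleUpdateError(account_id)
        self._rows[account_id] = (current_version + 1, dict(data))


store = AccountStore()
store.create("A-100", {"phone": "555-0100", "billing_address": "1 Main St"})

# Two users load the same account at the same version.
v_first, first = store.load("A-100")
v_second, second = store.load("A-100")

first["phone"] = "555-0199"
store.save("A-100", v_first, first)  # the first write succeeds

second["billing_address"] = "2 Oak Ave"
try:
    store.save("A-100", v_second, second)
    rejected = False
except StaleUpdateError:
    rejected = True  # the later update is rejected

print(rejected)
```

Note that the second user's change is thrown away even though it touched a different field than the first user's change.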

Neither of these solutions makes sense today for SAAS applications. In an environment where multiple users may be interacting with an account at the same time, it’s more important to look at providing users with a task-based user interface that captures the intent of each action on an account. For example, loading an entire account just to change the billing address creates unnecessary data movement that can limit throughput (read: scalability concern). At the same time, preventing a user from adding a charge to an account because another user slipped in behind them to update the phone number creates an unnecessary user burden. If updating the billing address, updating the phone number, and adding a charge to an account were explicit actions (read: commands) that can be performed on an account, they could all be performed simultaneously without conflict.
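A minimal sketch of that idea, with each user action modeled as an explicit command that carries only the data it intends to change. The class names (`Account`, `ChangeBillingAddress`, `ChangePhoneNumber`, `AddCharge`) are hypothetical and only illustrate the shape.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    billing_address: str
    phone: str
    charges: list = field(default_factory=list)


@dataclass(frozen=True)
class ChangeBillingAddress:
    new_address: str

    def apply(self, account):
        account.billing_address = self.new_address


@dataclass(frozen=True)
class ChangePhoneNumber:
    new_phone: str

    def apply(self, account):
        account.phone = self.new_phone


@dataclass(frozen=True)
class AddCharge:
    amount: float

    def apply(self, account):
        account.charges.append(self.amount)


account = Account(billing_address="1 Main St", phone="555-0100")

# Commands from three different users; each one touches only its own data,
# so the order in which they arrive does not matter.
for command in (AddCharge(25.0),
                ChangePhoneNumber("555-0199"),
                ChangeBillingAddress("2 Oak Ave")):
    command.apply(account)

print(account)
```

Because no command carries a full copy of the account, there is nothing stale to overwrite and nothing to reject.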

Note that the Command-Query Responsibility Segregation (CQRS) or even just Command-Query Separation (CQS) architectural styles specifically address this type of design.

Stored Procedures

In the example above, a deeply nested object graph was loaded from the database. In a system designed today, a developer would most likely reach for an object-relational mapper (ORM) to deal with loading and saving the object graph to the database. There are many to choose from (Hibernate, NHibernate, and Entity Framework are a few) and they solve the problem of binding object graphs to relational database tables very well. In fact, most ORMs today can generate the DDL needed to create the database objects as well — eliminating the need to write table creation scripts by hand.

At this point, I can hear the blood pressure of many database administrators reading this rising through the roof. With SQL book in hand and years of experience writing stored procedures full of selects and cursors, they will tell the story of how a hand-tuned stored procedure that returns a sequence of forward-only record sets in a single round trip to the database server is the only way the scalability requirements of the application can be met. I’m not saying that using a stored procedure in this situation is wrong, but making a stored procedure the first tool you pull out of the toolbox is very wrong indeed.

Why is it wrong? Creating a stored procedure to read data as the first approach is wrong because it is an optimization. Optimizing a component of a system before that component has been identified as a bottleneck will lead to increased complexity, and that complexity will breed quickly in the project. And as complexity increases across the project, long term maintainability suffers as the capabilities of the development team are challenged. Yep, you guessed it, the stored procedure first approach is a classic case of premature optimization.

How does using a stored procedure in this way breed complexity? First of all, it establishes a myth that reads are a problem. As functionality is added to the system, developers create more read procedures, having come to believe that any account-related read must be done with a stored procedure or else they will be held responsible for poor performance. As features continue to be implemented, more data elements are added to the schema, requiring every stored procedure to be updated as the schema changes. That creates more work for developers, who must now touch features that were complete and tested to ensure they still operate as expected.

The opposite effect of the read myth is that retrieving the entire object graph for an account is so well optimized that it is better to load the entire object and use only the needed data elements rather than create a new read procedure. With an ORM, this is handled very well using projections and fetching strategies. Developers can use the ORM to read a partial object graph, returning only the required data elements and reducing the data movement between the database and application server.

All of this accidental complexity was created based on the superstition that only a stored procedure would be fast enough to support the scalability needs of the product. An optimization that was implemented before a bottleneck was identified.

Considering that most ORMs today are capable of writing very efficient SQL and have dialects specifically tuned for each database platform, the read performance of the ORM is less likely to be a system bottleneck. For example, with Microsoft SQL Server, NHibernate takes advantage of batch queries with ADO.NET to reduce the number of round trips between the database and application servers. The SQL generated is also parameterized, allowing the SQL engine to cache execution plans for better server performance. Given these optimizations have already been done by the ORM, tuning read performance in the database is not likely to create the biggest benefit in system scalability. For example, caching of already loaded objects will likely result in greater overall read performance.
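To show what "parameterized" means here, the following is a small sketch using Python's built-in `sqlite3` module rather than NHibernate or SQL Server, so it only illustrates the shape of the SQL and not any particular ORM's output. The `account` table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, phone TEXT)")
conn.executemany("INSERT INTO account (id, phone) VALUES (?, ?)",
                 [(1, "555-0100"), (2, "555-0199")])

# The statement text stays constant; only the bound value changes per call.
query = "SELECT phone FROM account WHERE id = ?"
phones = [conn.execute(query, (account_id,)).fetchone()[0]
          for account_id in (1, 2)]
conn.close()

print(phones)
```

Because the statement text is identical on every call, a database engine that caches plans by statement text can reuse the same plan for each value, which is the benefit described above.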

Did I forget to mention that this early decision tightly coupled the product to using a particular database platform? SQL dialects are hardly portable between platforms, so the product now has to decide if it will work with a single platform or create a separate release branch for each database platform supported. The better ORMs support multiple server dialects, including Microsoft SQL Server, Oracle, MySQL, PostgreSQL, and many others.

I said I wouldn’t argue the performance difference between using an ORM and a stored procedure. I will point out, however, that using a stored procedure to tune performance is an optimization for a particular environment and should not be an early choice in system design. Going straight for the stored procedure without considering less complex options is another case where the tool we used yesterday is not always appropriate for a system being designed today.

To Be Continued…

Above I’ve covered a few of the design choices made early in the development of several major products and how they affected the evolution of each product over time as features were added. I also applied a modern view of how many of the choices we made before all these “great tools” were available are not necessarily bad today. As I get more time, I hope to share a few more stories with you as I uncover them in what has basically become a “career retrospective” for me.

 

Jul 01

Today I was honored for the second time with the Microsoft MVP award. It’s great to be recognized for my efforts in the .NET community over the past year. The next year is already shaping up to be another great one, with upcoming speaking engagements at Dallas TechFest, Devlink (Nashville, TN), St. Louis Day(s, plural) of .NET, and the Heartland Developers Conference in Omaha, NE.

If you are near any of these great events, I hope you are able to attend, learn a few things, and most importantly meet others that are part of the software development community. I also would encourage you to attend a few sessions outside of your regular development platform to get an idea of how other technologies solve the same problems in their own way. The cost to value of all these events is an absolute bargain, and many have early registration discounts that are only good for a limited time, so be sure to get registered to ensure the best price.

I look forward to meeting some of you over the next few months, so if you are at one of my talks or see me in the hall, be sure to introduce yourself and give a shout out.

(word)

May 04

It’s been a while since I’ve posted about MassTransit, the .NET distributed application framework and service bus that Dru Sellers, several other contributors, and I have been working on for the past 2.5 years. This is mostly due to a pretty heavy schedule in the first quarter (CodeMash, the MVP Summit, Pablo’s Fiesta, the North Dallas .NET User Group, and having an in-ground swimming pool built), along with a lot of exploratory coding on some new features for the framework.

While traveling to community events, it’s amazing to run into folks that are using MassTransit in their applications — particularly ones that I’ve never heard from before. It’s a cool feeling to know that people are finding value in the effort we’ve put into it. Over the past few months, several organizations have been finalizing the testing of new applications built on top of MassTransit, taking advantage of the state-driven saga support (including the new distributor for load balancing saga instances across servers), in preparation for production launches. Several contributors have offered tremendous help with this new functionality, including development of and testing with the distributor support.

It is great to see users of the framework taking an active role in shaping the feature set to meet a more generalized set of requirements. When Dru and I started creating MassTransit, we had a fairly narrow feature set that needed to be implemented. As the use of messaging in our applications expanded, we started to identify new features that would provide additional benefits. Being able to harvest application-specific code into the framework has provided a high level of reuse which helps everyone.

GitHub

The biggest move that we made in the past few months was leaving GoogleCode behind and migrating the project to GitHub. If you have not yet discovered it, Git is an amazing distributed version control system that offers tremendous flexibility when working on highly distributed projects, such as open source projects. Combined with the amazing social collaboration that occurs on GitHub, we have merged significantly more features from contributors on GitHub in the past few months than we had previously received the entire time we were hosted at GoogleCode. Many have forked the main project (which is hosted on my GitHub account, and is also used to drive the official builds on the CodeBetter TeamCity server) and made changes that I have merged into the main project.

You can still download a compressed archive of the latest source from GitHub by clicking the download source link on the project source page. You can also download the latest build (or the tagged v1.0 build) from the TeamCity artifacts link. The build runs automatically when code is pushed to GitHub, so the latest build is always from the latest bits in the master branch.

Major Milestone

On March 1st, we marked a v1.0 release candidate of MassTransit (and the related projects Topshelf and Magnum). With open source projects there is always a syndrome of 0.x, where many projects never reach 1.0 yet are still used in production systems at multiple organizations. Considering the number of organizations using MassTransit (and even Topshelf by itself for hosting Windows services), we decided it was time to mark the release 1.0 and freeze the feature set for that line of the codebase.

Since the 1.0 release candidate, there has been very little active development on the MassTransit codebase. The reason for this is simple: we wanted to allow the framework a little time to soak into the community. There are a lot of features that we want to put into the framework, and several of these are under heavy development outside of the master codebase, but the main feature set for 1.0 was released to allow organizations to go forward with implementations that were waiting on an official release.

Documentation

We have heard you, and we are going to start improving the documentation. We’ve set up a site specific to each project (MassTransit, Topshelf, Magnum) and are going to be harvesting the content from the wiki to create a reference set of documentation for using MassTransit. Hopefully we can take some of the questions from the discussion list and get them into a QA/FAQ section as well.

As an aside, I started this post based on a purse fight that was held on Twitter this morning in regards to OSS activity in the .NET community. It is certainly important as an open source project owner to keep the lines of communication flowing in regards to the project status, new features, and roadmap. I plan to work with Dru over the next few days to get these details laid out on the web site so that we can get feedback from the community.

In the next few weeks, I hope to start detailing out some of the 2.0 features that are planned for MassTransit. There are several exciting features in the pipeline, including an entirely new set of edge components for interfacing with clients connected via Ajax/JSON, WCF, or regular web services. As I get my actors together, I hope to post some details, as well as complete my series on building a service gateway using the new edge components.

Apr 28

I love GitHub, it rocks.

I just found out that you can embed a Gist from GitHub in a blog post with a simple line of script:

I’m curious to see how it looks, along with how it comes across in an RSS feed.

So feel free to ignore this post, unless you are also excited by the concept of having nicely formatted, modifiable code in blog posts.