event sourcing is not practical
I've been reading up a bit (hackernews and martinfowler.com) on scaling architectures that are built around immutable/append-only structures and the benefits those structures provide.
I have always been a huge fan of immutability by default, as it can provide interesting and meaningful benefits, but I just don't see these scaling models as being practical when applied to large federated systems.
My first counter-argument may come as an ad-hominem attack, but it must be said that I am suspicious of bespoke architectures espoused by developers whose traffic is only moderate, or who run services that don't have many issues with complex writes or transactions in general.
On a more tangible note, I have found that rollback (or replay in the event-sourcing parlance) is not practically realistic. Consider a site that interacts with a credit-card processor. You don't roll back transactions in this scenario: the card processor still has the charge on their books. You produce another forward transaction as an undo mechanism. Martin Fowler sort of hand-waves this problem away with the suggestion that architects create "gateways" to external systems, but what do these really do? The arrow of time is still moving forward at the remote system, and its state is changing too. An isolated replay on your local architecture will no longer be state-consistent with the remote system. Can anyone verify the state of the federated system at this point? I say no.
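To make the card-processor case concrete, here is a minimal sketch. Everything in it is an assumption for illustration: FakeProcessor is a hypothetical in-memory stand-in for the remote processor, and Event, balance, and record are made-up names rather than any real library's API. It only shows why truncating the local log diverges from the remote books, while a forward refund keeps both sides in agreement.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """Immutable event (hypothetical shape, for illustration only)."""
    kind: str          # "charge" or "refund"
    order_id: str
    amount_cents: int


def balance(events: List[Event]) -> int:
    """Net amount charged, derived by folding over the events."""
    return sum(e.amount_cents if e.kind == "charge" else -e.amount_cents
               for e in events)


class FakeProcessor:
    """Stand-in for the remote card processor; it keeps its own books."""

    def __init__(self) -> None:
        self.books: List[Event] = []

    def submit(self, event: Event) -> None:
        self.books.append(event)


processor = FakeProcessor()
log: List[Event] = []  # our local append-only log


def record(event: Event) -> None:
    """Append locally and forward the same event to the processor."""
    log.append(event)
    processor.submit(event)


record(Event("charge", "order-1", 500))

# Naive local "rollback": pretend the last event never happened.
rolled_back = log[:-1]
naive_local = balance(rolled_back)                # local view says 0
remote_after_rollback = balance(processor.books)  # remote still says 500

# Forward "undo": a new compensating event that both sides see.
record(Event("refund", "order-1", 500))
local_after_refund = balance(log)
remote_after_refund = balance(processor.books)
```

After the naive rollback the local view and the processor disagree about the order; after the refund they agree again, and the log has grown rather than shrunk. A real processor's state would also keep changing on its own, which this toy does not model.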
It strikes me that the only sane way to structure large federated stateful systems is to accept that the arrow of time is only moving forward. Having a backlog of immutable data may be useful for analytical purposes, and it may even be theoretically possible to construct large systems that can globally roll back, but I don't think it is a practical possibility.
I would love to be proven wrong by a large, write-intensive, very-high-traffic demonstration though! An interesting counter-example may emerge from Spanner, which seems to incorporate some of the principles discussed here, albeit on a "closed" system.
last updated 2012-10-01