lockless concurrency

A recent problem at work produced an interesting example of lockless concurrency.

Given a list of M files on S3, how should one allow a set of hosts H₁..H_N to move (copy + delete) these files so that no host H_i receives a file already downloaded to another host H_j?

Lacking an atomic native operation on S3 to this end, I first considered a scheme involving lock files on S3 itself, but that quickly proved to be worthless. I was motivated to find a lockless technique, which was fairly straightforward:

Sort the set of hosts H₁..H_N and store this sorted list L on each host. assume that each host is identified in the list by a string representation of its ip address or fully-qualified domain.
Each host H_i will know its order in L as i as well as the length of L as n.
Each host H_i downloads the manifest M of files currently on S3.
For each file file f in M, each host calculates the sum S of the ordinal values of each of the characters of the SHA1 hash of f. The SHA1 hash provides for a good distribution.
The modulus S % n is calculated as v. if v == i, then the file should be copied to H_i and deleted from S3, otherwise this criteria will be met by another host H_j.

With this scheme, hosts can pull files down from S3 without duplicating a file copied to another host. This approach is a simplified variant of a consistent hash, which I implemented in haskell.

last update 2013-04-03