Archive for February, 2011


Writing a peer-peer application

February 27, 2011

After spending more than a year writing a p2p application, I thought it was time for me to recap some of our experiences and summarise some of the essentials.

Why p2p?


Most people have the naive impression that adding more machines or making servers more powerful is enough to achieve scale. Without adequate design, this approach will fail. For example, adding more web servers to a simple web application is limited to the performance of the single database server. If you add more database servers, you have to be prepared for the overhead of data replication. A typical p2p application takes the opposite approach, where it runs on commodity hardware but is designed with the idea of leveraging the power of machines in high quantities. By its very nature, a p2p application is internet-scale.

High Availability/Failover

Due to the large number of available machines in a p2p network, the loss of a single machine (or a cluster of machines in a specific area) isn’t the end of the world. There is rarely any loss of data as it is usually replicated. Other nodes in the network, on detecting the loss of machines, immediately take over additional work until other machines re-enter the network. This is in contrast to most enterprise applications which cannot provide service if connectivity is lost to their data centre!

Distributed compute and resources

There’s been a lot of discussion in recent years about “the wisdom of crowds”. I am not qualified to judge the credibility of the argument. In a p2p network, however, this holds very true. The larger the network, the more the ability of the offered services to function like super-computers!

How p2p?

Programming paradigms

1. Many small services
To take advantage of the resources of the distributed network, it is more efficient to break up complex services into its different components. If specific resources on a node are over-extended, the node can refuse to service requests that further tax those specific resources, while still continuing to accept requests for other services. This allows work to be distributed more uniformly among the many peers.
2. Asynchronous behaviour
Since every invocation of a service could potentially be a request over the network, it is essential to write event-based code. “Waiting” for a response blocks resources on the node unnecessarily. Services are notified by the underlying framework on incoming messages and can continue their processing at that point in time.
3. Multi-threaded behaviour versus Non-blocking code
As a p2p node, there are two ways of handling incoming service requests. The first is the traditional way of receiving a message – spinning up a new thread or process to handle the request and then handling the next message on the queue. However, this leads to complex concurrent code which is never easy to test or maintain. The second, and the approach we’ve taken is to not hand off an incoming request to a worker thread. Instead, since we encourage the development of small services, we handle requests within the same thread with the understanding that any intensive operations will be performed asynchronously (locally or over the p2p network).
4. Retry mechanisms
It is paramount that we always keep in mind that there is a lot of uncertainty in a p2p network – node churn can be quite common, network connectivity can be unstable etc. Since most of the work is done asynchronously, it is hard to keep track of tasks that are still in progress. Timeouts should be identified for each task so that it can be retried if it hasn’t completed in the appropriate time. Care should also be taken that the list of queued tasks is persisted (and replicated) since it is possible for any node or nodes storing this information to lose connectivity with the rest of its peers.
5. Idempotence
It is obvious that nodes will receive identical service requests, especially with the presence of retry mechanisms. Services should either
– recognise that a request is a duplicate of another that it has already processed and respond with the pre-processed result or
– ensure that multiple invocations of the same service with the same parameters result in the same result.
6. Everything is an ID
IDs might be the only static/reliable things in a p2p network. They are usually large and will typically be between 128 and 160 bits long. Each node in the network has an ID and every data element has an ID. Every node “owns” its neighbouring ID space – in other words, an ID belongs to the node whose ID is closest to it. Assuming that we have an uniform distribution of node IDs within the entire range, each node will own an equally large ID space. When nodes join/leave the network, their neighbours automatically determine the new ID range that they own. The ID space is circular – for example, in a 4-bit ID space, the ids 0000 and 1111 are neighbours. Therefore, the p2p network is also referred to as a “ring”.
7. Backwards-compatible services
It is physically impossible to upgrade all nodes in a ring simultaneously. If the network is not in a controlled environment (for example, composed of personal computers belonging to internet users), upgrades can even be spaced out over months or years. It is therefore absolutely essential that p2p applications can ubiquitously deal with messages it receives which contain information it knows nothing about. This is another reason that we decided to model our data as JSON – this allows applications to deserialize only what the current version understands about the data.
8. Shared resources
There are some resources that might have to be shared across nodes in a p2p ring. A common example might be a pool of IP addresses that are exposed as a service provider. If a shared resource is available, one and only node in the ring must “own” it. Nodes could send various messages to each other to advertise about the current status of these resources. Alternatively, they could be managed via the DHT – this second of course gives the added benefit that it is very easy for a human to peek into the ring at any point in time to determine who owns these resources. A word of caution – shared resources resemble a crude notion of locking and is not ideal in a distributed world –  so, use it with care and sparingly!


There are 3 different kinds of messages that can be transmitted in any p2p network. The first is a directed message to a node whereas the other two fit the pub-sub paradigm:
– Point-Point messages are directed from one node to another. The typical mechanism to achieve this is by sending a message to an ID. The node that is closest to the ID will automatically handle the message and service the request.
– Messages can be Published to a topic and all nodes that have subscribed to the same topic will receive the message.
– Messages can be Anycasted to a topic so that at least one node subscribed to the topic will receive the message.

Data storage and retrieval

Data is stored in a Distributed Hash Table
– Everything is a key-value pair.
– Data is replicated to avoid any single point of failure.


Data is usually replicated to a certain number of neighbouring nodes where the number is dependent on the stability of the network. Replication can be done in a myriad of ways depending on the characteristics of the data. We store data by serializing all updates through the node whose ID is closest to the ID of the data entity itself. We call this the “daddy node”. Once the daddy node has updated its local copy of the data, it then notifies its neighbours of the change in state so that the replicas can be updated correspondingly. If the daddy node leaves the ring, one of its neighbours will become the closest node to the ID of the data and becomes the new daddy.

Optimistic concurrency

Handling concurrent updates of data isn’t easy in a DHT. Hence, a number of p2p frameworks will avoid the problem altogether by only storing immutable data. We use optimistic concurrency to work around this issue. When you want to update a data entity, you retrieve the latest version, make your changes to it locally and then try and write it back to the DHT. If some other node sneaks in an update during this process, you get notified by the daddy node that the entity has been modified since you last read it and you attempt the same process all over again. If you still fail after a high number of retries, the update fails with an error and is probably symptomatic of a bug in code or connectivity issues within the ring.


The DHT and the p2p network offer no notions of a transaction. There is no guarantee that a service won’t fail in the middle of a series of operations. In this case, you are reliant on retry mechanisms (with idempotence required for already executed operations) or recovery mechanisms that can undo the already completed work.



Much better, but still losing…

February 27, 2011

Playing white against a higher rated player who is on quite a hot streak, I played well for a long time and the game was quite drawish. Then I missed a chance to get a slight advantage and then misplayed the ending further, allowing my opponent to win quite easily in the end. Replay here.

1. e4 c6 2. d4 d5 3. Nc3 dxe4 4. Nxe4 Nf65. Nxf6+ gxf6 6. Nf3
(6. c3)

6… Bg4 7. Be2 e6 8. h3
(8. O-O)

8… Bh5 9. Be3 Bd6 10. Qd2 Nd7 11. O-O-O Qc7 12. g4 Bg6 13. Bd3 O-O-O 14. Qe2
(14. c4)

14…Kb8 15. Kb1 Nb6 16. Bxg6 hxg6 17. Nd2
(17. c4)

17… f5 18. Nc4 f4 19. Bc1
(19.Nxd6 Qxd6 20. Bd2) (19. Nxb6 Qxb6 20. Bd2)

19… g5 20. Nxb6 Qxb6 21. Qe4! Qb5 22. h4 Qd5
(22… Rxh4 23. Rxh4 gxh4 24. Bxf4 Qd5 25. Qxd5 cxd5 26. Bxd6+ Rxd6 27. Rh1 e5 28. dxe5 Rh6)

23. Qxd5 cxd5 24. h5?!
(24. hxg5 Rhg8 25. Rh5 Rg6 26.Rdh1) The idea here was to get a passed pawn of my own to counter my opponent’s passer on f4. I briefly considered 24. hxg5 but was worried that my doubled g-pawns were too weak and could be rounded up, perhaps remembering the fiasco from my previous game where I blundered with my 24th(!) move with 24… bxa4. This time around it was in fact the better move. Black could still round it up but f4 is now weaker and I could pick up black’s king-side pawns in the process. Whereas hxg5 would have given me a slight advantage, the move played still leaves the game balanced evenly.

24… f5 25. f3
(25. Rdg1 fxg4 26. Rxg4 Be7 27. f3)

25… fxg4 26. fxg4 e5 27. dxe5 Bxe5 28. b3
(28. Rhe1 Bf6 29. Re6 Rhf8 30. c4 d4 31. Kc2)

28… Kc7 29. Ba3 Rd7 30. Bb2
(30. Rhe1)

30… Bxb2 31. Kxb2 Rf8 32. Rd2 f3 33. Rf2 Rf4 34. h6
(34. Rh3 Rxg4 35. Rfxf3 Rh7 36. Rf5 Kd6 37. Rf6+ Ke5 38. Rg6)

34… Rh7 35. Rh5 Kd6 36. Rxg5?
(36. Kc3). I have perhaps played a couple of inaccuracies until this point but the game was still even. My next couple of moves however are enough to give black a winning advantage.

36… Rxh6 37. Rf5?
(37. Kc3)

37… Rf6?!
37…Rxf5! 38. gxf5 Rh3 39. f6 Ke6 40. Kc3 Kxf6 41. Kd4 Kf5 is an easier win.

38. Rxf6+ Rxf6 39. g5 Rf4 40. Kc3 Ke5 41. Kd2 Kf5 42. g6 Ke4 43. Ke1
(43. Rf1 Rg4 44. Re1+ Kf4 (44…Kf5 45. Ke3) 45. Re6)

43… Rg4 44. Rh2 Rxg6 45. Kf2
(45. Rh7)

45… Rg2+ 46.Rxg2 fxg2 47. Kxg2 Kd4 48. Kf2 Kc3 49. Ke3 b5 50. b4 a6 51. Kf4 Kxb4 52. Ke5Kc4 53. a3 a5 54. Kd6 b4 55. a4 d4 56. Kc6 b3 57. cxb3+ Kxb3 58. Kb5 d3 59.Kxa5 d2 60. Kb5 d1=Q 61. a5 Qd8 62. a6 Qb8+ 0-1


A very average showing

February 23, 2011

After the near-debacle in my previous game, I played a lot better this time around, though I still ended up losing! If it wasn’t one tactical oversight, I should have been able to hold out to a draw. Replay here (I was black).

1. e4 c5 2. d4 cxd4 3. c3 d5 4. Qxd4 e6
(4… dxe4 5. Qxe4 (5. Qxd8+ Kxd8))

5. exd5 Qxd5 6. Qxd5 exd5 7. Be3 Nf6 8. Nd2 Nc6 9. h3Be7 10. Ngf3 O-O 11. Be2 Bf5 12. O-O Rfe8 13. Nb3 Bf8 14. Rad1 a5 15. Bb5 Rec8
(15… a4 16. Nbd4 Bd7)

16. Nbd4 Bg6 17. Nh4 Ne5 18. Nxg6 hxg6 19. Nf3 Nc4
(19… Nxf3+ 20. gxf3 a4 21. Bd4 Ra5 22. Bd3)

20. Bxc4 dxc4 21. a4 Bc5 22. Bxc5 Rxc5 23. Rd4 b5 24. Nd2
(24. axb5 Rxb5 25. Rxc4 (25. Rb1 Rc8 26. Nd2 Nd5) 25…Rxb2)

24… bxa4??
24… b4 25. Nxc4 bxc3 26. bxc3 Nd5 27. Rc1 Rb8 28. Kf1 Rb3 should have been good enough for a draw

25. Nxc4 Rac8 26. Nb6 Rb8 27. Nxa4 Re5 28. Rfd1 Re2 29. Kf1 Rbe8 30. Rd8 R2e431. Rxe8+ Rxe8 32. f3 Re5 33. Rd8+ Kh7
(33… Ne8 34. Ra8 Kf8 35. Nb6 Ke7 36.Nc4) (33… Re8 34. Rxe8+ Nxe8 35. Nb6 Nd6)

34. c4 g5 35. c5 Nd5
(35… Rd5 36.Ra8)

36. c6 Ne7 37. c7 Rb5 38. Re8 1-0