After spending more than a year writing a p2p application, I thought it was time for me to recap some of our experiences and summarise some of the essentials.
Most people have the naive impression that adding more machines or making servers more powerful is enough to achieve scale. Without adequate design, this approach will fail. For example, adding more web servers to a simple web application is limited to the performance of the single database server. If you add more database servers, you have to be prepared for the overhead of data replication. A typical p2p application takes the opposite approach, where it runs on commodity hardware but is designed with the idea of leveraging the power of machines in high quantities. By its very nature, a p2p application is internet-scale.
Due to the large number of available machines in a p2p network, the loss of a single machine (or a cluster of machines in a specific area) isn’t the end of the world. There is rarely any loss of data as it is usually replicated. Other nodes in the network, on detecting the loss of machines, immediately take over additional work until other machines re-enter the network. This is in contrast to most enterprise applications which cannot provide service if connectivity is lost to their data centre!
Distributed compute and resources
There’s been a lot of discussion in recent years about “the wisdom of crowds”. I am not qualified to judge the credibility of the argument. In a p2p network, however, this holds very true. The larger the network, the more the ability of the offered services to function like super-computers!
1. Many small services
To take advantage of the resources of the distributed network, it is more efficient to break up complex services into its different components. If specific resources on a node are over-extended, the node can refuse to service requests that further tax those specific resources, while still continuing to accept requests for other services. This allows work to be distributed more uniformly among the many peers.
2. Asynchronous behaviour
Since every invocation of a service could potentially be a request over the network, it is essential to write event-based code. “Waiting” for a response blocks resources on the node unnecessarily. Services are notified by the underlying framework on incoming messages and can continue their processing at that point in time.
3. Multi-threaded behaviour versus Non-blocking code
As a p2p node, there are two ways of handling incoming service requests. The first is the traditional way of receiving a message – spinning up a new thread or process to handle the request and then handling the next message on the queue. However, this leads to complex concurrent code which is never easy to test or maintain. The second, and the approach we’ve taken is to not hand off an incoming request to a worker thread. Instead, since we encourage the development of small services, we handle requests within the same thread with the understanding that any intensive operations will be performed asynchronously (locally or over the p2p network).
4. Retry mechanisms
It is paramount that we always keep in mind that there is a lot of uncertainty in a p2p network – node churn can be quite common, network connectivity can be unstable etc. Since most of the work is done asynchronously, it is hard to keep track of tasks that are still in progress. Timeouts should be identified for each task so that it can be retried if it hasn’t completed in the appropriate time. Care should also be taken that the list of queued tasks is persisted (and replicated) since it is possible for any node or nodes storing this information to lose connectivity with the rest of its peers.
It is obvious that nodes will receive identical service requests, especially with the presence of retry mechanisms. Services should either
- recognise that a request is a duplicate of another that it has already processed and respond with the pre-processed result or
- ensure that multiple invocations of the same service with the same parameters result in the same result.
6. Everything is an ID
IDs might be the only static/reliable things in a p2p network. They are usually large and will typically be between 128 and 160 bits long. Each node in the network has an ID and every data element has an ID. Every node “owns” its neighbouring ID space – in other words, an ID belongs to the node whose ID is closest to it. Assuming that we have an uniform distribution of node IDs within the entire range, each node will own an equally large ID space. When nodes join/leave the network, their neighbours automatically determine the new ID range that they own. The ID space is circular – for example, in a 4-bit ID space, the ids 0000 and 1111 are neighbours. Therefore, the p2p network is also referred to as a “ring”.
7. Backwards-compatible services
It is physically impossible to upgrade all nodes in a ring simultaneously. If the network is not in a controlled environment (for example, composed of personal computers belonging to internet users), upgrades can even be spaced out over months or years. It is therefore absolutely essential that p2p applications can ubiquitously deal with messages it receives which contain information it knows nothing about. This is another reason that we decided to model our data as JSON – this allows applications to deserialize only what the current version understands about the data.
8. Shared resources
There are some resources that might have to be shared across nodes in a p2p ring. A common example might be a pool of IP addresses that are exposed as a service provider. If a shared resource is available, one and only node in the ring must “own” it. Nodes could send various messages to each other to advertise about the current status of these resources. Alternatively, they could be managed via the DHT – this second of course gives the added benefit that it is very easy for a human to peek into the ring at any point in time to determine who owns these resources. A word of caution – shared resources resemble a crude notion of locking and is not ideal in a distributed world – so, use it with care and sparingly!
There are 3 different kinds of messages that can be transmitted in any p2p network. The first is a directed message to a node whereas the other two fit the pub-sub paradigm:
- Point-Point messages are directed from one node to another. The typical mechanism to achieve this is by sending a message to an ID. The node that is closest to the ID will automatically handle the message and service the request.
- Messages can be Published to a topic and all nodes that have subscribed to the same topic will receive the message.
- Messages can be Anycasted to a topic so that at least one node subscribed to the topic will receive the message.
Data storage and retrieval
Data is stored in a Distributed Hash Table
- Everything is a key-value pair.
- Data is replicated to avoid any single point of failure.
Data is usually replicated to a certain number of neighbouring nodes where the number is dependent on the stability of the network. Replication can be done in a myriad of ways depending on the characteristics of the data. We store data by serializing all updates through the node whose ID is closest to the ID of the data entity itself. We call this the “daddy node”. Once the daddy node has updated its local copy of the data, it then notifies its neighbours of the change in state so that the replicas can be updated correspondingly. If the daddy node leaves the ring, one of its neighbours will become the closest node to the ID of the data and becomes the new daddy.
Handling concurrent updates of data isn’t easy in a DHT. Hence, a number of p2p frameworks will avoid the problem altogether by only storing immutable data. We use optimistic concurrency to work around this issue. When you want to update a data entity, you retrieve the latest version, make your changes to it locally and then try and write it back to the DHT. If some other node sneaks in an update during this process, you get notified by the daddy node that the entity has been modified since you last read it and you attempt the same process all over again. If you still fail after a high number of retries, the update fails with an error and is probably symptomatic of a bug in code or connectivity issues within the ring.
The DHT and the p2p network offer no notions of a transaction. There is no guarantee that a service won’t fail in the middle of a series of operations. In this case, you are reliant on retry mechanisms (with idempotence required for already executed operations) or recovery mechanisms that can undo the already completed work.