Archive for the ‘work’ Category



December 5, 2015

Its been ages since I blogged. There’s been quite a few things I can probably write about – for example, I should probably write about the chess database I’ve been working on as a side project for most of this year. But that will have to wait for another time – now I want to talk about Scala.

I suppose most things worth talking about Scala have already been written or spoken about. But as someone who’s primarily worked in Java for many years, getting a new job working with a new language full-time is quite the challenge. The ease of inter-op between Scala and Java does, admittedly, make things a lot easier. Not that I’m writing Java in Scala syntax but having the benefit of compensating for a lack of supporting tools and libraries by using the corresponding libraries directly from Java helps – true for all new languages that run on the JVM of course but the cross-language syntax is a lot easier than it is on some other languages like Clojure.

I’ve been doing this for over a month now and these are my highlights:

  • No getters, setters or lines of code dedicated to initialising fields based on constructor arguments – simple and elegant
  • An object-oriented AND functional language at the same time? Sounds weird but it somehow seems to work
  • I’m quite used to lazy evaluations while operating on lists and maps – for example with Clojure. But a method that takes in a list of java Futures as an argument and returns another list of Futures after mapping over the (values that the futures return in the) passed-in list still has me confused. When I mentioned this to a colleague, he suggested that a good learning experience is to take a list of futures and convert it to a future that returns a list. Bonkers!
  • Using Options so that we don’t have to deal with NPEs is a nice idea theoretically but in practice, it seems to bloat the code unnecessarily. It does avoid having to check for null everywhere – I’m still not convinced it works better than the Java approach but perhaps that will change as I continue to work with it.
  • My last point (for now) doesn’t really have anything to do with Scala. I hadn’t known about Swagger until I started here – very nice documentation interface for your REST APIs!

Coding challenge

November 2, 2011

Here’s my brute-force clojure solution to Cedric Beust’s latest coding challenge. The code maintains a count of how many weights each combination can measure before iterating through the map and returning the maximum – which of course turns out to be all 40!

(def total 40)

(defn check
  ([a target]
    (if (= a target)
  ([a b target]
    (if (or (= 1 (check a target)) (= 1 (check b target)) (= (+ a b) target) (= (- a b) target) (= (- b a) target))
  ([a b c target]
    (if (= 0 (+ (check a b target) (check a c target) (check b c target) (check (+ a b) c target) (check (+ a c) b target) (check (+ b c) a target)))
  ([a b c d target]
    (if (= 0 (+ (check a b c target) (check a b d target) (check b c d target) (check (+ a b) c d target) (check (+ a c) b d target) (check (+ a d) b c target) (check (+ b c) a d target) (check (+ b d) a c target) (check (+ c d) a b target)))

(defn numberOfWeights [combo]
  (loop [x 1 sum 0]
    (if (> x total)
      (recur (inc x) (+ sum (check (first combo) (second combo) (nth combo 2) (nth combo 3) x))))))

(defn run 
  ([a b c d result]
    (let [combo [a b c d]] 
      (if (= total (+ a b c d)) 
        (assoc result combo (numberOfWeights combo))
  ([a b c result]
    (let [d (- total (+ a b c))]
      (if (< d 0)
        (run a b c d result))))
  ([a b result]
    (loop [c b r result]
      (if (>= (+ a b c) total)
        (recur (inc c) (run a b c r)))))
  ([a result]
    (loop [b a r result]
      (if (>= (+ a b) (- total 1))
        (recur (inc b) (run a b r)))))
    (loop [a 1 result {}]
      (if (>= a (- total 2))
        (recur (inc a) (run a result))))))

(def main
  (let [result (run)]
    (loop [resultKeys (keys result) maxKey nil maxValue 0]
      (let [candidate (first resultKeys)]
        (if (nil? candidate)
          {maxKey maxValue}
          (if (or (nil? maxKey) (> (get result candidate) maxValue))
            (recur (next resultKeys) candidate (get result candidate))
            (recur (next resultKeys) maxKey maxValue)))))))

Running it on the clojure console gives me:

user=> (load-file "/home/raghav/spikes/clojure/weights.clj")
user=> main
{[3 9 27 1] 40}



Parallel Sudoku

August 15, 2011

Following on from my previous post where I wrote a Sudoku solver in Clojure, I tried to improve the performance of the algorithm by having each branch of my search tree execute in a separate thread. I first tried using the ref construct and while it solved the easy puzzle, there was a non-trivial bug which went into some infinite loop with the hard one. I finally gave up on it and decided to use the agent construct to notify the main thread when the puzzle was solved. The main thread just hangs around waiting for the agent state to change to the right solution.

While performance between the two solutions is interchangeable on the easy puzzle, there is a noticeable difference in the performance on the hard one – the parallel solution is consistently 3.5-4 times slower! I have tried changing around some of the parameters (increasing thread pool size to 1000, decreasing it to 4 – the number of cores on my laptop, increasing the sleep interval on the main thread to try and reduce the amount of processor time it grabs etc), all to no avail. I can’t believe context switching between threads costs so much more than the benefits of a parallel search.

So, why is this slower (and by this magnitude)? Alternatively, how could I profile this application (it takes approximately 4 seconds to run) to get some hints?

Here is the relevant additional code:

(def executors (java.util.concurrent.Executors/newFixedThreadPool 100))

(defn solve-in-parallel-worker [grid counter solution-agent]
  (if (>= counter (* size size)) 
    (send-off solution-agent conj grid)
    (if (grid counter) 
      (solve-in-parallel-worker grid (inc counter) solution-agent)
        [row-number (quot counter size)
        col-number (rem counter size)
        sub-grid-number [(quot row-number sqrt) (quot col-number sqrt)]
              (seq (missing (clojure.set/union 
                              (row grid row-number) 
                              (col grid col-number) 
                              (sub-grid grid (sub-grid-number 0) (sub-grid-number 1))))) 
             dummy nil]
            (if (nil? (first possibles))
                (next possibles) 
                (.submit executors 
                  (fn[] (solve-in-parallel-worker 
                          (assign grid row-number col-number (first possibles)) 
                          (inc counter) 

(defn solve-in-parallel [grid]
  (let [solution-agent (agent [])]
    (.submit executors (fn[] (solve-in-parallel-worker grid 0 solution-agent)))
    (while (= @solution-agent []) 
      (Thread/sleep 1)) 
    (first @solution-agent)))


July 27, 2011

After ages and ages, I finally got into a new language. On the pretext of preparing for a brown bag session with the team, I started reading about Clojure and all of its benefits. The brown bag went off decently enough – I re-used Rich Hickey’s slides and talked about functional programming, Java inter-op and mainly the concurrency mechanisms. While preparing, I copy-pasted several bits of code, ran them etc and started getting a feel for the language. But the penny really dropped at the very end of the session when I was asked to write a simple hello world…and couldn’t! I later found (def hello (fn [] “Hello world”)) on the clojure website and promptly sent it off to the guys 🙂

Since I wasn’t totally put off by the language and even somewhat impressed by it, I decided I needed to do some hands-on development to learn the intricacies of the language. To begin with, I stayed away from all the concurrency constructs and picked a problem to try and tackle the functional elements – a sudoku problem solver. Its most likely not the most efficient algorithm though I might try to parallelize it.

  • My “grid” is actually just a vector of 81 elements.
  • Rows, columns and sub-grids are views of the appropriate indices on the vector.
  • setup-grid sets up the problem.
  • solve uses brute-force in a depth-first manner to solve the puzzle.
Having written no automated tests (lots of manual testing via the REPL while writing the code only), I tested it out with two examples from websudoku. The first was the “easy level” and took 98.684439 msecs whereas the “evil level” took 1139.137917 msecs. Here are the two examples:
3 - 8 2 9 1 - - -
1 7 9 - - - 2 6 -
- - - - - 6 1 - -
8 - 5 1 - - 6 - -
- - - 6 3 9 - - -
- - 6 - - 8 3 - 2
- - 1 5 - - - - -
- 5 4 - - - 8 1 7
- - - 8 1 7 4 - 6
- - 2 - - - - - 6
5 4 - 9 - - 2 - -
- - - 8 2 - - - -
- 5 - - - - 7 - -
- - 4 7 1 6 5 - -
- - 1 - - - - 8 -
- - - - 9 2 - - -
- - 9 - - 7 - 4 8
4 - - - - - 1 - -
And if you are interested, the code: (and yes, I know there are a lot of brackets)
(def size 9)
(def sqrt (Math/sqrt size))
(def values
  (loop [c 1 v (set [])]
    (if (> c size)
      (recur (inc c) (conj v (str c))))))

(defn index [x y]
  (+ (* x size) y))

(defn place [grid x y]
  (-> grid (nth (index x y))))

(defn assign [grid x y val]
  (assoc grid (index x y) (str val)))

(defn setup-base-grid []
  (vec (take (* size size) (cycle [nil]))))

(defn setup-grid [init-setup]
  (loop [grid (setup-base-grid) ks (keys init-setup) vs (vals init-setup)]
    (if (and ks vs)
      (recur (assign grid ((first ks) 0) ((first ks) 1) (first vs)) (next ks) (next vs))

(defn row [grid x]
    [start (* x size)
     end (+ start size)]
    (subvec grid start end)))

(defn col [grid y]
  (loop [c 0 v []]
    (if (>= c size)
      (recur (inc c) (conj v (place grid c y))))))
(defn sub-grid [grid x y]
    [startx (* x sqrt)
     endx (+ startx sqrt)
     starty (* y sqrt)
     endy (+ starty sqrt)]
    (loop [c startx v []]
      (if (>= c endx)
        (vec v)
        (recur (inc c) (concat v (subvec (row grid c) starty endy)))))))

(defn pretty-print [grid]
  (loop [c 0 out ""]
    (if (>= c size)
      (println (.replace (.replace (.replace out "nil" "-") "[" "") "]" ""))
      (recur (inc c) (str out (println-str (row grid c)))))))

(defn missing [section]
  (clojure.set/difference values (set section)))

(defn solve
  ([grid] (solve grid 0))
  ([grid counter]
      (if (>= counter (* size size))
        (if (grid counter)
          (solve grid (inc counter))
            [row-number (quot counter size)
             col-number (rem counter size)
             sub-grid-number [(quot row-number sqrt) (quot col-number sqrt)]]
           (loop [possibles (seq (missing (clojure.set/union (row grid row-number) (col grid col-number) (sub-grid grid (sub-grid-number 0) (sub-grid-number 1)))))]
             (if (nil? (first possibles))
               (let [possible-grid (solve (assign grid row-number col-number (first possibles)) (inc counter))]
                 (if (nil? possible-grid)
                   (recur (next possibles))

Writing a peer-peer application

February 27, 2011

After spending more than a year writing a p2p application, I thought it was time for me to recap some of our experiences and summarise some of the essentials.

Why p2p?


Most people have the naive impression that adding more machines or making servers more powerful is enough to achieve scale. Without adequate design, this approach will fail. For example, adding more web servers to a simple web application is limited to the performance of the single database server. If you add more database servers, you have to be prepared for the overhead of data replication. A typical p2p application takes the opposite approach, where it runs on commodity hardware but is designed with the idea of leveraging the power of machines in high quantities. By its very nature, a p2p application is internet-scale.

High Availability/Failover

Due to the large number of available machines in a p2p network, the loss of a single machine (or a cluster of machines in a specific area) isn’t the end of the world. There is rarely any loss of data as it is usually replicated. Other nodes in the network, on detecting the loss of machines, immediately take over additional work until other machines re-enter the network. This is in contrast to most enterprise applications which cannot provide service if connectivity is lost to their data centre!

Distributed compute and resources

There’s been a lot of discussion in recent years about “the wisdom of crowds”. I am not qualified to judge the credibility of the argument. In a p2p network, however, this holds very true. The larger the network, the more the ability of the offered services to function like super-computers!

How p2p?

Programming paradigms

1. Many small services
To take advantage of the resources of the distributed network, it is more efficient to break up complex services into its different components. If specific resources on a node are over-extended, the node can refuse to service requests that further tax those specific resources, while still continuing to accept requests for other services. This allows work to be distributed more uniformly among the many peers.
2. Asynchronous behaviour
Since every invocation of a service could potentially be a request over the network, it is essential to write event-based code. “Waiting” for a response blocks resources on the node unnecessarily. Services are notified by the underlying framework on incoming messages and can continue their processing at that point in time.
3. Multi-threaded behaviour versus Non-blocking code
As a p2p node, there are two ways of handling incoming service requests. The first is the traditional way of receiving a message – spinning up a new thread or process to handle the request and then handling the next message on the queue. However, this leads to complex concurrent code which is never easy to test or maintain. The second, and the approach we’ve taken is to not hand off an incoming request to a worker thread. Instead, since we encourage the development of small services, we handle requests within the same thread with the understanding that any intensive operations will be performed asynchronously (locally or over the p2p network).
4. Retry mechanisms
It is paramount that we always keep in mind that there is a lot of uncertainty in a p2p network – node churn can be quite common, network connectivity can be unstable etc. Since most of the work is done asynchronously, it is hard to keep track of tasks that are still in progress. Timeouts should be identified for each task so that it can be retried if it hasn’t completed in the appropriate time. Care should also be taken that the list of queued tasks is persisted (and replicated) since it is possible for any node or nodes storing this information to lose connectivity with the rest of its peers.
5. Idempotence
It is obvious that nodes will receive identical service requests, especially with the presence of retry mechanisms. Services should either
– recognise that a request is a duplicate of another that it has already processed and respond with the pre-processed result or
– ensure that multiple invocations of the same service with the same parameters result in the same result.
6. Everything is an ID
IDs might be the only static/reliable things in a p2p network. They are usually large and will typically be between 128 and 160 bits long. Each node in the network has an ID and every data element has an ID. Every node “owns” its neighbouring ID space – in other words, an ID belongs to the node whose ID is closest to it. Assuming that we have an uniform distribution of node IDs within the entire range, each node will own an equally large ID space. When nodes join/leave the network, their neighbours automatically determine the new ID range that they own. The ID space is circular – for example, in a 4-bit ID space, the ids 0000 and 1111 are neighbours. Therefore, the p2p network is also referred to as a “ring”.
7. Backwards-compatible services
It is physically impossible to upgrade all nodes in a ring simultaneously. If the network is not in a controlled environment (for example, composed of personal computers belonging to internet users), upgrades can even be spaced out over months or years. It is therefore absolutely essential that p2p applications can ubiquitously deal with messages it receives which contain information it knows nothing about. This is another reason that we decided to model our data as JSON – this allows applications to deserialize only what the current version understands about the data.
8. Shared resources
There are some resources that might have to be shared across nodes in a p2p ring. A common example might be a pool of IP addresses that are exposed as a service provider. If a shared resource is available, one and only node in the ring must “own” it. Nodes could send various messages to each other to advertise about the current status of these resources. Alternatively, they could be managed via the DHT – this second of course gives the added benefit that it is very easy for a human to peek into the ring at any point in time to determine who owns these resources. A word of caution – shared resources resemble a crude notion of locking and is not ideal in a distributed world –  so, use it with care and sparingly!


There are 3 different kinds of messages that can be transmitted in any p2p network. The first is a directed message to a node whereas the other two fit the pub-sub paradigm:
– Point-Point messages are directed from one node to another. The typical mechanism to achieve this is by sending a message to an ID. The node that is closest to the ID will automatically handle the message and service the request.
– Messages can be Published to a topic and all nodes that have subscribed to the same topic will receive the message.
– Messages can be Anycasted to a topic so that at least one node subscribed to the topic will receive the message.

Data storage and retrieval

Data is stored in a Distributed Hash Table
– Everything is a key-value pair.
– Data is replicated to avoid any single point of failure.


Data is usually replicated to a certain number of neighbouring nodes where the number is dependent on the stability of the network. Replication can be done in a myriad of ways depending on the characteristics of the data. We store data by serializing all updates through the node whose ID is closest to the ID of the data entity itself. We call this the “daddy node”. Once the daddy node has updated its local copy of the data, it then notifies its neighbours of the change in state so that the replicas can be updated correspondingly. If the daddy node leaves the ring, one of its neighbours will become the closest node to the ID of the data and becomes the new daddy.

Optimistic concurrency

Handling concurrent updates of data isn’t easy in a DHT. Hence, a number of p2p frameworks will avoid the problem altogether by only storing immutable data. We use optimistic concurrency to work around this issue. When you want to update a data entity, you retrieve the latest version, make your changes to it locally and then try and write it back to the DHT. If some other node sneaks in an update during this process, you get notified by the daddy node that the entity has been modified since you last read it and you attempt the same process all over again. If you still fail after a high number of retries, the update fails with an error and is probably symptomatic of a bug in code or connectivity issues within the ring.


The DHT and the p2p network offer no notions of a transaction. There is no guarantee that a service won’t fail in the middle of a series of operations. In this case, you are reliant on retry mechanisms (with idempotence required for already executed operations) or recovery mechanisms that can undo the already completed work.



What is your coding style?

February 6, 2010

I wanted to traverse a list of strings until a certain string occurred, then do something with that index. My code was:

int i;
for (i=0; i<list.size() && !("string".equals(list.get(i))); i++);
// do something with i

During the build, where we use several tools and metrics, Checkstyle complained that an empty statement in the for-loop was badness. Why?

This of course led to an argument about whether this is easy code to read. Checkstyle was happier when the above code was refactored to:

int i;
for (i=0; i<list.size(); i++)
  if ("string".equals(list.get(i)))
// do something with i

And another way to write the same code is:

int i;
while (i < list.size() && !(“string”.equals(list.get(i))))
// do something with i

The 1st and 3rd ways are crystal-clear to me and intuitively, the way I write (and think about) code.

Another example I have (and which is perhaps a deeper question about how you approach mathematical problems) is, “How do you compute a to the power of b?”. I would, instinctively, write a recursive method. Others I know would do it iteratively. What is your coding style?


Testing log statements

September 25, 2009

Of late, this blog has become an almost exclusive chess blog. Let me redress that balance, at least for one post.

Unit testing your log statements is unusual, but not unheard of. There might be many reasons for this – you might have a separate process monitoring your log files and taking action on certain log messages…or, you might be handling an event which you choose to ignore except to log its invocation and you want to test that you are logging properly. Whatever the reason, there isn’t any easy way (at least in Java) to test your log statements.

One way of testing is to read from the log file and ensure the log statement gets written out. But we don’t ideally want to be reading from log files in our unit tests. Here’s what you could do, if you were using log4j:

First, have a class called

import java.util.Vector;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

public class VectorAppender extends AppenderSkeleton {
 private static Vector <String>messages = new Vector<String>();
 public static final boolean VECTORAPPENDER_SYSOUT = false;

 protected void append(LoggingEvent arg0) {
   System.out.println(arg0.getLoggerName() + "  - " + arg0.getRenderedMessage());
  messages.add(arg0.getLoggerName() + "  - " + arg0.getRenderedMessage());
 public void close() {}
 public boolean requiresLayout() {
  return false;

 public static void clear() {
  messages = new Vector<String>();
 public static Vector<String> getMessages() {
  return messages;

Then, we invoke initLogging() in the setup to our unit test:

public static void initLogging() {
 Properties p = new Properties();
 p.put("log4j.rootCategory", "DEBUG, stdout, LOGFILE");
 p.put("log4j.appender.LOGFILE", VectorAppender.class.getName());
 p.put("log4j.appender.stdout.Threshold", "DEBUG");
 p.put("log4j.appender.stdout", "org.apache.log4j.ConsoleAppender");
 p.put("log4j.appender.stdout.layout", "org.apache.log4j.PatternLayout");
 p.put("log4j.appender.stdout.layout.ConversionPattern", "%d %-5p [%C{1}.%M() %L] - %m%n");


Now, VectorAppender.getMessages() gives you everything that’s been logged and you can assert to your heart’s content!