Creating a Raspberry Pi Computer Cluster – All the Things That Can Go Wrong

A computer cluster is a set of computers that share a common workload in order to speed up the processing. The idea is to split the work into separate parts that can be processed at the same time – in parallel – on each of the computer nodes.

Recently, I familiarized myself with how to start implementing software on a parallel computer architecture, mainly using the Message Passing Interface (MPI), which is implemented in several free and/or commercial libraries. MPI works as a messenger and a data transfer interface to coordinate what tasks each computer is supposed to carry out.

I started by trying to run my code using the OpenMPI library on my 2 Raspberry Pis, but very quickly ran into massive trouble, even though I followed the tutorials rigorously. There are a number of excellent tutorials on how to run MPI (see, e.g., here and here), so I’m not going to go into much detail on that in this article. Instead, I’m listing all the things that are not said in the tutorials but can easily go wrong.

Some starting points:

  • All of the programs to run need to have the same paths and names on each computer.
  • Set up a passwordless SSH connection between the nodes.
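
Both starting points can be checked before MPI is even involved. The following is a minimal sketch, assuming the `ssh` client is installed on the machine you launch from; the host names and the program path in the usage note are hypothetical placeholders, not values from my setup. It asks each node, over a non-interactive SSH session, whether the program exists and is executable at the expected path.

```python
import shlex
import subprocess


def ssh_check_command(host, program):
    """Build an ssh command that succeeds only if `program` is executable on `host`."""
    # BatchMode=yes makes ssh fail instead of prompting, so a node that
    # still asks for a password shows up as a failure.
    remote = "test -x " + shlex.quote(program)
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, remote]


def check_hosts(hosts, program):
    """Return {host: True/False}; True means passwordless login worked and the file was found."""
    results = {}
    for host in hosts:
        proc = subprocess.run(
            ssh_check_command(host, program),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[host] = proc.returncode == 0
    return results
```

For example, `check_hosts(["pi1.local", "pi2.local"], "/home/pi/mpi/hello_mpi")` (hypothetical names) returns a dictionary in which any `False` entry points at a node where either the login or the path is not yet set up.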

Correct MPI Library and Version

1. Make sure that all of your computer nodes are running the same version of the MPI library.

I started with OpenMPI on my 2 Raspberry Pis, but quickly noted that they didn’t work together. This was because one of the Pis was running Raspbian Jessie, while the other was running Stretch. Unfortunately, the OpenMPI packages available for these OS releases didn’t match. One could have built the libraries from source, but forum posts advised against this.
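
A quick way to catch this kind of mismatch is to compare what `mpirun --version` prints on each node. The sketch below only parses and compares the text; collecting it (for example by running the command over SSH) is left out. It assumes the Open MPI output format, whose first line looks like `mpirun (Open MPI) 2.0.2`. The sample strings in the usage note are illustrative, not captured from my Pis.

```python
import re


def parse_open_mpi_version(version_output):
    """Return the version string from `mpirun --version` output, or None if not recognized."""
    match = re.search(r"\(Open MPI\)\s+(\d+(?:\.\d+)*)", version_output)
    return match.group(1) if match else None


def versions_match(outputs_by_host):
    """True only if every host reports the same, recognizable Open MPI version."""
    parsed = {parse_open_mpi_version(text) for text in outputs_by_host.values()}
    return len(parsed) == 1 and None not in parsed
```

Calling `versions_match({"pi1": "mpirun (Open MPI) 1.6.5", "pi2": "mpirun (Open MPI) 2.0.2"})` returns `False`, which is the situation described above.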

Another MPI library, MPICH, was available, though, and for a while I tried using that. It turned out, however, that it had other shortcomings (see below), so I went back to OpenMPI.

Networking

2. All of the computing nodes need to see each other with the same IPs (at least for OpenMPI)

This was a major problem. Initially, I wanted to include my 2 laptops and the 2 Pis in the same cluster, but I was physically at a different location than my Pis, which were behind a router, so this didn’t work so easily. OpenMPI supports port forwarding if the hostfile is specified (MPICH does not), but only one computer per IP, so there is no way to reach more than one computer behind a router. Due to this, I ended up ditching my home Pis altogether and continued with my two laptops.

In practice, you probably don’t have public static IPs for each of your computers, so this means that all of the computer nodes need to be on the same subnetwork.
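
Whether a set of nodes shares a subnetwork can be checked with Python’s standard `ipaddress` module. This is a minimal sketch; the addresses in the usage note and the default /24 prefix length are assumptions for illustration, so use the values your router actually hands out.

```python
import ipaddress


def same_subnet(addresses, prefix_length=24):
    """True if all IPv4 addresses fall into one network of the given prefix length."""
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix_length}").network
        for addr in addresses
    }
    return len(networks) == 1
```

With a /24 prefix, `same_subnet(["192.168.1.10", "192.168.1.23"])` is `True`, while `same_subnet(["192.168.1.10", "192.168.2.23"])` is `False`.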

3. Make sure MPI is using a correct network interface

OpenMPI seems quite adept at finding the right interface to use, but MPICH couldn’t find my other computer because I had both eth0 and wifi2 interfaces available, and the other node was reachable through wifi2, not the first interface, eth0.
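
Both libraries let you name the interface explicitly instead of relying on auto-detection. The sketch below just assembles the launch command as a list. To my knowledge, Open MPI takes the interface through the `btl_tcp_if_include` MCA parameter and MPICH’s Hydra launcher through `-iface`, but check the documentation of your installed version. The hostfile name, process count and program path in the usage note are hypothetical.

```python
def mpi_command(library, interface, hostfile, nprocs, program):
    """Build a launch command that pins MPI traffic to one network interface."""
    if library == "openmpi":
        # Open MPI: restrict the TCP transport to the named interface.
        return ["mpirun", "--mca", "btl_tcp_if_include", interface,
                "--hostfile", hostfile, "-np", str(nprocs), program]
    if library == "mpich":
        # MPICH (Hydra process manager): -iface selects the interface.
        return ["mpiexec", "-iface", interface,
                "-f", hostfile, "-n", str(nprocs), program]
    raise ValueError(f"unknown MPI library: {library!r}")
```

For my MPICH case, `mpi_command("mpich", "wifi2", "hosts", 4, "/home/pi/mpi/hello_mpi")` gives a list that can be passed to `subprocess.run` or joined with spaces and pasted into a shell.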

4. It may be better to use wired network connections

I had mysterious connection issues with OpenMPI while using WLAN connections. These stopped after I changed to wired Ethernet connections on all computers. I suspect this had something to do with VirtualBox’s network routing, which I used on one of my laptops.

Finally…

Setting up a computer cluster can be surprisingly easy – if you do it right the first time. Since Raspberry Pis are cheap and small, it’s straightforward to build a cluster out of them. However, I don’t think they have enough computing power to be of any real parallel computing use. Still, as a test bench for problem scaling, or just to learn about parallel computing, they are great.

Here are a couple of MPI examples I wrote to test some basic functionalities: Two simple examples demonstrating sending and receiving data with MPI, both blocking and non-blocking.

See here for a thorough document about work others have done with Raspberry Pi clusters: Parallel processing with eight-node Raspberry Pi cluster. 
