{laKost}

Distributed Building
my experiences with distcc on llvm

How come

I had to compile the lldb debugger, which means building the complete LLVM framework together with clang and lldb. I quickly figured out that it was not that easy, because 1. each of my computers only had 4 GB of RAM and 2. it took a veeery long time. Because I had to change some code and would therefore likely build it more than once, I started looking into distributed building, and I was not disappointed.

Advantages even with a single computer

Distributed building was useful for me even though I only used it with a single computer. make can be given the -j 8 option to run up to 8 jobs simultaneously while still respecting the dependencies between the generated files. However, it then not only compiles in parallel, but also links in parallel. The resulting lldb binary is about 900 MB when built with debug symbols, and my RAM was rather small at 4 GB. Linking several of those big files at the same time definitely exceeded my memory. Once the operating system starts swapping, the day is over. Waiting 4-8 times longer just because of the linking at the end doesn't make sense. So I looked for a way to compile in parallel but link one file at a time.

With distributed building, the number of parallel jobs can be configured separately for the different steps of the build, such as compiling and linking. Only this way was it possible for me to build LLVM in less than 20 hours on my low-memory computer. (I could have bought new RAM, but distributed building was definitely cooler.)

About distcc

distcc is an open source distributed compiler for C and C++ code. It consists of two parts.

The distcc client is used like a compiler. It is placed in front of the real compiler command (for example distcc gcc) and passes the same arguments on, so gcc can easily be replaced by distcc in Makefiles. The second part is the daemon, distccd, which does the actual compiling and runs on each of the available systems. It receives the preprocessed source over the network, compiles it and returns the object file. Steps that cannot be distributed, such as linking, run on the local machine.

Cross-compiling is possible with distcc, but it is recommended to use the same architecture and compiler version on all servers.

Installing and Configuring

distcc can be installed from the Debian/Ubuntu package repositories.

sudo apt-get install distcc

The list of build hosts can be provided in one of three places:

  1. in the global file /etc/distcc/hosts
  2. in the user specific file ~/.distcc/hosts
  3. in the environment variable DISTCC_HOSTS

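It consists of a list of available servers, given by host name or IP address, and some additional parameters. The number after the slash is the maximum number of jobs sent to that host at once. --localslots limits the jobs that can only run locally, such as linking, and --localslots_cpp limits the number of parallel preprocessor runs. --randomize shuffles the order of the host list. An example configuration looks like this: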

--randomize
--localslots=1
--localslots_cpp=8
localhost/8
192.168.0.2/8
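
As a minimal sketch, the example above could be put in place for the current user like this. It assumes the default user directory ~/.distcc (distcc reads DISTCC_DIR if that variable is set), and it overwrites any existing hosts file there, so treat it as an illustration rather than a ready-made installer. The last lines show the same list as a one-line DISTCC_HOSTS value, since the entries are simply separated by whitespace.

```shell
# Sketch: write the example host list to the per-user hosts file.
# DISTCC_DIR falls back to ~/.distcc when it is unset.
conf_dir="${DISTCC_DIR:-$HOME/.distcc}"
mkdir -p "$conf_dir"

# Note: this replaces an existing hosts file in that directory.
cat > "$conf_dir/hosts" <<'EOF'
--randomize
--localslots=1
--localslots_cpp=8
localhost/8
192.168.0.2/8
EOF

# The same configuration as an environment variable:
# newlines become spaces, everything else stays as it is.
DISTCC_HOSTS="$(tr '\n' ' ' < "$conf_dir/hosts")"
export DISTCC_HOSTS
echo "$DISTCC_HOSTS"
```

With --localslots=1 only one local-only job such as a link step runs at a time, while localhost/8 still allows eight compile jobs on the same machine.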

Using distcc

Now the distcc command can be used like the usual gcc command; it takes the same parameters. To visualize the compiling process there is an additional graphical monitor, distccmon-gnome (a plain-text variant, distccmon-text, also exists). It shows the slots of all servers over time and indicates by color whether they are compiling, down- or uploading, or idle. Using several servers with many slots makes it look like monitoring a high-performance data center.

Unfortunately, most of my projects can be built in less than 5 s with an ordinary single-threaded build, so there is no need for a build cluster. sigh
