Distributed builds of LLVM with CMake, recc, and NativeLink

Fast Builds Rock

Let’s face it: Slow builds are a PITA! Many of us in the LLVM community have spent a lot of time optimizing builds, analyzing transitive header includes, working on modules, working on PCH, tweaking shared library builds, on and on and on in search of more reasonable build times. However, the quest is Sisyphean and unending.

Compile time is elastic and usually not monitored. Your team will add code and header bloat until it hurts, and if you have enough resources, they’ll form a subteam to own the problem, and they won’t have any power to lean out your messy include graph, which is the real driver of quadratic compile time. They’ll just build more sophisticated infrastructure to hide the problem. In some ways, this is progress, people usually don’t like thinking about dependency graphs, but I digress. The point is, this post is my personal escalation in a long, never-ending war on compile time, that is only more important if you think it’s important to keep the LLVM development feedback loop tight.

Since leaving Google, I’ve been really fixated on the idea of distributed builds, so I spent some time evaluating the state of the landscape, trying to figure out if there’s something here the LLVM community can use. In the end, my cluster was able to build LLVM four times faster than my laptop, going from 1026s to 253s. Speed depends a lot on hardware, but read on if you want to learn more about how this worked.

Personal History

Working at Google, I was very aware of its internal distributed build system, which solves this problem by distributing build actions at massive scale. The system was partially open sourced as Bazel. I say “partially” open source, because Bazel was a client for a protocol with no reference implementation. The internal distributed build service relied on too much internal Google infra, aka Borg, the predecessor to Kubernetes.

Chromium had its own parallel distributed compilation system called goma, which my team borrowed for LLVM development, and it worked very well for our purposes (-j400-ish builds), until it was replaced with reclient. The build team wanted to make Chrome a customer, and reclient was their offering, but it didn’t work for LLVM.

The key blocker for reclient was absolute and relative paths. In remotely executed distributed builds, you want to avoid actions depending on the current working directory to enable sharing the action cache between builds in different directories. Julio Merino has several good posts on the action cache if you want to read more. This makes sense, but previously, goma would do a lot of the work of interpreting the command line and massaging it into a form that works for remote execution. rewrapper was a dumb client: it did not parse command lines, it simply used some logic to forward inputs and execute the action remotely. This is nice in principle, but not suited to operating in a hostile environment, i.e. CMake.

CMake had a flag called CMAKE_USE_RELATIVE_PATHS that held out some promise that this could be made to work, but it was removed in CMake 3.4. If you were not aware, LLVM has another self-supported build system for gn, Chromium’s build system. For a time, we were able to use this with reclient, rewrapper, but in reality, workstations kept getting faster, Google rolled out mega-size VMs in the cloud for development, so this stopped being important to me and my team.

Then I left Google, and now I’m back to banging rocks together. No 96 vCPU VMs for me, at least not today, hence my renewed interest in these systems.

The Bazel Option

You may be aware that LLVM has a Bazel build system that is officially unsupported but maintained by a subset of the community with some help from automated tools. Bazel speaks the remote execution protocol, so does that allow fast builds? Initially, Bazel had no RE implementation, but eventually several arose, including NativeLink, which I’ll cover later. However, I actually don’t plan to use or promote Bazel.

Bazel is super heavy-weight, and I just don’t like the ergonomics for local builds. The Java server eats a lot of memory. Julio Merino is right, it needs a Rust rewrite. I’m sure Java can be fast, but this is really a space where a performance engineer should come in and tune the data layout with a real systems language.

If Rust-Bazel existed, I think it would be a much easier sell to get projects like LLVM to support and maintain Starlark BUILD files. Developers would have two options: a lean and fast local build system, and the option to trade up for a fancy build-in-the-cloud system, which I believe the Bazel folks are pursuing. I’ll keep an eye on the sky for this development, but in the meantime, I’ll keep using the OSS industry standard tools, even if they are painful.

Kubernetes cluster

Personal Cluster

When I left Google, I wanted to take some time to meet LLVM where it is, and to think about what a distributed build system that works for LLVM would look like. I also put a lot of time into spinning up a Kubernetes cluster, pictured right. These aren’t the most powerful machines, and they mostly act as control nodes. I’ve joined some not-pictured nodes to the cluster to give it some more power for this demo. The point here is to prove the concept, so others can copy the idea with beefier hardware. The cluster uses Talos Linux. The cluster deserves its own dedicated post, so we’ll leave it out of scope, even if it is cute.

Remote Execution Service Landscape

When Google open-sourced Bazel, there were some nascent plans to sell Remote Execution as a service in Cloud. However, priorities changed, the build team left Cloud, and all we were left with was the client and the protocol. Later, some Bazel experts eventually decided to leave Google and found EngFlow, which is a company that sells proprietary remote execution as a service. I’m sure it’s a great product, but for my purposes, if it’s not FOSS, that’s not something I can promote as a real solution in the LLVM community, so I’ve limited my search to open-source remote execution implementations.

Fast forward almost a decade, and now there are multiple reimplementations of the Bazel Remote Execution protocol. The Bazel community documents a list of remote execution service implementations, and these are the self-service / OSS ones:

Honestly, I decided to evaluate NativeLink because they advertised at OSSNA ‘25 in Denver when I attended. I followed my curiosity, and got in touch with Marcus Eagan, the creator and founder. I didn’t do an exhaustive competitive analysis of these solutions, but just based on the Rust language choice, I’m inclined to assume it has good performance. The history docs say, “This project was first created due to frustration with similar projects not working or being extremely inefficient,” which suggests it is a second-system that strives to beat the competition.

At the end of the day, performance is primarily determined by the compute resources you’ve rented or purchased to run the actions, and I’m working with scraps here, so don’t read too much into the performance of NativeLink from this experiment.

Note that NativeLink uses the Functional Source License, which places restrictions on commercial usage. I am not a lawyer, but I believe my usage of the code is permitted under the license, since I’m mainly trying to build a demo.

It took me a lot of fooling around with Claude, kubectl, manifests, fits and starts, Garage, S3, and other nonsense, before I got a working setup, but I made it through. In the end, I used a very simple configuration with a scheduler pod, a cas pod, a worker pod for every node, and a filesystem cache on one of my nodes, scrapping the unneeded S3 storage layer. The Kubernetes NativeLink on prem getting started docs use kind and FluxCD (IIRC), which I found to be pretty clunky, so I skipped that.

The key thing to make this work is to create a dev container for your build, so I’ll cover that next.

Development Containers

Dev containers contain all the system dependencies and build tools that your project depends on, i.e. your compiler, your standard library, cmake, whatever.

I started by copying the dependencies and binaries from LLVM ci-ubuntu containerfile, since that’s what’s tested, and it has an LTO / Bolt optimized LLVM toolchain. NativeLink itself is a single, statically-linked binary, so you can take the regular worker image and simply inject it into your new image wherever you like, say /usr/local/bin/nativelink. I published my Containerfile on GitHub if you want to read it in full. Here are some fragments to give you a sense of how it works:

 1# Extract nativelink binary from the Nix store.
 2FROM ghcr.io/tracemachina/nativelink:v1.0.0 AS nativelink
 3FROM busybox AS extract-nativelink
 4COPY --from=nativelink / /nativelink-image
 5RUN cp /nativelink-image/nix/store/*/bin/nativelink /nativelink
 6...
 7
 8# Copy the pre-built LLVM sysroot (/opt/llvm) from the CI image.
 9FROM ghcr.io/llvm/ci-ubuntu-24.04:latest AS llvm-toolchain
10COPY --from=llvm-toolchain /opt/llvm /opt/llvm
11...
12
13# Shared base: packages for building recc AND for LLVM development.
14# The -dev packages (for recc) are supersets of the runtime libs, so
15# installing them here avoids a second apt-get update in the final image.
16FROM docker.io/library/ubuntu:24.04 AS base

For my purposes, I chose to have this container serve as both my development workspace container, and the worker container. I could optimize things by stripping recc and Bazel out of the remote worker container and removing nativelink from the dev container, but disk is cheap, relatively speaking, despite the AI storage crunch.

Distrobox

I mentioned dev containers to a teammate at work, and they said that they found them painful, because the container filesystem is ephemeral. It gets deleted when you recreate the container with a fresh image. I’d say that distrobox is the solution to this problem. It makes development containers so much easier to use. The basic idea is that distrobox bind mounts your home directory, matches your existing user ID (same FS permissions), and you run development tools in that context. It’s basically a build chroot from back in the day, all grown up.

You can bind mount /work or some other global path if that’s what you use, too, the important thing is that you try to avoid global modifications to the container image, since those won’t persist. The source files and build artifacts should remain rooted in your shared workspace, so you don’t lose them. If you build static libraries with zero shared library dependencies, they’ll even run outside your dev container, but that’s a separate topic.

One of the rough edges here is that your dev container might not have your favorite command line development tools. You can install tools to ~/bin/ or ~/.local/bin/ and use them in your dev container, or create a custom image layered over the base dev container to add the tools you need.

Here’s a quick example of how you can set up an LLVM development container with distrobox, without any distributed build nonsense:

 1# Build the container image from llvmdev/Containerfile
 2podman build -t registry.reidkleckner.dev/llvmdev:latest llvmdev
 3
 4# Push the image to some registry (mine, or ghcr.io)
 5podman push registry.reidkleckner.dev/llvmdev:latest
 6
 7# Check out some sources, wherever
 8cd ~
 9git clone https://github.com/llvm/llvm-project.git
10
11# Create the container from your custom image
12distrobox create -i registry.reidkleckner.dev/llvmdev:latest llvmdev
13
14# Execute a shell in the llvmdev container.
15distrobox enter llvmdev
16
17# Build as normal:
18mkdir llvm-project/build
19cd llvm-project/build
20cmake -GNinja ../llvm ...
21ninja check-llvm
22
23# You can leave and come back:
24exit                       # Returns to parent shell
25distrobox enter llvmdev    # Return to container

Distrobox tweaks the bash shell prompt to indicate when you’re in the distrobox container so you don’t get confused. I use fish and tide, which show this as well. You can keep the editor outside or inside, your choice, but IDE users will keep them outside to integrate with OS graphics libraries.

Distrobox tide prompt

While I said earlier that I’m not interested in using Bazel to build LLVM, it is the reference implementation of the RE protocol, so it’s the logical tool to use to test the service. I installed bazelisk into the devcontainer on PATH (src).

To make Bazel work as a client you have to define a cc_toolchain, which defines where all the system headers are. This is not the focus of this post, so I will say that Claude figured it out, and the results are in the build_rbe directory with a README file if you want to read more about it. Bazel has a way to auto-generate these paths, given a container, if I recall. You need to place the user.bazelrc from that directory in llvm-project/utils/bazel and build in that directory.

 1❯ distrobox enter llvmdev
 2
 3❯ cd llvm-project/utils/bazel/
 4
 5❯ bazel build -j 96 --config=remote-exec  @llvm-project//clang:clang
 6Starting local Bazel server (8.6.0) and connecting to it...
 7INFO: Invocation ID: 9eef8e99-10c2-464e-a1f2-7652e387c88d
 8INFO: Analyzed target @@+_repo_rules+llvm-project//clang:clang (103 packages loaded, 14938 targets configured).
 9[829 / 5,479] 96 actions, 65 running
10    Compiling llvm/lib/Demangle/ItaniumDemangle.cpp [for tool]; 2s remote, remote-cache
11    Compiling llvm/lib/Demangle/MicrosoftDemangleNodes.cpp [for tool]; 2s remote, remote-cache

Here’s visual evidence it worked:

Bazel remote execution build

Here’s the Talos dashboard, showing the clang processes and 100% CPU usage:

Talos dashboard during clang build

Great! We did it, NativeLink and Bazel are configured. Let’s move on to CMake.

Reclient / Rewrapper

Initially I evaluated REclient, a Bazel build project. This thin command wrapper wraps individual commands and sends them over the wire. However, it doesn’t work with CMake generated build.ninja files because, as I mentioned earlier, CMake uses absolute paths to refer to source inputs.

I came up with the idea of post-processing the CMake output to rewrite the paths to be relative. I asked Claude to do this, and it came up with relativize_build_ninja.sh. Surprisingly, this worked, but it’s a pretty awful solution. I didn’t like the idea of relying on a vibe-coded shell script to post-process my CMake output. That said, the fact that this worked at all made me somewhat annoyed that the CMake folks abandoned the CMAKE_RELATIVE_PATHS feature. It demonstrates that it can be done, and there are a lot of benefits to avoiding absolute paths in your build artifacts.

recc

Somewhere in my distributed build journey, I encountered recc. I hadn’t heard about it before I started this exploration. The project was developed originally by Bloomberg and announced back in 2018. It turns out, this is exactly the kind of quick-and-dirty compiler wrapper process to be the adapter between CMake and NativeLink.

I wasn’t able to find pre-compiled recc packages at the time, so I had to build it from source as part of the llvmdev Containerfile. Long story short, this approach actually works! I built all of LLVM + Clang and got about a 4x speedup on my laptop over WiFi:

1❯ ninja -C build_nore 
2ninja: Entering directory `build_nore'
3[1 processes, 4468/4468 @ 4.4/s : 1026.169s ] Generating ../../bin/llvm-readelf
4
5❯ ninja -C build_recc -j96
6ninja: Entering directory `build_recc'
7[1 processes, 4455/4455 @ 17.3/s : 257.527s ] Generating ../../bin/llvm-readelf

I guess recc is the thing I was looking for, the goma-like adapter that maps from the old world to the new world. My goal in posting this is to share my results so that you can share this post with your team and promote these ideas in your group. :)

One major caveat here is that I had to disable PCH, which helps speed up the build a lot. Another is that my Framework desktop is doing most of the heavy lifting here. The small nodes are just not helping all that much.

 1# Show that precompiled headers are enabled
 2❯ grep PRECOMP reconfig.sh CMakeCache.txt
 3reconfig.sh:        -DCMAKE_DISABLE_PRECOMPILE_HEADERS=OFF \
 4CMakeCache.txt:CMAKE_DISABLE_PRECOMPILE_HEADERS:UNINITIALIZED=OFF
 5
 6❯ ninja -t clean
 7Cleaning... 4473 files.
 8
 9❯ ninja
10[1 processes, 4472/4472 @ 6.5/s : 682.778s ] Generating ../../bin/llvm-readelf

So, yes, if you include the PCH optimization, the distributed build is only 2.65x faster than the local build. Please excuse the overstatement in the introduction paragraph, but I believe there’s no reason we can’t make recc compatible with PCH.

I spent some time trying to make recc compatible with PCH to get more impressive results, but I was not successful. First, Clang embeds absolute paths to headers into the PCH, but I came up with a clever hack of passing -working-directory=/proc/self/cwd to the compiler to work around it. Sadly, this was not enough, since the recc input scanning phase doesn’t recognize the PCH file as an input, so it wouldn’t upload it as an input. However, I’ve timed out on the effort, and so I’m posting this in haste before I move on to preparing my talk for EuroLLVM.

The other key result from this setup is that you get all the benefits of ccache / sccache built into the system. Switching between multiple git branches corresponding to PRs often touches enough files that my next build requires rebuilding the whole project. Content-based caching systems address this very well. In this example, you can see that a clean build with a full cache hit rate takes much less time:

1# In the build_recc directory:
2❯ ninja -j96 ; ninja -t clean ; ninja -j96
3[1 processes, 4468/4468 @ 16.8/s : 266.600s ] Generating ../../bin/llvm-readelf
4Cleaning... 4717 files.
5[1 processes, 4468/4468 @ 48.1/s : 92.885s ] Linking CXX executable bin/c-index-test

This test was performed over a VPN connection back to my home network and it required downloading ~350MiB of object files, so there’s room for improvement.

Why not distcc / icecc?

Both distcc and icecc isolate compilation from the environment by pre-processing on the client. This is not so great because it greatly increases the amount of data you send from the client. Pre-processing a modern C++ file results in megabytes of text, and that all has to get sent over the wire.

My rule of thumb is that the number of pre-processed tokens in a C++ codebase is O(n^2) in the size of the codebase. Assume that header size is a constant fraction of code size, and assume that all headers transitively include a constant fraction of the code, and there’s your O(n^2) scaling factor.

Remote Execution clients use an input scanning strategy, where they use dependency information and language-specific tools to quickly scan local sources to identify all the source inputs that need to be staged. This effectively requires running a preprocessor to track macro state, but you don’t need to generate any output.

Lastly, I think part of what I’m trying to do here is to promote the idea of workflow standardization through containerization to the LLVM community. distcc and icecc are pre-container pieces of software. They work by trying to make the remote execution state match the local state, rather than encouraging a team working together to standardize on a common development environment base. I assume you can get good performance results with icecc, but you’ll probably still be left with a lot of divergence in developer workflows, where the build works when configured just right, but oftentimes fails if you’re not holding it exactly right.

What about siso?

Well, I tried it, but it failed for similar reasons to rewrapper. It really wants to see relative input paths. You can read more results on the repo, but they aren’t very interesting.

Notes on LLMs

I relied heavily on Claude code to pull together this demonstration, but I wrote this whole blog post by hand. It’s meandering and imperfect, but for me that’s kind of part of the point. I’m sure all the LLMs could punch it up with more bold text and emojis or make it more concise so it gets more shares, but it would rob me of the opportunity to learn and better understand what it is that I just did and why it might be valuable to you, me, and my team. I hope you’ve enjoyed this walkthrough of my experiments and found it colorful, even if it’s not straight to the point. Writing clear natural language seems like a skill that is going to be increasingly valuable going forward if LLMs continue developing on their current course, so it’s important to practice.

Conclusion

Developer containers are powerful tools for standardizing developer workflows, and they have the potential to enable remote execution of builds with services like NativeLink and Buildbarn. And, as I’ve shown, you don’t have to use new space-age build tools like Bazel to take advantage of these services. Using recc, you can set up distributed builds today, using your existing build tools. I know from past experience that the experts in build systems will tell you all about the scalability advantages of fully building in the cloud and that’s why you should adopt a new build system, but that’s totally impractical for a distributed open source community like LLVM with diverse build system stakeholders. Gradual migration paths, like recc, exist. If you work in and around LLVM or any other similar large C++ project, I hope this inspires you to build something similar, or get in touch with your local devtools team to set something like this up for your builds.