
Go is portable, until it isn't

Ali Ben
Member of Technical Staff

We thought Go would give us a single, portable agent binary for every Linux distro. Turns out… not exactly. But also, kind of yes.

This post kicks off a series about the traps we fell into while building a cross-platform server monitoring agent.

First, some context. simob is our open source server monitoring agent that powers the Simple Observability platform. We like to think of it as a passive sensor, not a long running program or daemon, because in the real world a passive sensor does not come with a long list of requirements. It’s small, self-contained and fits inside the existing system. That is the same goal we have for simob: a lightweight standalone binary with no prerequisites or external dependencies.

The same idea also applies to how we wanted to ship it. We wanted a project that you can compile from source on your development machine and run anywhere across your infrastructure. No complicated pipelines. No third party build services. Just a simple build that produces a portable binary.

Why we chose Go

In the observability world, if you're building an agent for metrics and logs, you're probably writing it in Go. Promtail, Telegraf, Grafana Alloy and many others are all written in Go.

And there are good reasons for that. First it’s compiled. A whole class of runtime errors gets caught before you even run the binary.

Then there is the garbage collector. For something that’s constantly ingesting and forwarding data, not having to manage memory is a massive advantage.

Goroutines are also an excellent abstraction. We knew our agent would need to manage a lot of parallel tasks: tailing log files, reading from input plugins, and sending data upstream. We could write clear, sequential-looking code for each task and let the runtime handle the concurrency.
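As a rough illustration of that model, here is a minimal sketch that starts one goroutine per task and waits for all of them. The task names and the `runTasks` helper are hypothetical placeholders, not simob's actual collectors.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// runTasks starts one goroutine per task name and collects one message from each.
// The names are placeholders for illustration only.
func runTasks(names []string) []string {
	results := make(chan string, len(names))
	var wg sync.WaitGroup
	for _, name := range names {
		wg.Add(1)
		go func(task string) {
			defer wg.Done()
			// A real task would loop here: tail a file, poll a plugin, flush a batch.
			results <- task + ": done"
		}(name)
	}
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	sort.Strings(out) // goroutines finish in any order, so sort for stable output
	return out
}

func main() {
	for _, line := range runTasks([]string{"tail-logs", "read-inputs", "send-upstream"}) {
		fmt.Println(line)
	}
}
```

Each task body reads as plain sequential code. The scheduling across threads is left to the Go runtime.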

And of course, we chose it because we thought we could compile it for any platform: "Just set GOOS and GOARCH at compile time and you're done."

The simple stuff

Most of the early work was simple. The Go ecosystem is more than a decade old and very rich. For core metrics collection we relied on gopsutil, a Go port of Python’s psutil. It gives you CPU, memory, network and disk metrics with a pretty clean API. It supports a wide range of operating systems and CPU architectures, removing the need for system specific code that we would otherwise have to write ourselves.

When it starts getting hard: the case of the journal collector

Things became more complex once users asked for systemd journal log support. Journal logs are not stored in plain text. They use a binary format and live in /var/log/journal or /run/log/journal (depending on whether persistent logging is enabled). The format is structured, indexed and can include inline compression.
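To make the two locations concrete, here is a minimal sketch that checks which of them exists on a machine. The `firstExistingDir` helper is a hypothetical name, and this is not how simob or systemd itself locates the journal. It only illustrates the persistent versus volatile split.

```go
package main

import (
	"fmt"
	"os"
)

// firstExistingDir returns the first candidate that exists and is a directory.
func firstExistingDir(candidates []string) (string, bool) {
	for _, dir := range candidates {
		info, err := os.Stat(dir)
		if err == nil && info.IsDir() {
			return dir, true
		}
	}
	return "", false
}

func main() {
	// The persistent location is checked first, then the volatile one.
	dir, ok := firstExistingDir([]string{"/var/log/journal", "/run/log/journal"})
	if !ok {
		fmt.Println("no journal directory found")
		return
	}
	fmt.Println("journal directory:", dir)
}
```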

We had two options. The first was to write our own parser. The file format is documented and the systemd source is available.

Tools like Kaitai Struct could help us generate the parser code. It was not impossible. But it required time and careful reading of both the spec and the real implementation.

"Note that the actual implementation in the systemd codebase is the only ultimately authoritative description of the format, so if this document and the code disagree, the code is right"

— A comforting note from the systemd journal documentation. Nothing says "stable, well-documented binary format" like the docs telling you they might be wrong.

Our real concern was compatibility. We wanted a binary that works everywhere. That means support for past, current and future versions of the journal format. We did not want to spend time maintaining a backward compatible parser or doing code archaeology. So this option was discarded.

The second option was to use the C API provided by systemd for reading the journal. A Go wrapper already exists. It exposes the journald C API directly. On paper this looked like the right solution, so this is what we chose.

Once we started using it, Go added some constraints. Because the wrapper calls the C API directly, the systemd library is dynamically linked. It must be present on the target machine at runtime. That part is fine. A machine without systemd has no journal logs to collect anyway. It does, however, introduce new build problems.

The first problem is that the build breaks on non-systemd systems such as macOS. Since libsystemd is not available there, you cannot build on such a machine, not even when cross compiling for Linux. You must build on a Linux system.

This affects both release builds and development builds. You cannot even run go run locally on a non systemd machine because the compiler cannot find the systemd library. Thankfully Go has build tags to tell the compiler what to include on each platform.


  //go:build linux
            

This line instructs the Go compiler to only build this file when the target is Linux.

It does add some code bloat, since a stub file is required for other systems so the package still compiles.


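    // myfunc_linux.go
    //go:build linux

    package mypkg

    // MyFunc is the real implementation, compiled only for Linux targets.
    func MyFunc() string {
      // real Linux implementation goes here
      return "linux"
    }

    // myfunc_stub.go (a separate file in the same package)
    //go:build !linux

    package mypkg

    // MyFunc is a placeholder so the package still compiles elsewhere.
    func MyFunc() string {
      return "stub for other systems"
    }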
            

Separate files with build tags let you provide a real implementation for Linux while keeping a stub so the package still compiles elsewhere.

The second problem is that libsystemd differs between architectures. You need an amd64 version to build an amd64 binary and an arm64 version to build an arm64 binary. You cannot simply set GOARCH to produce every target from one worker. Each architecture build must run on a worker that has the matching libsystemd.

The glibc problem

There is another issue that shows up and is much harder to spot at first.

Go has a build setting called CGO_ENABLED. When it is enabled, Go code can call into C, and by default the resulting binary is dynamically linked against the C libraries it uses. This includes explicit C wrappers, like the sdjournal package, but also indirect calls inside the Go standard library. A common example is DNS resolution, which can go through glibc on Linux systems. With CGO_ENABLED set to 1, the final binary links to libc at runtime.

The default value depends on the environment. It is enabled by default when building natively on a system that supports cgo. It is disabled when cross compiling or when the C compiler is not available on the PATH. These defaults usually make sense. You generally do not want to enable cgo for cross compilation or for targets where glibc does not exist, such as Windows.
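Since the effective value depends on the build environment, it can help to check what a binary was actually built with. The sketch below reads the settings the Go toolchain embeds in the binary through `runtime/debug`. The `buildSetting` helper is a hypothetical name, and whether a given key is recorded depends on the Go version and on how the binary was built.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// buildSetting looks up one key in the settings recorded by the Go toolchain.
func buildSetting(settings []debug.BuildSetting, key string) (string, bool) {
	for _, s := range settings {
		if s.Key == key {
			return s.Value, true
		}
	}
	return "", false
}

func main() {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no build info embedded in this binary")
		return
	}
	for _, key := range []string{"CGO_ENABLED", "GOOS", "GOARCH"} {
		if v, found := buildSetting(info.Settings, key); found {
			fmt.Printf("%s=%s\n", key, v)
		} else {
			fmt.Printf("%s not recorded\n", key)
		}
	}
}
```

The same information can be read from an existing binary on disk with `go version -m`.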

The problem is that a dynamically linked libc does not work on all Linux systems. Some Linux distributions do not use glibc, most notably Alpine Linux, which uses musl. This means a binary built on a glibc-based Linux system with CGO_ENABLED set to 1 will work on Ubuntu or Debian but will fail at runtime on Alpine.


  /bin/sh: ./simob: Permission denied
            

Don't get fooled by the "Permission denied". On Alpine and other musl systems, this error, when permissions are clearly set, almost always means the kernel can't find the required glibc dynamic linker.
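One way to see which dynamic linker a binary asks for is to read the PT_INTERP segment of the ELF file. Here is a minimal sketch using the standard `debug/elf` package. The `interpreter` and `libcFamily` helpers are hypothetical names, and the libc guess is a heuristic based on common loader file names (for example `/lib64/ld-linux-x86-64.so.2` for glibc and `/lib/ld-musl-x86_64.so.1` for musl), not an exhaustive check.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"io"
	"os"
	"strings"
)

// interpreter returns the dynamic loader path stored in the PT_INTERP
// segment of an ELF file, or "" when the file has no such segment.
func interpreter(path string) (string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	for _, prog := range f.Progs {
		if prog.Type != elf.PT_INTERP {
			continue
		}
		raw, err := io.ReadAll(prog.Open())
		if err != nil {
			return "", err
		}
		// The path is stored as a NUL-terminated string.
		return string(bytes.TrimRight(raw, "\x00")), nil
	}
	return "", nil
}

// libcFamily guesses the libc from the loader file name (heuristic only).
func libcFamily(interp string) string {
	switch {
	case interp == "":
		return "static"
	case strings.Contains(interp, "ld-musl"):
		return "musl"
	case strings.Contains(interp, "ld-linux"):
		return "glibc"
	default:
		return "unknown"
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: interp <elf-binary>")
		return
	}
	interp, err := interpreter(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("loader=%q libc=%s\n", interp, libcFamily(interp))
}
```

If the reported loader does not exist on the target machine, the kernel cannot start the program, whatever the file permissions say.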

This forces you to build a separate version of the agent for non glibc systems.

So, is Go the problem?

Not really. Go behaved exactly as documented. We were the ones assuming that "portable" meant "effortless". Once we pulled in low-level C libraries and started targeting a mix of glibc and non-glibc systems, the simple story fell apart. None of it is dramatic, just a set of constraints you only notice once you trip over them.

Our initial idea of building everything on a laptop and shipping the same binary everywhere did not survive for long. We now rely on GitHub Actions with the right runners for each architecture. It is more moving parts than we wanted, but it works and it stays out of the critical path.

Local builds are still possible with containers or emulation, although a bit more clunky than we hoped.

In the end the build pipeline is more complicated than we imagined, but the binaries we ship remain small and self-contained. That was the original goal, and we managed to keep that part intact.