retrage.github.io

Porting Linux to Nabla Containers

This post introduces a port of the Linux Kernel Library to Nabla Containers.

runnc is an OCI runtime that runs process-level isolated unikernels. It is built on top of Solo5, a sandbox for unikernels on which several unikernels (MirageOS, IncludeOS, Rumprun) run. The original runnc uses Rumprun, a NetBSD-based unikernel. However, because Docker originated on Linux, container images expect system-call-level compatibility with Linux. Therefore, I ported the Linux Kernel Library (LKL) and musl libc to Solo5 and integrated them with runnc.

frankenlibc on Solo5

frankenlibc is a set of tools to run Rump unikernels in various environments. There is a fork of it that adds LKL and some libraries. I used this frankenlibc fork and added Solo5 platform support.

Building frankenlibc

Clone the repository and check out the solo5 branch.

$ git clone https://github.com/retrage/frankenlibc.git
$ cd frankenlibc
$ git checkout solo5

Clone the full Solo5 repository to avoid a build failure, then update the submodules.

$ git clone https://github.com/Solo5/solo5.git
$ git submodule update --init

Apply the patches under patches/solo5/.

$ for file in `find patches/solo5/ -maxdepth 1 -type f` ; do patch -p1 < $file ; done

Finally, run the build script.

$ ./build.sh -k linux notests solo5

After a successful build, you can find the libraries and toolchain wrappers in the rump directory.

Testing

Even if notests is specified, build.sh builds simple tests into rumpobj/tests.

Create a tap device named tap100.

$ sudo ip tuntap add tap100 mode tap
$ sudo ip addr add 10.0.0.1/24 dev tap100
$ sudo ip link set dev tap100 up

Create a disk image named disk.img. Because LKL/frankenlibc creates directories during initialization, some operations fail if a read-only ISO image is used. To avoid this issue, we use an ext4 file system image.

$ dd if=/dev/zero of=disk.img bs=1024 count=20480
$ mkfs.ext4 -F disk.img

Note that Solo5 requires an application manifest at build time, which is embedded in the unikernel binary. In the current frankenlibc Solo5 support, the manifest is shared across all binaries and specifies a rootfs block device and a tap network device. We have to provide these devices even if the application does not use them.
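To make the shape of such a manifest concrete, here is a minimal Go sketch that prints one declaring the two devices. It is an illustration under assumptions, not the file shipped in frankenlibc: the top-level keys and the BLOCK_BASIC / NET_BASIC type names follow the Solo5 manifest format as used around v0.6, the device names rootfs and tap are taken from the rexec command line below, and the Go type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// device is one entry in the manifest's device list.
type device struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// manifest mirrors the assumed top-level layout of a Solo5 manifest.json.
type manifest struct {
	Type    string   `json:"type"`
	Version int      `json:"version"`
	Devices []device `json:"devices"`
}

// commonManifest returns a manifest with one block device and one network
// device. The names must match the ones passed to the tender at run time.
func commonManifest() manifest {
	return manifest{
		Type:    "solo5.manifest",
		Version: 1,
		Devices: []device{
			{Name: "rootfs", Type: "BLOCK_BASIC"},
			{Name: "tap", Type: "NET_BASIC"},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(commonManifest(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the device list is fixed when the binary is built, the tender expects every declared device to be attached at start, which is why both rootfs and tap are passed even to the hello test.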

Run the hello test.

$ RUMP_VERBOSE=1 ./rump/bin/rexec rumpobj/tests/hello rootfs:disk.img tap:tap100

On the Linux platform, rexec provides a seccomp-based sandbox for unikernels, similar to Solo5’s tenders. On the Solo5 platform, it is just a shell script wrapper around the spt tender.

LKL Nabla Containers

Now it’s time to integrate with Nabla Containers. Since the original runnc imports an older version of Solo5, I updated it and adapted the runnc code base.

Updating Supplied Arguments

Below is the original code that builds the arguments for the Solo5 tender.

    var args []string
    if mac != "" {
        args = []string{r.NablaRunBin,
            "--x-exec-heap",
            "--mem=" + strconv.FormatInt(r.Memory, 10),
            "--net-mac=" + mac,
            "--net=" + r.Tap,
            "--disk=" + disk,
            r.UniKernelBin,
            unikernelArgs}
    } else {
        args = []string{r.NablaRunBin,
            "--x-exec-heap",
            "--mem=" + strconv.FormatInt(r.Memory, 10),
            "--net=" + r.Tap,
            "--disk=" + disk,
            r.UniKernelBin,
            unikernelArgs}
    }

In the latest Solo5 (the version the frankenlibc Solo5 platform uses), the --net-mac option has been removed, and multiple block devices and network devices can be specified with the --block: and --net: options. Ideally, runnc should support multiple devices. However, as described above, the shared manifest declares only rootfs and tap. So the port ends up supporting just these two devices, like this.

    var args []string
    args = []string{r.NablaRunBin,
        "--mem=" + strconv.FormatInt(r.Memory, 10),
        "--net:tap=" + r.Tap,
        "--block:rootfs=" + disk,
        r.UniKernelBin}
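The same construction can be written as a small standalone helper, which makes the mapping between device names and tender options easier to see and to test. This is only a sketch under assumptions, not the code in runnc: buildTenderArgs and its parameters are hypothetical names, and appending the unikernel arguments after the binary is an assumption based on the nabla-run command line shown in the log later in this post.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildTenderArgs is a hypothetical helper (not part of runnc) that assembles
// the command line for the Solo5 tender using the named-device options.
// The device names "tap" and "rootfs" must match the names declared in the
// manifest embedded in the unikernel binary.
func buildTenderArgs(tenderBin string, memMB int64, tap, disk, unikernelBin string, unikernelArgs ...string) []string {
	args := []string{
		tenderBin,
		"--mem=" + strconv.FormatInt(memMB, 10),
		"--net:tap=" + tap,
		"--block:rootfs=" + disk,
		unikernelBin,
	}
	// Everything after the unikernel binary is handed to the unikernel itself.
	return append(args, unikernelArgs...)
}

func main() {
	args := buildTenderArgs("/opt/runnc/bin/nabla-run", 512,
		"tap100", "rootfs.img", "python3.nabla", "--", "-c", "print('hello')")
	for _, a := range args {
		fmt.Println(a)
	}
}
```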

Creating Disk Image

I added a CreateExt4() function and llmodules/fs/ext4_storage.go to create an ext4 rootfs.

// CreateExt4 creates ext4 raw disk image from the dir argument
func CreateExt4(dir string, target *string) (string, error) {
    var fname string

    if target == nil {
        f, err := ioutil.TempFile("/tmp", "nabla")
        if err != nil {
            return "", err
        }

        fname = f.Name()
        if err := f.Close(); err != nil {
            return "", err
        }
    } else {
        var err error
        fname, err = filepath.Abs(*target)
        if err != nil {
            return "", errors.Wrap(err, "Unable to resolve abs target path")
        }
    }

    absDir, err := filepath.Abs(dir)
    if err != nil {
        return "", errors.Wrap(err, "Unable to resolve abs dir path")
    }

    cmd := exec.Command("virt-make-fs", "-F", "raw", "-t", "ext4",
        absDir, fname)
    err = cmd.Run()
    if err != nil {
        return "", errors.Wrap(err, "Unable to run virt-make-fs command")
    }

    return fname, nil
}

virt-make-fs, part of libguestfs, has an interface similar to that of genisoimage.

It would be better to switch between NewISOFsHandler() and NewExt4FsHandler() at run time.
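One way such a run-time switch could look is sketched below. Everything here is hypothetical and only illustrates the idea: the rootfsBuilder type stands in for whatever the two handler constructors actually return in runnc, the NABLA_ROOTFS_TYPE environment variable is an invented name, and the stub builders return a path without creating any image.

```go
package main

import (
	"fmt"
	"os"
)

// rootfsBuilder is a hypothetical stand-in for the ISO and ext4 handlers:
// it takes a rootfs directory and returns the path of the image it built.
type rootfsBuilder func(dir string) (string, error)

// selectRootfsBuilder picks a builder by name and rejects unknown names.
func selectRootfsBuilder(kind string, iso, ext4 rootfsBuilder) (rootfsBuilder, error) {
	switch kind {
	case "iso":
		return iso, nil
	case "ext4":
		return ext4, nil
	default:
		return nil, fmt.Errorf("unknown rootfs type %q", kind)
	}
}

func main() {
	// Stub builders for illustration only; they do not create any image.
	iso := func(dir string) (string, error) { return dir + ".iso", nil }
	ext4 := func(dir string) (string, error) { return dir + ".img", nil }

	// NABLA_ROOTFS_TYPE is an invented variable name; default to ext4.
	kind := os.Getenv("NABLA_ROOTFS_TYPE")
	if kind == "" {
		kind = "ext4"
	}

	build, err := selectRootfsBuilder(kind, iso, ext4)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	image, err := build("rootfs")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(image)
}
```

Defaulting to ext4 reflects the need for a writable rootfs described earlier, while keeping the ISO path available for the original Rumprun-based images.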

Building and Installing runnc

The steps are the same as for the original runnc.

$ git clone https://github.com/retrage/runnc.git
$ mkdir -p $GOPATH/github.com/retrage
$ ln -sf $PWD/runnc $GOPATH/github.com/retrage/runnc
$ cd runnc
$ git apply patches/0001-solo5-elf-segment-align-workaround.patch
$ make build
$ make install

Testing with Docker Images

I provide a set of Makefiles that build the LKL Nabla Containers base Docker images. They build Solo5, frankenlibc, and the Docker images.

I also pushed pre-built Docker images to Docker Hub.

You can use the images like this.

$ sudo docker run --rm --runtime=runnc retrage/lkl-nabla-python3-base:latest -c "print(\'hello\')"
[sudo] password for akira:
nabla-run arg [/opt/runnc/bin/nabla-run --mem=512 --net:tap=tap28157ba5950e --block:rootfs=/var/run/docker/runtime-runnc/moby/28157ba5950e3e84824bd843fd1dafb06eccc7de2020a0619d6a5b463e5f2c2b/rootfs.img /var/lib/docker/overlay2/3d36c19950e53eefded8e1933f3d7e51990fc4c7b065be6c00776eeab8fb3136/merged/python3.nabla __RUMP_FDINFO_NET_tap=4 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=28157ba5950e PYTHONHASHSEED=1 PYTHONHOME=/usr/local HOME=/ -- -c print(\'hello\')]
            |      ___|
  __|  _ \  |  _ \ __ \
\__ \ (   | | (   |  ) |
____/\___/ _|\___/____/
Solo5: Bindings version v0.6.4-6-g756accf-dirty
Solo5: Memory map: 512 MB addressable:
Solo5:   reserved @ (0x0 - 0xfffff)
Solo5:       text @ (0x100000 - 0x889fff)
Solo5:     rodata @ (0x88a000 - 0xb4cfff)
Solo5:       data @ (0xb4d000 - 0xe7dfff)
Solo5:       heap >= 0xe7e000 < stack < 0x20000000
sleeping 50000 usec
hello
Solo5: solo5_exit(0) called

Conclusion

In this post, I gave a brief introduction to LKL Nabla Containers. The port is still at an early stage and has room for improvement, but it already runs practical applications such as Python. I would like to measure its performance and evaluate the pros and cons.

Below is the TODO list:

  • Replace the workaround for Solo5
  • Handle manifest.json flexibly at build time
  • Pass lkl.json through run-time arguments
  • Do not pass the __RUMP_FDINFO_NET_tap=4 environment variable at run time

Update: May 1st, 2020

After writing this post, I found that LKL must use the network information created by the container runtime; otherwise, the network does not work properly. I added the third item in the TODO list above to frankenlibc and runnc.

The OCI runtime builds a JSON config for LKL and passes it at startup. LKL parses it along with the environment variables and arguments.

Now the popular network applications Nginx and Redis work on LKL Nabla Containers. They are available as base Docker images.