LKL.js: Running Linux Kernel on JavaScript *Directly*
I ported the Linux kernel directly to JavaScript. In other words, I translated the Linux kernel to JavaScript using Emscripten, so unlike JSLinux, it runs without an emulator.
The following is the working repository.
I published a demonstration site for LKL.js. Please enable SharedArrayBuffer and try it out.
I also published slides about LKL.js.
Linux Kernel Library (LKL)
We use the Linux Kernel Library (LKL), which turns the Linux kernel into an anykernel. LKL is a fork of torvalds/linux. It is designed so that LKL-specific code lives only in arch/lkl and the rest of the kernel runs unmodified. This design makes it easy to follow the mainline (currently v4.16). Since LKL is an anykernel, it runs in the user space of various OSes such as Linux, FreeBSD, and Windows.
Emscripten
Emscripten is an LLVM-based C/C++ to JavaScript/WebAssembly transpiler. It also provides a Unix-like environment for running translated software in web browsers.
Can we port LKL to JavaScript with Emscripten?
LKL runs on various OSes, and Emscripten provides a Unix-like environment. So can LKL be ported to JavaScript with Emscripten?
Current Status of Linux Kernel Build with Clang
First of all, the Linux kernel depends heavily on GCC extensions, so it was doubtful whether Clang could compile it. The LLVMLinux project once aimed to compile the Linux kernel with Clang. More recently, through the efforts of the Google Android team, two LTS kernels (4.4 and 4.9) can be built with Clang. LKL can now be built with Clang as well.
LKL Build Flow 101
Let’s look at the build flow of LKL. First, when
$ make -C tools/lkl
is run, the build system uses the Kconfig settings to decide which source files (*.c/*.S) to build, and compiles them. The resulting object files (*.o) are first archived by ar into built-in.o. Next, all built-in.o files are linked into vmlinux in one step. For the host-side code, the files under tools/lkl/lib are compiled and linked into liblkl.o. Finally, vmlinux and liblkl.o are linked into liblkl.so. This is the basic build flow of LKL.
Porting LKL with Emscripten
Next, we will take a look at how to port LKL with Emscripten.
With any LLVM-based toolchain, not just Emscripten, the compiler translates source to target in the following flow.
Source -> LLVM IR -> Target
In this way, the source is first converted to LLVM IR (*.bc/*.ll) and then to the target. In Emscripten, “linking” is the conversion from LLVM IR to JavaScript. Therefore, everything (including libc and the other libraries provided by Emscripten) must first be converted to LLVM IR.
Generating vmlinux.bc
The build using emcc (Emscripten's Clang wrapper) is:
make -C tools/lkl CC="$CC $CFLAGS" AR="$PY $PWD/ar.py" V=1
The two important things here are $CFLAGS and ar.py. I will explain each one. (Note that CC="$CC $CFLAGS" is used to force $CFLAGS to be passed.)
$CFLAGS is:
CFLAGS="$CFLAGS -s WASM=0"
CFLAGS="$CFLAGS -s ASYNCIFY=1"
CFLAGS="$CFLAGS -s EMULATE_FUNCTION_POINTER_CASTS=1"
CFLAGS="$CFLAGS -s USE_PTHREADS=1"
CFLAGS="$CFLAGS -s PTHREAD_POOL_SIZE=4"
CFLAGS="$CFLAGS -s TOTAL_MEMORY=1342177280"
These options are passed to Emscripten. Please refer to the Emscripten manual for details.
Furthermore, the following definitions are specified.
CFLAGS="$CFLAGS -DMAX_NR_ZONES=2"
CFLAGS="$CFLAGS -DNR_PAGEFLAGS=20"
CFLAGS="$CFLAGS -DSPINLOCK_SIZE=0"
CFLAGS="$CFLAGS -DF_GETLK64=12"
CFLAGS="$CFLAGS -DF_SETLK64=13"
CFLAGS="$CFLAGS -DF_SETLKW64=14"
These values are normally obtained during the Linux kernel build by compiling an empty file. This time, however, they cannot be obtained that way, so we specify the values produced by a build in an x86_64 environment.
Next, I will explain ar.py. The following is a snippet of ar.py.
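import os
import sys

filename = "objs"

def main():
    # Create the record file on first use.
    if not os.path.exists(filename):
        with open(filename, "w") as fp:
            pass
    objs = []
    # argv[1] holds the ar flags and argv[2] the archive name,
    # so the archive members start at index 3.
    for i, arg in enumerate(sys.argv):
        if ".o" in arg and "built-in" not in arg and i > 2:
            objs.append(arg)
    # Append the member paths, separated by spaces.
    with open(filename, "a") as fp:
        for obj in objs:
            if obj != "":
                fp.write(obj + " ")
    return 0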
As explained above, the Linux kernel build system gathers object files with ar and links them to produce vmlinux. To work with Emscripten, we need vmlinux as LLVM bitcode. LLVM has a linker called llvm-link that combines multiple LLVM bitcode files into one. We need llvm-link to generate vmlinux.bc, but there is a problem: unlike ld, llvm-link cannot take archive files as arguments. Therefore, we have to record the object files that would otherwise be archived. Here, ar.py records their file paths in objs.
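The filter in ar.py can be exercised on its own. The following is a minimal sketch that re-implements only that filter as a function. The function name select_objects and the sample command line are assumptions made for illustration and are not taken from the LKL.js repository.

```python
def select_objects(argv):
    """Return the object files that an ar-style command line would archive.

    Mirrors the filter in ar.py: argv[0] is the script, argv[1] the ar
    flags, argv[2] the archive being created, and members start at index 3.
    """
    return [
        arg
        for i, arg in enumerate(argv)
        if i > 2 and ".o" in arg and "built-in" not in arg
    ]


# Hypothetical invocation for one directory of the kernel tree.
argv = [
    "ar.py", "rcSTPD", "kernel/built-in.o",
    "kernel/fork.o", "kernel/exit.o", "kernel/time/built-in.o",
]
selected = select_objects(argv)
print(" ".join(selected))
```

Nested built-in.o archives are skipped by the filter; presumably their members were already recorded when those archives were created.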
Next, let’s look at how vmlinux.bc is generated. I added the following commands to scripts/link-vmlinux.sh.
info CLEAN obj
python "${srctree}/clean-obj.py"
info GEN link-vmlinux.sh
python "${srctree}/link-vmlinux-gen.py"
info LINK vmlinux
bash "${srctree}/link-vmlinux.sh"
clean-obj.py removes duplicated file paths from objs, which is generated by ar.py. link-vmlinux-gen.py generates vmlinux-link.sh (not scripts/link-vmlinux.sh), which runs llvm-link. Running vmlinux-link.sh produces vmlinux.bc. This is the flow for generating vmlinux.bc.
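The two helper scripts are not shown here, so the following is only a sketch of what the deduplication and command-generation steps might look like. The function names, the sample paths, and the exact form of the generated command are assumptions. The only llvm-link feature used is its -o option.

```python
import shlex


def unique_paths(text):
    """Split a whitespace-separated path list and drop repeats, keeping order."""
    seen = set()
    result = []
    for path in text.split():
        if path not in seen:
            seen.add(path)
            result.append(path)
    return result


def llvm_link_command(objects, output="vmlinux.bc", linker="llvm-link"):
    """Build a shell command line that links the bitcode files into one."""
    parts = [linker, "-o", output] + list(objects)
    return " ".join(shlex.quote(part) for part in parts)


# Hypothetical contents of the objs file written by ar.py.
recorded = "init/main.o kernel/fork.o init/main.o kernel/time/time.o "
objects = unique_paths(recorded)
command = llvm_link_command(objects)
print(command)
```

Keeping the first occurrence of each path preserves the order in which the build system produced the objects.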
Generating boot.js
Next, let's look at the steps up to the point where JavaScript code is generated. As explained above, LKL is a library OS, so vmlinux does not work on its own; it works only as part of an application. In this case, our target is tools/lkl/tests/boot, which is LKL’s "Hello, world".
$LINK -o $LKL/tests/boot.bc \
$LKL/tests/boot-in.o $LKL/lib/liblkl-in.o $LKL/lib/lkl.o
First, we have to link vmlinux.bc ($LKL/lib/lkl.o), the host-dependent part $LKL/lib/liblkl-in.o, and the application part $LKL/tests/boot-in.o to get $LKL/tests/boot.bc.
$DIS -o $LKL/tests/boot.ll $LKL/tests/boot.bc
$CP ~/.emscripten_cache/asmjs/dlmalloc.bc js/dlmalloc.bc
$CP ~/.emscripten_cache/asmjs/libc.bc js/libc.bc
$CP ~/.emscripten_cache/asmjs/pthreads.bc js/pthreads.bc
$DIS -o js/dlmalloc.ll js/dlmalloc.bc
$DIS -o js/libc.ll js/libc.bc
$DIS -o js/pthreads.ll js/pthreads.bc
$PY rename_symbols.py $LKL/tests/boot.ll $LKL/tests/boot-mod.ll
Next, all LLVM bitcode files ($LKL/tests/boot.bc, libc.bc, etc.) are disassembled with llvm-dis, and rename_symbols.py is applied to boot.ll. The reason for these steps is that function names used in the Linux kernel conflict with function names used in libc. In normal LKL, this conflict is avoided with ELF linker tricks. JavaScript generated by Emscripten, however, has no namespaces, so the names collide. rename_symbols.py avoids the collisions by rewriting the function names that would collide. In addition, rename_symbols.py converts inline assembly in the Linux kernel into calls to Emscripten's emscripten_asm_const_int.
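rename_symbols.py itself is not reproduced here. As a rough sketch of the renaming idea only, the following rewrites references to selected global symbols in textual LLVM IR, where globals are written as @name. The lkl_ prefix, the symbol names, and the sample IR are all assumptions for illustration. The inline-assembly conversion is not covered.

```python
import re

# Characters that may continue an unquoted LLVM IR identifier.
_IDENT_TAIL = r"[-a-zA-Z$._0-9]"


def rename_symbols(ir_text, names, prefix="lkl_"):
    """Prefix every reference to the given global symbols in LLVM IR text."""
    if not names:
        return ir_text
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # The lookahead keeps @strlen from matching inside @strlen_user.
    pattern = re.compile(r"@(" + alternatives + r")(?!" + _IDENT_TAIL + r")")
    return pattern.sub(lambda m: "@" + prefix + m.group(1), ir_text)


# Hypothetical IR fragment with one colliding name and one similar name.
sample = """\
define i64 @strlen(i8* %s) {
  %n = call i64 @strlen_user(i8* %s)
  ret i64 %n
}
declare i8* @memcpy(i8*, i8*, i64)
"""
renamed = rename_symbols(sample, {"strlen", "memcpy"})
print(renamed)
```

Working on the disassembled .ll text rather than on the bitcode keeps the rewrite a plain text substitution, which is presumably why the files are disassembled first.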
Finally, from boot-mod.ll, the following command generates the HTML and JavaScript files:
EMCC_DEBUG=1 $CC -o js/boot.html $LKL/tests/boot-mod.ll $CFLAGS -v
Adding Workarounds
Although we have generated the Linux kernel and the boot.js application translated “completely” into JavaScript, they will not work as they are. This is because a real computer architecture and JavaScript are very different, so we have to make some modifications.
Replacing inline assemblies
In the Linux kernel, architecture-dependent code is basically placed under arch/$ARCH, and the rest of the code is architecture-independent. However, an empty inline assembly statement is sometimes inserted into architecture-independent code to stop the compiler from optimizing a piece of code into a different form. An example is set_normalized_timespec64 in kernel/time/time.c:
void set_normalized_timespec64(struct timespec64 *ts, time64_t sec, s64 nsec)
{
    while (nsec >= NSEC_PER_SEC) {
        /*
         * The following asm() prevents the compiler from
         * optimising this loop into a modulo operation. See
         * also __iter_div_u64_rem() in include/linux/time.h
         */
        asm("" : "+rm"(nsec));
        nsec -= NSEC_PER_SEC;
        ++sec;
    }
    while (nsec < 0) {
        asm("" : "+rm"(nsec));
        nsec += NSEC_PER_SEC;
        --sec;
    }
    ts->tv_sec = sec;
    ts->tv_nsec = nsec;
}
Such inline assembly causes the conversion from LLVM bitcode to JavaScript to fail. Therefore, we have to replace inline assembly such as asm("" : "+rm"(nsec)) with emscripten_asm_const_int, an Emscripten-defined function that calls JavaScript code from C.
Fix early_param
The Linux kernel has a macro called early_param. It is defined in include/linux/init.h as follows:
struct obs_kernel_param {
    const char *str;
    int (*setup_func)(char *);
    int early;
};
/* snip */
#define __setup_param(str, unique_id, fn, early)            \
    static const char __setup_str_##unique_id[] __initconst \
        __aligned(1) = str;                                 \
    static struct obs_kernel_param __setup_##unique_id      \
        __used __section(.init.setup)                       \
        __attribute__((aligned((sizeof(long)))))            \
        = { __setup_str_##unique_id, fn, early }
/* snip */
#define early_param(str, fn)                                \
    __setup_param(str, fn, fn, 1)
early_param is a macro that takes str and fn as arguments and places an obs_kernel_param structure in the .init.setup section.
By referring to arch/lkl/kernel/vmlinux.lds, which is generated during the LKL build, we can see that .init.setup is placed between __setup_start and __setup_end.
__setup_start = .; KEEP(*(.init.setup)) __setup_end = .;
These symbols are used in init/main.c as follows. The code compares a boot parameter of the Linux kernel (param) with the str of each obs_kernel_param in .init.setup. On a match, it calls (*setup_func)(char *) with val as the argument.
/* Check for early params. */
static int __init do_early_param(char *param, char *val,
                                 const char *unused, void *arg)
{
    const struct obs_kernel_param *p;

    for (p = __setup_start; p < __setup_end; p++) {
        if ((p->early && parameq(param, p->str)) ||
            (strcmp(param, "console") == 0 &&
             strcmp(p->str, "earlycon") == 0)
        ) {
            if (p->setup_func(val) != 0)
                pr_warn("Malformed early option '%s'\n", param);
        }
    }
    /* We accept everything at this stage. */
    return 0;
}
In summary, do_early_param looks at the boot parameters and executes the setup_func registered by early_param. However, since it relies on ELF symbols, it does not work correctly in JavaScript. For this reason, the functions to be called are hard-coded:
static int __init do_early_param(char *param, char *val,
                                 const char *unused, void *arg)
{
    /* XXX: There is a lot of early_param, but hardcode in init/main.c */
    const char *early_params[MAX_INIT_ARGS+2] = { "debug", "quiet", "loglevel", NULL, };
    int i;

    for (i = 0; early_params[i]; i++) {
        if (strcmp(param, early_params[i]) == 0 ||
            (strcmp(param, "console") == 0 &&
             strcmp(early_params[i], "earlycon") == 0)
        ) {
            switch (i) {
            case 0: /* debug */
                if (debug_kernel(val) != 0)
                    pr_warn("Malformed early option '%s'\n", param);
                break;
            case 1: /* quiet */
                if (quiet_kernel(val) != 0)
                    pr_warn("Malformed early option '%s'\n", param);
                break;
            case 2: /* loglevel */
                if (loglevel(val) != 0)
                    pr_warn("Malformed early option '%s'\n", param);
                break;
            default:
                pr_warn("Unknown early option '%s'\n", param);
            }
        }
    }
    /* We accept everything at this stage. */
    return 0;
}
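To make the table walk in the original do_early_param concrete, here is a small Python model of it. It is a simplified sketch, not kernel code: KernelParam stands in for struct obs_kernel_param, a plain list stands in for the region between __setup_start and __setup_end, and the handlers are hypothetical. Plain string equality replaces parameq, and the console/earlycon special case is omitted.

```python
from typing import Callable, List, NamedTuple


class KernelParam(NamedTuple):
    """Python stand-in for struct obs_kernel_param."""
    name: str
    setup_func: Callable[[str], int]
    early: bool


def do_early_param(table: List[KernelParam], param: str, val: str) -> List[str]:
    """Call every early handler whose name matches param; return warnings."""
    warnings = []
    for entry in table:
        if entry.early and entry.name == param:
            if entry.setup_func(val) != 0:
                warnings.append("Malformed early option '%s'" % param)
    return warnings


state = {}


def loglevel(val):
    """Hypothetical handler: accept only a decimal number."""
    if not val.isdigit():
        return -22  # stands in for -EINVAL
    state["loglevel"] = int(val)
    return 0


def quiet_kernel(val):
    """Hypothetical handler: just record that quiet was requested."""
    state["quiet"] = True
    return 0


table = [
    KernelParam("quiet", quiet_kernel, True),
    KernelParam("loglevel", loglevel, True),
]
print(do_early_param(table, "loglevel", "8"))
print(do_early_param(table, "loglevel", "x"))
```

In the kernel, the table is assembled by the linker from the .init.setup section. The hard-coded version above replaces that table with a fixed list of names and a switch statement.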
Fix initcall
Like early_param, the initcalls that are invoked during initialization are managed through ELF symbols. With JavaScript alone, we cannot know which functions should be called. Therefore, we generate initcall tables from the System.map produced by a normal build of LKL.
# SIG, initcalls, and blacklist are defined earlier in the script (not shown).
with open(sys.argv[1], "r") as fp:
    for line in fp:
        if SIG in line:
            symbol = line[:-1].split(" ")[2]
            try:
                level = int(symbol[-1])
                initcall = symbol[symbol.index(SIG)+len(SIG):len(symbol)-1]
                initcalls[level].append(initcall)
            except ValueError:
                pass
for level, row in enumerate(initcalls):
    print("/* initcall{} */".format(level))
    print("EM_ASM({")
    for initcall in row:
        if initcall in blacklist:
            print(" /* _"+initcall+"(); */")
        else:
            print(" _"+initcall+"();")
    print("});")
The above is the initcall table generation script.
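The parsing step can be tried in isolation. The sketch below is a self-contained variant of it that runs on a few made-up System.map lines. The value of SIG, the number of levels, the addresses, and the sample symbols are assumptions chosen to match the usual __initcall_<function><level> naming; they are not copied from the actual script.

```python
SIG = "__initcall_"  # assumed prefix of initcall symbols in System.map


def collect_initcalls(lines, levels=8):
    """Group initcall function names by level from System.map-style lines."""
    table = [[] for _ in range(levels)]
    for line in lines:
        if SIG not in line:
            continue
        symbol = line.split()[2]
        try:
            level = int(symbol[-1])  # the last character is the level digit
        except ValueError:
            continue  # e.g. __initcall_start, or "s"-suffixed entries
        if level >= levels:
            continue
        name = symbol[symbol.index(SIG) + len(SIG):-1]
        table[level].append(name)
    return table


# Hypothetical System.map excerpt (addresses are made up).
sample_map = [
    "0000000000600000 T __initcall_start",
    "0000000000600000 t __initcall_net_ns_init0",
    "0000000000600008 t __initcall_lkl_console_init1",
    "0000000000600010 t __initcall_wq_sysfs_init1",
    "0000000000400000 T start_kernel",
]
table = collect_initcalls(sample_map)
print(table[0])
print(table[1])
```

Symbols whose last character is not a digit are skipped, which mirrors the ValueError handling in the script above.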
We hard-code the generated code into do_initcalls. EM_ASM is an Emscripten macro that, like inline assembly, embeds JavaScript code directly in C.
static void __init do_initcalls(void)
{
    /* XXX: initcalls are broken, so hardcode here */
    /* initcall0 */
    EM_ASM({
        _net_ns_init();
    });
    /* initcall1 */
    EM_ASM({
        _lkl_console_init();
        _wq_sysfs_init();
        _ksysfs_init();
        /* snip */
    });
}
Demonstration and the Results
As described at the top, LKL.js uses pthreads, so SharedArrayBuffer must be enabled. Although modern web browsers ship with SharedArrayBuffer, Mozilla Firefox disables it by default as a Spectre mitigation. Therefore, please enable it before running the demo.
The following is the result of start_kernel. We can see the dmesg output in the browser.
[ 0.000000] Linux version 4.16.0+ (akira@akira-Z270) () #13 Tue Jul 17 23:01:19 JST 2018
[ 0.000000] bootmem address range: 0x675000 - 0x1674000
[ 0.000000] On node 0 totalpages: 4095
[ 0.000000] Normal zone: 36 pages used for memmap
[ 0.000000] Normal zone: 0 pages reserved
[ 0.000000] Normal zone: 4095 pages, LIFO batch:0
[ 0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[ 0.000000] pcpu-alloc: [0] 0
[ 0.000000] Built 1 zonelists, mobility grouping off. Total pages: 4059
[ 0.000000] Kernel command line: mem=16M loglevel=8
[ 0.000000] Parameter is obsolete, ignored
[ 0.000000] Parameter is obsolete, ignored
[ 0.000000] Dentry cache hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.000000] Inode-cache hash table entries: 1024 (order: 0, 4096 bytes)
[ 0.000000] Memory available: 16144k/16380k RAM
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] NR_IRQS: 1024
[ 0.000000] lkl: irqs initialized
[ 0.000000] clocksource: lkl: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000100] lkl: time and timers initialized (irq1)
[ 0.001100] pid_max: default: 4096 minimum: 301
[ 0.009400] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
[ 0.009900] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
[ 0.327100] console [lkl_console0] enabled
[ 0.329600] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.329700] xor: automatically using best checksumming function 8regs
[ 0.341199] NET: Registered protocol family 16
[ 0.388999] clocksource: Switched to clocksource lkl
[ 0.414100] NET: Registered protocol family 2
[ 0.437700] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes)
[ 0.438199] TCP established hash table entries: 1024 (order: 0, 4096 bytes)
[ 0.439000] TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
[ 0.439600] TCP: Hash tables configured (established 1024 bind 1024)
[ 0.443200] UDP hash table entries: 256 (order: 0, 4096 bytes)
[ 0.444000] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
[ 0.472100] workingset: timestamp_bits=30 max_order=12 bucket_order=0
[ 0.863100] SGI XFS with ACLs, security attributes, no debug enabled
[ 0.923700] jitterentropy: Initialization failed with host not compliant with requirements: 2
[ 0.924599] io scheduler noop registered
[ 0.924900] io scheduler deadline registered
[ 0.933099] io scheduler cfq registered (default)
[ 0.933500] io scheduler kyber registered
[ 1.633500] NET: Registered protocol family 10
[ 1.658400] Segment Routing with IPv6
[ 1.660800] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[ 1.674200] ------------[ cut here ]------------
[ 1.675500] WARNING: CPU: 0 PID: 0 at arch/lkl/kernel/setup.c:188 (null)
[ 1.675899] Call Trace:
[ 1.676200]
[ 1.676999] ---[ end trace 941dc55fe0966cff ]---
[ 1.684299] Warning: unable to open an initial console.
[ 1.685200] This architecture does not have kernel memory protection.
pthread_join((pthread_t)tid, NULL): No such process
lkl_start_kernel(&lkl_host_ops, "mem=16M loglevel=8") = 0
Limitations
From the above results, we confirmed that the Linux kernel booted directly in JavaScript. However, it only prints dmesg and is not suitable for practical use at all. This is because of the following problems:
- It fails to create kernel threads.
- It fails to mount rootfs.
- It fails to execute init (PID 1).
Also, pthreads support in Emscripten is not good. We extracted the semaphore, mutex, and thread implementations from Little Kernel (LK) and added them to LKL as green threads.
We plan to build LKL.js on these green threads.
Summary
We created a Linux kernel fully translated into JavaScript using LKL and Emscripten. It boots, and we confirmed that it shows dmesg. Although a real computer architecture and JavaScript are very different, we found that it works to some extent after adding some fixes and workarounds.
Reference
- https://github.com/lkl/linux
- https://github.com/kripken/emscripten
- https://llvm.org/
- https://clang.llvm.org/
- https://wiki.linuxfoundation.org/llvmlinux
- https://lwn.net/Articles/734071/
- http://llvm.org/docs/CommandGuide/llvm-link.html
- https://0xax.gitbooks.io/linux-insides/Concepts/linux-cpu-3.html
- https://github.com/littlekernel/lk