Signed-off-by: NotAShelf <raf@notashelf.dev> Change-Id: I885e6317d186ccdc847195957dba4ab26a6a6964
Nixir Technical Specification
This is a distillation of my personal notes on my "research" within the Nix codebase and the subsequent design notes on Nixir. While some of those naturally belong in the README, I have elected to compile the noteworthy details into a "specification document" for anyone who might, for some reason, be interested in integrating with Nixir.
Beware, here be observations.
What This Project Is
Nixir is, most simply (and elegantly) put, a Nix compiler and runtime packaged as a plugin. The compiler component translates a subset of Nix source into a custom binary intermediate representation (IR), and the runtime executes that IR inside a virtual machine running within the plugin process. Hence the name: Nix-ir.
As you might've caught on from the README already, the project consists of two
artifacts: a standalone compiler tool called nix-irc that transforms .nix
files into .nixir bundles, and a plugin library (nix-ir-plugin.so) that Nix
loads to provide three primops for interacting with compiled IR.
The architecture handles the full compilation pipeline. Static imports are resolved at compile time and inlined into the output bundle, while the VM evaluates the compiled IR at runtime. This mirrors how Nixpkgs itself distinguishes between stable library code and application-specific expressions.
The plugin does not intercept evaluation automatically. Instead, it exposes primops that users invoke explicitly. This design exists because Nix's plugin API does not provide hooks into the core evaluation loop. Unfortunate, but 'tis life.
Why Compile Nix
Every invocation of nix eval or nix build must parse and evaluate
expressions from scratch. For large codebases, this overhead is
measurable.
Nix does provide a persistent evaluation cache, stored in SQLite. However, this
cache only applies to flake-based workflows. Direct imports like
import ./foo.nix do not benefit from the cache and re-parse on each
invocation.
For example, a NixOS configuration that reaches nixpkgs.lib through direct
imports re-parses those source files on every rebuild. Parsing alone accounts
for substantial wall-clock time before evaluation begins.
Precompiled IR eliminates, or rather attempts to eliminate, this cost. A
.nixir bundle contains serialized AST nodes with all variable names converted
to numeric indices. Loading skips parsing entirely and begins directly with the
VM executing pre-processed code.
The project also serves as an implementation study. I say also, but it is actually the main goal of this project. Reimplementing Nix's evaluation semantics reveals details that the upstream C++ code obscures. The thunk mechanism, environment model, and cycle detection become tangible when you can read and step through the implementation. I don't expect to get a better understanding of the Nix language, but I now have more reasons to badmouth it.
The IR Format
The binary format uses a 36-byte fixed header followed by variable-length sections. All multi-byte integers use little-endian byte order.
The header layout:
0x00-0x03: Magic identifier, value 0x4E495258
0x04-0x07: Version number, currently 2
0x08-0x0B: Flags field, reserved
0x0C-0x0F: Offset to string table
0x10-0x13: Offset to primop table
0x14-0x17: Offset to IR blob
0x18-0x1B: String count
0x1C-0x1F: Primop count
0x20-0x23: Reserved
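The magic value 0x4E495258 spells N I R X when its bytes are read most
significant first (big-endian). Because the header is little-endian, the first
four bytes of a .nixir file are 0x58 0x52 0x49 0x4E, which read as X R I N.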
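A minimal C++ sketch of reading the header laid out above, assuming the whole file is already in memory. The names NixirHeader, readU32LE and parseHeader are hypothetical, not Nixir's own. Assembling each integer byte by byte keeps the result independent of the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical in-memory form of the 36-byte .nixir header.
struct NixirHeader {
    uint32_t magic;              // 0x00: 0x4E495258
    uint32_t version;            // 0x04: currently 2
    uint32_t flags;              // 0x08: reserved
    uint32_t stringTableOffset;  // 0x0C
    uint32_t primopTableOffset;  // 0x10
    uint32_t irBlobOffset;       // 0x14
    uint32_t stringCount;        // 0x18
    uint32_t primopCount;        // 0x1C
    uint32_t reserved;           // 0x20
};

constexpr uint32_t kNixirMagic = 0x4E495258;
constexpr std::size_t kNixirHeaderSize = 36;

// Assemble a little-endian u32 from individual bytes, regardless of host byte order.
inline uint32_t readU32LE(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) |
           (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) |
           (static_cast<uint32_t>(p[3]) << 24);
}

// Returns std::nullopt for a short buffer, a wrong magic value, or an unsupported version.
inline std::optional<NixirHeader> parseHeader(const uint8_t* data, std::size_t size) {
    if (size < kNixirHeaderSize) return std::nullopt;
    NixirHeader h{};
    h.magic = readU32LE(data + 0x00);
    h.version = readU32LE(data + 0x04);
    h.flags = readU32LE(data + 0x08);
    h.stringTableOffset = readU32LE(data + 0x0C);
    h.primopTableOffset = readU32LE(data + 0x10);
    h.irBlobOffset = readU32LE(data + 0x14);
    h.stringCount = readU32LE(data + 0x18);
    h.primopCount = readU32LE(data + 0x1C);
    h.reserved = readU32LE(data + 0x20);
    if (h.magic != kNixirMagic || h.version != 2) return std::nullopt;
    return h;
}
```

A real loader would also check that every offset lies inside the file before seeking to it.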
The string table follows the header. Each entry encodes length as a varint, then that many UTF-8 bytes. All attribute names, identifiers, and string literals in the source are de-duplicated at compile time and stored here. References throughout the IR use indices into this table rather than inline strings.
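The string table can be read with a small varint decoder. This sketch assumes the varint is unsigned LEB128 (seven value bits per byte, high bit set on every byte except the last); that encoding is an assumption, as are the names readVarint and readStringTable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Assumption: unsigned LEB128. Advances `pos`; returns std::nullopt on truncated
// or over-long (more than 10 bytes) input.
inline std::optional<uint64_t> readVarint(const std::vector<uint8_t>& buf, std::size_t& pos) {
    uint64_t result = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (pos >= buf.size()) return std::nullopt;
        const uint8_t byte = buf[pos++];
        result |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    return std::nullopt;
}

// Read `count` length-prefixed strings starting at `pos`. UTF-8 validity is not checked.
inline std::optional<std::vector<std::string>> readStringTable(
    const std::vector<uint8_t>& buf, std::size_t pos, uint32_t count) {
    std::vector<std::string> table;
    for (uint32_t i = 0; i < count; ++i) {
        const auto len = readVarint(buf, pos);
        if (!len || *len > buf.size() - pos) return std::nullopt;  // length runs past the end
        table.emplace_back(reinterpret_cast<const char*>(buf.data() + pos),
                           static_cast<std::size_t>(*len));
        pos += static_cast<std::size_t>(*len);
    }
    return table;
}
```

With this encoding, any length below 128 fits in a single byte, which covers most identifiers and attribute names.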
The primop table defines built-in operations. Each entry contains the string table index for the operation name, its arity, and optional flags. This table enables the VM to dispatch operations by index without string comparison.
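Index-based dispatch can be sketched as below: names are compared as strings once, while binding the table at load time, and every later call is a vector index. This is a hypothetical illustration: Primop, bindPrimops and callPrimop are invented names, arguments and results are simplified to int64_t, and the per-entry flags are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplification: primop arguments and results are plain int64_t values.
using PrimopFn = std::function<int64_t(const std::vector<int64_t>&)>;

// One loaded entry of the primop table (flags omitted in this sketch).
struct Primop {
    uint32_t nameIndex;  // index into the string table
    uint8_t arity;
    PrimopFn fn;
};

// Load time: the only place primop names are compared as strings.
inline std::vector<Primop> bindPrimops(
    const std::vector<std::string>& strings,
    const std::vector<std::pair<uint32_t, uint8_t>>& entries,  // (nameIndex, arity)
    const std::unordered_map<std::string, PrimopFn>& impls) {
    std::vector<Primop> table;
    table.reserve(entries.size());
    for (const auto& [nameIndex, arity] : entries) {
        auto it = impls.find(strings.at(nameIndex));
        if (it == impls.end()) throw std::runtime_error("no implementation for primop");
        table.push_back(Primop{nameIndex, arity, it->second});
    }
    return table;
}

// Run time: a bounds check, an arity check and a vector index; no string comparison.
inline int64_t callPrimop(const std::vector<Primop>& table, uint32_t id,
                          const std::vector<int64_t>& args) {
    if (id >= table.size()) throw std::out_of_range("unknown primop index");
    const Primop& op = table[id];
    if (args.size() != op.arity) throw std::invalid_argument("arity mismatch");
    return op.fn(args);
}
```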
The IR blob contains the actual program. Each node begins with a type byte followed by a type-specific payload.
Node type enumeration from the source:
0x01: CONST_INT - Signed 64-bit integer
0x02: CONST_STRING - String table index
0x03: CONST_PATH - String table index
0x04: CONST_BOOL - 0x00 or 0x01
0x05: CONST_NULL - No payload
0x06: CONST_FLOAT - IEEE 754 double
0x07: CONST_URI - String table index
0x08: CONST_LOOKUP_PATH - String table index for a lookup path such as <nixpkgs>
0x10: VAR - Two varints: depth and index
0x20: LAMBDA - Arity and body offset
0x21: APP - Function and argument offsets
0x22: BINARY_OP - Operation enum and operands
0x23: UNARY_OP - Operation enum and operand
0x24: IMPORT - String table index for file path
0x30: ATTRSET - Count and recursive flag
0x31: SELECT - Expression, attribute, optional default
0x32: WITH - Attribute set and body offsets
0x33: LIST - Count and element offsets
0x34: HAS_ATTR - Expression and attribute
0x40: IF - Condition, then, and else offsets
0x50: LET - Binding count and body offset
0x51: LETREC - Binding count and body offset
0x52: ASSERT - Condition and body offsets
0x60: THUNK - Expression offset
0x61: FORCE - Expression offset
0xFF: ERROR - Error marker
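The node types above transcribe directly into a C++ enum with a one-byte underlying type. This is a sketch: the enumerator spellings and the nodeTypeName helper are hypothetical, and payload decoding is left out because the payload encodings are not fully pinned down here.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// One byte per node type, values taken from the table above.
enum class NodeType : uint8_t {
    ConstInt = 0x01, ConstString = 0x02, ConstPath = 0x03, ConstBool = 0x04,
    ConstNull = 0x05, ConstFloat = 0x06, ConstUri = 0x07, ConstLookupPath = 0x08,
    Var = 0x10,
    Lambda = 0x20, App = 0x21, BinaryOp = 0x22, UnaryOp = 0x23, Import = 0x24,
    Attrset = 0x30, Select = 0x31, With = 0x32, List = 0x33, HasAttr = 0x34,
    If = 0x40,
    Let = 0x50, LetRec = 0x51, Assert = 0x52,
    Thunk = 0x60, Force = 0x61,
    Error = 0xFF,
};

// Map a raw type byte to its name; bytes outside the table yield "UNKNOWN".
// Casting any uint8_t to NodeType is well-defined because the underlying type is fixed.
inline std::string_view nodeTypeName(uint8_t byte) {
    switch (static_cast<NodeType>(byte)) {
        case NodeType::ConstInt: return "CONST_INT";
        case NodeType::ConstString: return "CONST_STRING";
        case NodeType::ConstPath: return "CONST_PATH";
        case NodeType::ConstBool: return "CONST_BOOL";
        case NodeType::ConstNull: return "CONST_NULL";
        case NodeType::ConstFloat: return "CONST_FLOAT";
        case NodeType::ConstUri: return "CONST_URI";
        case NodeType::ConstLookupPath: return "CONST_LOOKUP_PATH";
        case NodeType::Var: return "VAR";
        case NodeType::Lambda: return "LAMBDA";
        case NodeType::App: return "APP";
        case NodeType::BinaryOp: return "BINARY_OP";
        case NodeType::UnaryOp: return "UNARY_OP";
        case NodeType::Import: return "IMPORT";
        case NodeType::Attrset: return "ATTRSET";
        case NodeType::Select: return "SELECT";
        case NodeType::With: return "WITH";
        case NodeType::List: return "LIST";
        case NodeType::HasAttr: return "HAS_ATTR";
        case NodeType::If: return "IF";
        case NodeType::Let: return "LET";
        case NodeType::LetRec: return "LETREC";
        case NodeType::Assert: return "ASSERT";
        case NodeType::Thunk: return "THUNK";
        case NodeType::Force: return "FORCE";
        case NodeType::Error: return "ERROR";
    }
    return "UNKNOWN";
}
```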
Binary operations supported:
ADD, SUB, MUL, DIV - Arithmetic on integers (ADD also handles strings and paths)
CONCAT - List concatenation (++)
EQ, NE - Equality comparison
LT, GT, LE, GE - Ordering comparison
AND, OR, IMPL - Boolean logic
MERGE - Attribute set override (//)
Variable Representation
The compiler converts variable names to De Bruijn indices during IR generation. Rather than storing strings like "x" in the output, each variable reference encodes two numbers: the lexical depth and the position within that scope.
The depth indicates how many scope boundaries, such as lambdas, lie between the reference and the scope that binds the variable. A reference to a variable bound in the current (innermost) scope has depth zero. A reference from inside a lambda to a variable bound in the scope just outside it has depth one.
The index indicates the position in that scope's environment array. The first bound variable in a scope has index zero, the second has index one, and so forth.
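To make depth and index concrete, here is a minimal C++ sketch of how a compiler could assign them with a stack of scopes. ScopeStack and VarRef are hypothetical names, and the sketch ignores with, whose attributes are only known at runtime. For the Nix expression x: y: x, the inner reference to x resolves to depth 1, index 0, and a reference to y would resolve to depth 0, index 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// A resolved variable reference: how many scopes to skip, then the slot in that scope.
struct VarRef {
    uint16_t depth;
    uint16_t index;
};

// Compile-time scope stack; scopes_.back() is the innermost scope, and each scope
// lists its bound names in binding order.
class ScopeStack {
public:
    void push(std::vector<std::string> names) { scopes_.push_back(std::move(names)); }
    void pop() { scopes_.pop_back(); }

    // Search from the innermost scope outward; the number of scopes skipped is the depth.
    std::optional<VarRef> resolve(const std::string& name) const {
        for (std::size_t d = 0; d < scopes_.size(); ++d) {
            const auto& scope = scopes_[scopes_.size() - 1 - d];
            for (std::size_t i = 0; i < scope.size(); ++i) {
                if (scope[i] == name)
                    return VarRef{static_cast<uint16_t>(d), static_cast<uint16_t>(i)};
            }
        }
        return std::nullopt;  // unbound at compile time
    }

private:
    std::vector<std::vector<std::string>> scopes_;
};
```

Because the search starts at the innermost scope, a shadowed name resolves to its nearest binding, as it does in Nix.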
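During evaluation, the VM combines these two numbers into a single 32-bit value where the high 16 bits encode depth and the low 16 bits encode index, which limits both to 65535. Lookup traverses the environment chain depth times, then indexes into the resulting scope's binding array. Resolution therefore costs O(depth) pointer hops plus one constant-time array access, with no string comparison; since depth is usually small, lookups are effectively constant time.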
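The 32-bit packing can be sketched as follows. packVar, varDepth and varIndex are hypothetical helper names; the range check makes the 16-bit limit explicit instead of letting oversized values silently wrap into the other half.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Pack (depth, index) into one 32-bit word: depth in the high 16 bits, index in the low 16.
inline uint32_t packVar(uint32_t depth, uint32_t index) {
    if (depth > 0xFFFF || index > 0xFFFF)
        throw std::out_of_range("depth or index does not fit in 16 bits");
    return (depth << 16) | index;
}

inline uint16_t varDepth(uint32_t packed) { return static_cast<uint16_t>(packed >> 16); }
inline uint16_t varIndex(uint32_t packed) { return static_cast<uint16_t>(packed & 0xFFFFu); }
```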
The Virtual Machine
The VM implements lazy evaluation using an explicit thunk mechanism. Every unevaluated expression and function argument is wrapped in a Thunk structure containing the expression's AST node and a pointer to the captured environment.
When the VM needs a value, it calls force() on the thunk. The force operation
checks whether the thunk is already being evaluated. If evaluation attempts to
force a thunk that is currently evaluating, the VM detects the cycle and raises
"infinite recursion encountered". This matches Nix's behavior for recursive
definitions.
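A minimal C++ sketch of force() with cycle detection, assuming a simplified model where the expression and its captured environment are folded into a closure that yields an int64_t. The class layout is hypothetical. The thunk is marked as evaluating (often called black-holing) before its body runs, so a re-entrant force sees that state and reports the cycle. If evaluation throws, the thunk is reset to pending, mirroring how upstream Nix restores a thunk on error, so a later force re-raises the real error rather than a spurious cycle.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <stdexcept>
#include <utility>

// Simplified thunk: the expression and its captured environment are modelled as a closure.
class Thunk {
public:
    explicit Thunk(std::function<int64_t()> compute) : compute_(std::move(compute)) {}

    int64_t force() {
        if (state_ == State::Done) return *value_;
        // Re-entering a thunk that is still being evaluated means the value depends on itself.
        if (state_ == State::Evaluating) throw std::runtime_error("infinite recursion encountered");
        state_ = State::Evaluating;  // black-hole the thunk while its body runs
        try {
            value_ = compute_();
        } catch (...) {
            state_ = State::Pending;  // restore, so a later force re-raises the real error
            throw;
        }
        state_ = State::Done;
        compute_ = nullptr;  // release the captured environment once the value is cached
        return *value_;
    }

private:
    enum class State { Pending, Evaluating, Done };
    State state_ = State::Pending;
    std::function<int64_t()> compute_;
    std::optional<int64_t> value_;
};
```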
The environment structure is an array-based chain. Each scope holds a pointer to its parent scope and a vector of bound values. Looking up a variable traverses parent pointers until reaching the scope at the correct depth, then indexes into that scope's value array. This replaces string comparison with pointer traversal and array indexing.
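The chain walk can be sketched in C++ as follows. Env and lookup are hypothetical names, and each slot holds a plain int64_t where the VM would hold a thunk.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// One scope: a pointer to the enclosing scope plus the values bound here.
struct Env {
    std::shared_ptr<const Env> parent;
    std::vector<int64_t> values;
};

// Follow `depth` parent links, then index into that scope's values.
inline int64_t lookup(const Env& env, uint16_t depth, uint16_t index) {
    const Env* scope = &env;
    for (uint16_t d = 0; d < depth; ++d) {
        if (!scope->parent) throw std::logic_error("depth exceeds environment chain");
        scope = scope->parent.get();
    }
    return scope->values.at(index);  // at() turns a bad index into an exception
}
```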
Function application follows currying. When applying a function to an argument, the VM checks whether the function's arity is satisfied. If so, it extends the environment with the new binding and evaluates the body. If not, it returns a partial application awaiting additional arguments.
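A minimal sketch of curried application with partial results, assuming a function value that records its arity and the arguments collected so far. Closure, Result and apply are hypothetical, and the body is modelled as a C++ callable over int64_t arguments rather than an IR node evaluated in an extended environment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <variant>
#include <vector>

// Hypothetical function value: the body runs once `arity` arguments are collected.
struct Closure {
    std::size_t arity;
    std::function<int64_t(const std::vector<int64_t>&)> body;
    std::vector<int64_t> collected;  // arguments supplied so far
};

// Either a finished result or a partial application awaiting more arguments.
using Result = std::variant<int64_t, Closure>;

inline Result apply(const Closure& f, int64_t arg) {
    Closure next = f;  // copy, so the original partial application stays reusable
    next.collected.push_back(arg);
    if (next.collected.size() < next.arity) return next;
    return next.body(next.collected);
}
```

Copying on each application means the same partial application can be applied to different arguments without them leaking into each other.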
The evaluator handles binary operations with type-specific dispatch. Addition supports integers, strings, and paths with appropriate type coercion rules. Comparison operators work on integers and strings. The merge operator combines two attribute sets with right-side precedence.
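A heavily simplified C++ sketch of type-specific dispatch for ADD, LT and MERGE, using std::variant for values. The Value type, the BinOp subset and evalBinary are hypothetical. Paths, floats and nested attribute values are left out, so this does not reproduce Nix's full coercion rules.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>

// Simplified value model: attribute values are integers only; paths and floats are omitted.
using Attrs = std::map<std::string, int64_t>;
using Value = std::variant<int64_t, bool, std::string, Attrs>;

enum class BinOp { Add, Lt, Merge };

inline Value evalBinary(BinOp op, const Value& l, const Value& r) {
    switch (op) {
        case BinOp::Add: {
            if (auto a = std::get_if<int64_t>(&l), b = std::get_if<int64_t>(&r); a && b)
                return *a + *b;
            if (auto a = std::get_if<std::string>(&l), b = std::get_if<std::string>(&r); a && b)
                return *a + *b;  // string concatenation
            throw std::runtime_error("cannot add values of these types");
        }
        case BinOp::Lt: {
            if (auto a = std::get_if<int64_t>(&l), b = std::get_if<int64_t>(&r); a && b)
                return *a < *b;
            if (auto a = std::get_if<std::string>(&l), b = std::get_if<std::string>(&r); a && b)
                return *a < *b;  // lexicographic comparison
            throw std::runtime_error("cannot compare values of these types");
        }
        case BinOp::Merge: {
            auto a = std::get_if<Attrs>(&l);
            auto b = std::get_if<Attrs>(&r);
            if (!a || !b) throw std::runtime_error("// requires two attribute sets");
            Attrs out = *a;
            for (const auto& [key, value] : *b) out[key] = value;  // right-hand side wins
            return out;
        }
    }
    throw std::logic_error("unknown operator");
}
```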
Plugin Primops
The plugin registers three primops through Nix's RegisterPrimOp interface:
__nixIR_loadIR accepts a file path string, deserializes the .nixir bundle,
evaluates the entry expression, and returns the resulting value. The VM measures
deserialization time and evaluation time separately, printing timing data to
stderr.
__nixIR_compile accepts a string containing Nix source code, parses it
in-memory, generates IR, and evaluates the result. This enables runtime
compilation without external tooling.
__nixIR_info returns an attribute set containing the plugin name
"nix-ir-plugin", version "0.1.0", and status "runtime-active". This is a
development-only primop that will be removed eventually.
The primops are registered with a double-underscore prefix. Nix strips that
prefix when it adds a primop to builtins, so users access them as
builtins.nixIR_loadIR, builtins.nixIR_compile, and builtins.nixIR_info in
their expressions.
Import Handling
The compiler performs static import resolution when the import path meets specific conditions. The path must be a literal in the source, not an interpolation or a variable. The path must not use home directory expansion (~). The resolved path must remain within the project root for security. The target file must exist and be readable at compile time.
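The path-level conditions can be checked with std::filesystem, as in this sketch. canInlineImport is a hypothetical name. It assumes the caller has already verified at the AST level that the argument was a literal, resolves relative paths against the importing file's directory as Nix does for path literals, and does not handle importing a directory (which Nix resolves to default.nix).

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Checks the path-level conditions for inlining an import at compile time.
inline bool canInlineImport(const std::string& literal, const fs::path& importingFile,
                            const fs::path& projectRoot) {
    // No home-directory expansion: reject "~" paths outright.
    if (literal.empty() || literal.front() == '~') return false;

    // Relative paths resolve against the directory of the importing file.
    fs::path target = fs::path(literal).is_absolute()
                          ? fs::path(literal)
                          : importingFile.parent_path() / literal;

    // Normalise "." and ".." (and symlinks in the existing prefix) for both paths.
    std::error_code ec;
    fs::path resolved = fs::weakly_canonical(target, ec);
    if (ec) return false;
    fs::path root = fs::weakly_canonical(projectRoot, ec);
    if (ec) return false;

    // Containment: the path relative to the root must not climb out with "..".
    fs::path rel = resolved.lexically_relative(root);
    if (rel.empty() || *rel.begin() == "..") return false;

    // The target must be an existing, readable regular file right now.
    if (!fs::is_regular_file(resolved, ec)) return false;
    std::ifstream probe(resolved);
    return probe.good();
}
```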
When these conditions hold, the compiler reads the imported file, recursively
processes its imports, and embeds the resulting IR into the output bundle. The
final .nixir file is self-contained and requires no additional file lookups at
load time.
When conditions do not hold, the compiler records the import as dynamic and emits an IMPORT node containing the string table index. At runtime, the VM evaluates the import expression to obtain the actual file path, then uses Nix's standard evaluator to load that file.
What Works And What Does Not
The implementation covers a substantial subset of Nix's expression language. Literals work across all types including integers, floats, strings, paths, URIs, booleans, and null. Lambda expressions, function application, and currying are implemented. Attribute sets with both static and dynamic keys are supported. The let and letrec forms work with proper recursive binding semantics. The if, assert, and with expressions and list literals are all functional.
The implementation does not cover derivations, builtins other than those required for basic operation, or the full module system. These require integration with Nix's store and download mechanisms that the VM does not replicate.
Building And Using
Create a build directory and configure with CMake:
cmake -B build -G Ninja
cmake --build build
This produces nix-irc in the build directory and nix-ir-plugin.so in the
project root.
Compile a Nix file to IR:
./build/nix-irc input.nix output.nixir
Load and evaluate the compiled bundle through Nix:
nix --plugin-files ./nix-ir-plugin.so eval --expr 'builtins.nixIR_loadIR "output.nixir"'
Compile and evaluate source at runtime:
nix --plugin-files ./nix-ir-plugin.so eval --expr 'builtins.nixIR_compile "1 + 2"'