# Nixir Technical Specification

This is a distillation of my personal notes on my "research" within the Nix codebase and the subsequent design notes on Nixir. While some of those naturally belong in the README, I have elected to compile a list of noteworthy details into a "specification document" for those possibly interested, for some reason, in integrating with Nixir. Beware, here be observations.

## What This Project Is

Nixir is, most simply (and elegantly) put, a Nix compiler _and runtime_ packaged as a plugin. The compiler component compiles a subset of Nix source to a custom binary intermediate representation (IR), and the runtime then executes that IR inside a virtual machine running within the plugin process. Hence the name: Nix-ir.

As you might've caught on from the README already, the project consists of two artifacts: a standalone compiler tool called `nix-irc` that transforms `.nix` files into `.nixir` bundles, and a plugin library (`nix-ir-plugin.so`) that Nix loads to provide three primops for interacting with compiled IR.

The architecture handles the full compilation pipeline. Static imports are resolved at compile time and inlined into the output bundle, while the VM handles all evaluation at runtime. This mirrors how Nixpkgs itself distinguishes between stable library code and application-specific expressions.

The plugin does not intercept evaluation automatically. Instead, it exposes primops that users invoke explicitly. This design exists because Nix's plugin API does not provide hooks into the core evaluation loop. Unfortunate, but 'tis life.

## Why Compile Nix

Every invocation of `nix eval` or `nix build` must parse and evaluate expressions from scratch. For large codebases, this overhead is measurable.

Nix does provide a persistent evaluation cache, stored in SQLite. However, this cache only applies to flake-based workflows. Direct imports like `import ./foo.nix` do not benefit from the cache and are re-parsed on each invocation.
For example, a NixOS configuration using direct imports to `nixpkgs.lib` re-parses source files on every rebuild. That front-end work accounts for substantial wall-clock time before evaluation begins.

Precompiled IR eliminates, or rather, attempts to eliminate this cost. A `.nixir` bundle contains serialized AST nodes with all variable names converted to numeric indices. Loading skips parsing entirely and begins directly with the VM executing pre-processed code.

The project _also_ serves as an implementation study. I say also, but it is actually the main goal of this project. Reimplementing Nix's evaluation semantics reveals details that the upstream C++ code obscures. The thunk mechanism, environment model, and cycle detection become tangible when you can read and step through the implementation. I don't expect to get a better understanding of the Nix language, but I now have more reasons to badmouth it.

## The IR Format

The binary format uses a 36-byte fixed header followed by variable-length sections. All multi-byte integers use little-endian byte order.

The header layout:

```plaintext
0x00-0x03: Magic identifier, value 0x4E495258
0x04-0x07: Version number, currently 2
0x08-0x0B: Flags field, reserved
0x0C-0x0F: Offset to string table
0x10-0x13: Offset to primop table
0x14-0x17: Offset to IR blob
0x18-0x1B: String count
0x1C-0x1F: Primop count
0x20-0x23: Reserved
```

The magic value `0x4E495258` corresponds to the bytes N I R X when read in big-endian order.

The string table follows the header. Each entry encodes its length as a varint, followed by that many UTF-8 bytes. All attribute names, identifiers, and string literals in the source are de-duplicated at compile time and stored here. References throughout the IR use indices into this table rather than inline strings.

The primop table defines built-in operations. Each entry contains the string table index for the operation name, its arity, and optional flags.
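
To make the header and string table layout above concrete, here is a minimal reader sketch in Python. It is not Nixir's own loader. It assumes the magic is stored as a little-endian 32-bit integer like every other field, and it assumes the varints are unsigned LEB128, which this document does not actually state. The names `read_varint` and `parse_bundle` are made up for illustration.

```python
import struct

MAGIC = 0x4E495258              # value from the header layout
HEADER = struct.Struct("<9I")   # nine little-endian u32 fields, 36 bytes


def read_varint(buf, pos):
    """Decode one unsigned LEB128 varint (assumed encoding).

    Returns (value, position after the varint).
    """
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:     # high bit clear marks the last byte
            return value, pos
        shift += 7


def parse_bundle(buf):
    """Return (header fields, string table) for a .nixir byte string."""
    if len(buf) < HEADER.size:
        raise ValueError("truncated header")
    (magic, version, flags, str_off, primop_off, ir_off,
     str_count, primop_count, _reserved) = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic: {magic:#010x}")

    strings, pos = [], str_off
    for _ in range(str_count):
        length, pos = read_varint(buf, pos)
        strings.append(buf[pos:pos + length].decode("utf-8"))
        pos += length

    header = {"version": version, "flags": flags,
              "primop_offset": primop_off, "ir_offset": ir_off,
              "primop_count": primop_count}
    return header, strings


# Hand-built bundle with two strings and empty primop and IR sections.
body = b"\x01x" + b"\x03foo"
end = HEADER.size + len(body)
demo = HEADER.pack(MAGIC, 2, 0, HEADER.size, end, end, 2, 0, 0) + body
demo_header, demo_strings = parse_bundle(demo)
```

The sketch stops at the string table. The primop table is left undecoded because the field widths of its entries are not given in this document.
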
The primop table lets the VM dispatch operations by index, without string comparison.

The IR blob contains the actual program. Each node begins with a type byte followed by a type-specific payload.

Node type enumeration from the source:

```plaintext
0x01: CONST_INT - Signed 64-bit integer
0x02: CONST_STRING - String table index
0x03: CONST_PATH - String table index
0x04: CONST_BOOL - 0x00 or 0x01
0x05: CONST_NULL - No payload
0x06: CONST_FLOAT - IEEE 754 double
0x07: CONST_URI - String table index
0x08: CONST_LOOKUP_PATH - String table index for the lookup path name
0x10: VAR - Two varints: depth and index
0x20: LAMBDA - Arity and body offset
0x21: APP - Function and argument offsets
0x22: BINARY_OP - Operation enum and operands
0x23: UNARY_OP - Operation enum and operand
0x24: IMPORT - String table index for file path
0x30: ATTRSET - Count and recursive flag
0x31: SELECT - Expression, attribute, optional default
0x32: WITH - Attribute set and body offsets
0x33: LIST - Count and element offsets
0x34: HAS_ATTR - Expression and attribute
0x40: IF - Condition, then, and else offsets
0x50: LET - Binding count and body offset
0x51: LETREC - Binding count and body offset
0x52: ASSERT - Condition and body offsets
0x60: THUNK - Expression offset
0x61: FORCE - Expression offset
0xFF: ERROR - Error marker
```

Binary operations supported:

```plaintext
ADD, SUB, MUL, DIV - Arithmetic on integers
CONCAT - List concatenation (++)
EQ, NE - Equality comparison
LT, GT, LE, GE - Ordering comparison
AND, OR, IMPL - Boolean logic
MERGE - Attribute set override (//)
```

## Variable Representation

The compiler converts variable names to De Bruijn-style indices during IR generation. Rather than storing strings like "x" in the output, each variable reference encodes two numbers: the lexical depth and the position within that scope.

The depth indicates how many scope boundaries separate the reference from the scope that binds the variable. A variable referenced from the same scope that binds it, such as one used directly in the outermost scope, has depth zero.
A variable referenced from inside one lambda that refers to the outer scope has depth one.

The index indicates the position in that scope's environment array. The first bound variable in a scope has index zero, the second has index one, and so forth.

During evaluation, the VM combines these two numbers into a single 32-bit value where the high 16 bits encode depth and the low 16 bits encode index. Lookup traverses the environment chain depth times, then indexes into the resulting scope's binding array. Resolution therefore costs `depth` parent hops plus one array access, so it is O(depth) rather than strictly O(1), but it never compares names.

## The Virtual Machine

The VM implements lazy evaluation using an explicit thunk mechanism. Every unevaluated expression and function argument is wrapped in a `Thunk` structure containing the expression AST node and a pointer to the captured environment. When the VM needs a value, it calls `force()` on the thunk.

The force operation checks whether the thunk is already being evaluated. If evaluation attempts to force a thunk that is currently evaluating, the VM detects the cycle and raises "infinite recursion encountered". This matches Nix's behavior for recursive definitions.

The environment structure is an array-based chain. Each scope holds a pointer to its parent scope and a vector of bound values. Looking up a variable traverses parent pointers until reaching the scope at the correct depth, then indexes into that scope's value array. This replaces string comparison with pointer traversal and array indexing.

Function application follows currying. When applying a function to an argument, the VM checks whether the function's arity is satisfied. If so, it extends the environment with the new binding and evaluates the body. If not, it returns a partial application awaiting additional arguments.

The evaluator handles binary operations with type-specific dispatch. Addition supports integers, strings, and paths with appropriate type coercion rules. Comparison operators work on integers and strings.
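
The packed variable reference, the environment chain and the thunk cycle check described in this section can be sketched in a few lines of Python. This is an illustration of the technique, not Nixir's code: `Env`, `pack_var`, `lookup`, `Thunk` and `InfiniteRecursion` are hypothetical names, a Python callable stands in for the (AST node, captured environment) pair, and caching the forced value is my assumption since the text above only describes the cycle check.

```python
class Env:
    """One scope: a parent pointer plus an array of bound values."""

    def __init__(self, values, parent=None):
        self.values = values
        self.parent = parent


def pack_var(depth, index):
    """Pack depth into the high 16 bits and index into the low 16 bits."""
    assert 0 <= depth < (1 << 16) and 0 <= index < (1 << 16)
    return (depth << 16) | index


def lookup(env, packed):
    """Walk `depth` parent pointers, then index into that scope."""
    depth, index = packed >> 16, packed & 0xFFFF
    for _ in range(depth):
        env = env.parent
    return env.values[index]


class InfiniteRecursion(Exception):
    pass


class Thunk:
    PENDING, EVALUATING, DONE = range(3)

    def __init__(self, compute):
        self.compute = compute      # stand-in for AST node + captured env
        self.state = Thunk.PENDING
        self.value = None

    def force(self):
        if self.state == Thunk.DONE:
            return self.value       # assumed: results are cached
        if self.state == Thunk.EVALUATING:
            # Forcing a thunk that is mid-evaluation means a cycle.
            raise InfiniteRecursion("infinite recursion encountered")
        self.state = Thunk.EVALUATING
        try:
            self.value = self.compute()
        except Exception:
            self.state = Thunk.PENDING  # leave the thunk retryable
            raise
        self.state = Thunk.DONE
        self.compute = None         # drop the closure once evaluated
        return self.value


# Two nested scopes: depth 1 reaches the outer scope, depth 0 the inner.
outer = Env([Thunk(lambda: 40)])
inner = Env([Thunk(lambda: 2)], parent=outer)
total = (lookup(inner, pack_var(1, 0)).force()
         + lookup(inner, pack_var(0, 0)).force())
```

The sketch stops at lookup and forcing. Operator dispatch, including the merge operator, is left out.
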
The merge operator combines two attribute sets with right-side precedence.

## Plugin Primops

The plugin registers three primops through Nix's `RegisterPrimOp` interface:

`__nixIR_loadIR` accepts a file path string, deserializes the `.nixir` bundle, evaluates the entry expression, and returns the resulting value. The VM measures deserialization time and evaluation time separately, printing timing data to stderr.

`__nixIR_compile` accepts a string containing Nix source code, parses it in-memory, generates IR, and evaluates the result. This enables runtime compilation without external tooling.

`__nixIR_info` returns an attribute set containing the plugin name "nix-ir-plugin", version "0.1.0", and status "runtime-active". This is a development-only primop that will be removed eventually.

The primops use the double-underscore prefix internally. Users access them through `builtins.nixIR_loadIR`, `builtins.nixIR_compile`, and `builtins.nixIR_info` in their expressions.

## Import Handling

The compiler performs static import resolution when the import path meets specific conditions. The path must be a string literal in the source, not an interpolation or variable. The path must not use home directory expansion. The resolved path must remain within the project root for security. The target file must exist and be readable at compile time.

When these conditions hold, the compiler reads the imported file, recursively processes its imports, and embeds the resulting IR into the output bundle. The final `.nixir` file is self-contained and requires no additional file lookups at load time.

When the conditions do not hold, the compiler records the import as dynamic and emits an IMPORT node containing the string table index. At runtime, the VM evaluates the import expression to obtain the actual file path, then uses Nix's standard evaluator to load that file.

## What Works And What Does Not

The implementation covers a substantial subset of Nix's expression language.
Literals work across all types including integers, floats, strings, paths, URIs, booleans, and null. Lambda expressions, function application, and currying are implemented. Attribute sets with both static and dynamic keys are supported. The let and letrec forms work with proper recursive binding semantics. The if expression, assert expression, with expression, and list literals are all functional.

The implementation does not cover derivations, builtins other than those required for basic operation, or the full module system. These require integration with Nix's store and download mechanisms that the VM does not replicate.

## Building And Using

Create a build directory and configure with CMake:

```
cmake -B build -G Ninja
cmake --build build
```

This produces `nix-irc` in the build directory and `nix-ir-plugin.so` in the project root.

Compile a Nix file to IR:

```
./build/nix-irc input.nix output.nixir
```

Load and evaluate the compiled bundle through Nix:

```
nix --plugin-files ./nix-ir-plugin.so eval --expr 'builtins.nixIR_loadIR "output.nixir"'
```

Compile and evaluate source at runtime:

```
nix --plugin-files ./nix-ir-plugin.so eval --expr 'builtins.nixIR_compile "1 + 2"'
```