docs: initial specification; we yap
Signed-off-by: NotAShelf <raf@notashelf.dev> Change-Id: I885e6317d186ccdc847195957dba4ab26a6a6964
parent 14bbc09280, commit 56f15d749e
1 changed files with 265 additions and 0 deletions
docs/SPEC.md
# Nixir Technical Specification

This is a distillation of my personal notes on my "research" within the Nix
codebase and the subsequent design notes on Nixir. While some of those
naturally belong in the README, I have elected to compile a list of noteworthy
details into a "specification document" for those possibly interested, for some
reason, in integrating with Nixir.

Beware, here be observations.

## What This Project Is

Nixir is, most simply (and elegantly) put, a Nix compiler _and runtime_ packaged
as a plugin. The compiler component compiles a subset of Nix source to a custom
binary intermediate representation (IR), and the runtime then executes that IR
inside a virtual machine running within the Nix process that loaded the plugin.
Hence the name: Nix-ir.

As you might have gathered from the README, the project consists of two
artifacts: a standalone compiler tool called `nix-irc` that transforms `.nix`
files into `.nixir` bundles, and a plugin library (`nix-ir-plugin.so`) that Nix
loads to provide three primops for interacting with compiled IR.

The architecture covers the full compilation pipeline. Static imports are
resolved at compile time and inlined into the output bundle, while the VM inside
the plugin handles all evaluation at runtime. This mirrors how Nixpkgs itself
distinguishes between stable library code and application-specific expressions.

The plugin does not intercept evaluation automatically. Instead, it exposes
primops that users invoke explicitly. This design exists because Nix's plugin
API does not provide hooks into the core evaluation loop. Unfortunate, but 'tis
life.

## Why Compile Nix

Every invocation of `nix eval` or `nix build` must parse and evaluate
expressions from scratch. For large codebases, this overhead is measurable.

Nix does provide a persistent evaluation cache, stored in SQLite. However, this
cache only applies to flake-based workflows. Direct imports like
`import ./foo.nix` do not benefit from the cache and re-parse on each
invocation.

For example, a NixOS configuration using direct imports of `nixpkgs.lib`
re-parses those source files on every rebuild. Nix's front-end (lexing and
parsing) can account for substantial wall-clock time before evaluation even
begins.

Precompiled IR eliminates, or rather attempts to eliminate, this cost. A
`.nixir` bundle contains serialized AST nodes with all variable names converted
to numeric indices. Loading skips parsing entirely: the VM starts directly on
pre-processed code.

The project _also_ serves as an implementation study. I say also, but it is
actually the main goal of this project. Reimplementing Nix's evaluation
semantics reveals details that the upstream C++ code obscures. The thunk
mechanism, environment model, and cycle detection become tangible when you can
read and step through the implementation. I don't expect to get a better
understanding of the Nix language, but I now have more reasons to badmouth it.

## The IR Format

The binary format uses a 36-byte fixed header followed by variable-length
sections. All multi-byte integers use little-endian byte order.

The header layout:

```plaintext
0x00-0x03: Magic identifier, value 0x4E495258
0x04-0x07: Version number, currently 2
0x08-0x0B: Flags field, reserved
0x0C-0x0F: Offset to string table
0x10-0x13: Offset to primop table
0x14-0x17: Offset to IR blob
0x18-0x1B: String count
0x1C-0x1F: Primop count
0x20-0x23: Reserved
```

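The magic value `0x4E495258` spells `N I R X` when its bytes are read
most-significant first (big-endian). Since the magic is stored little-endian
like every other integer, the bytes actually found at offsets 0x00-0x03 are
`0x58 0x52 0x49 0x4E` (`X R I N`).
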
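To make the layout concrete, here is a minimal sketch of a header reader in
C++, assuming every field is a little-endian `uint32_t` as the table implies.
`IrHeader`, `readU32LE`, and `parseHeader` are hypothetical names, not taken
from the Nixir sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical in-memory view of the 36-byte header described above.
struct IrHeader {
  uint32_t magic;
  uint32_t version;
  uint32_t flags;  // reserved
  uint32_t stringTableOffset;
  uint32_t primopTableOffset;
  uint32_t irBlobOffset;
  uint32_t stringCount;
  uint32_t primopCount;  // 0x20-0x23 is reserved and not stored
};

constexpr uint32_t kMagic = 0x4E495258;  // "NIRX", most-significant byte first
constexpr uint32_t kSupportedVersion = 2;
constexpr std::size_t kHeaderSize = 36;

// Assemble a uint32_t from four little-endian bytes, whatever the host order.
uint32_t readU32LE(const uint8_t* p) {
  assert(p != nullptr);
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Returns std::nullopt if the buffer is too short, the magic is wrong,
// or the version is not the one this sketch understands.
std::optional<IrHeader> parseHeader(const std::vector<uint8_t>& buf) {
  if (buf.size() < kHeaderSize) return std::nullopt;
  const uint8_t* p = buf.data();
  IrHeader h{};
  h.magic = readU32LE(p + 0x00);
  h.version = readU32LE(p + 0x04);
  h.flags = readU32LE(p + 0x08);
  h.stringTableOffset = readU32LE(p + 0x0C);
  h.primopTableOffset = readU32LE(p + 0x10);
  h.irBlobOffset = readU32LE(p + 0x14);
  h.stringCount = readU32LE(p + 0x18);
  h.primopCount = readU32LE(p + 0x1C);
  if (h.magic != kMagic || h.version != kSupportedVersion) return std::nullopt;
  return h;
}
```

Decoding byte by byte keeps the reader independent of host endianness;
rejecting any version other than 2 is a conservative choice for the sketch, not
a documented rule.
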
The string table follows the header. Each entry encodes its length as a varint,
followed by that many UTF-8 bytes. All attribute names, identifiers, and string
literals in the source are de-duplicated at compile time and stored here.
References throughout the IR use indices into this table rather than inline
strings.

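A sketch of reading the string table. The text above only says "varint", so
this assumes unsigned LEB128 (seven payload bits per byte, high bit set on all
but the last byte); `readVarint` and `readStringTable` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Decode an unsigned LEB128 varint at buf[pos] and advance pos past it.
// ASSUMPTION: the spec only says "varint"; LEB128 is one common encoding.
std::optional<uint64_t> readVarint(const std::vector<uint8_t>& buf,
                                   std::size_t& pos) {
  uint64_t result = 0;
  for (unsigned shift = 0; shift < 64; shift += 7) {
    if (pos >= buf.size()) return std::nullopt;  // truncated input
    uint8_t byte = buf[pos++];
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return result;  // high bit clear: last byte
  }
  return std::nullopt;  // longer than 10 bytes: malformed
}

// Read `count` length-prefixed strings starting at `offset`.
// Returns std::nullopt on truncation; UTF-8 validity is not checked here.
std::optional<std::vector<std::string>> readStringTable(
    const std::vector<uint8_t>& buf, std::size_t offset, uint32_t count) {
  std::vector<std::string> table;
  std::size_t pos = offset;
  for (uint32_t i = 0; i < count; ++i) {
    auto len = readVarint(buf, pos);
    // readVarint guarantees pos <= buf.size(), so the subtraction is safe.
    if (!len || *len > buf.size() - pos) return std::nullopt;
    table.emplace_back(reinterpret_cast<const char*>(buf.data() + pos),
                       static_cast<std::size_t>(*len));
    pos += static_cast<std::size_t>(*len);
  }
  return table;
}
```

With the table loaded, a `CONST_STRING` node's payload is just an index into the
returned vector.
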
The primop table defines built-in operations. Each entry contains the string
table index for the operation name, its arity, and optional flags. This table
enables the VM to dispatch operations by index without string comparison.

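Index-based dispatch can be sketched as a vector of resolved handlers, assuming
the loader maps each primop-table entry to a native function once at load time.
`Primop`, `PrimopFn`, `callPrimop`, and the integer-only `Value` are
hypothetical simplifications, and the on-disk field widths are not specified
here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

using Value = int64_t;  // HYPOTHETICAL: real VM values are a tagged union
using PrimopFn = std::function<Value(const std::vector<Value>&)>;

// One resolved primop-table entry; the name is kept for diagnostics only.
struct Primop {
  std::string name;
  uint32_t arity;
  PrimopFn fn;
};

// Call primop number `index` directly: a vector access, no name lookup.
Value callPrimop(const std::vector<Primop>& table, uint32_t index,
                 const std::vector<Value>& args) {
  if (index >= table.size()) throw std::out_of_range("bad primop index");
  const Primop& op = table[index];
  if (args.size() != op.arity)
    throw std::invalid_argument(op.name + ": wrong number of arguments");
  return op.fn(args);
}
```

The name survives only for error messages; the hot path is a bounds check and a
vector access.
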
The IR blob contains the actual program. Each node begins with a type byte
followed by a type-specific payload.

Node type enumeration from the source:

```plaintext
0x01: CONST_INT - Signed 64-bit integer
0x02: CONST_STRING - String table index
0x03: CONST_PATH - String table index
0x04: CONST_BOOL - 0x00 or 0x01
0x05: CONST_NULL - No payload
0x06: CONST_FLOAT - IEEE 754 double
0x07: CONST_URI - String table index
0x08: CONST_LOOKUP_PATH - String table index for a lookup path such as <nixpkgs>
0x10: VAR - Two varints: depth and index
0x20: LAMBDA - Arity and body offset
0x21: APP - Function and argument offsets
0x22: BINARY_OP - Operation enum and operands
0x23: UNARY_OP - Operation enum and operand
0x24: IMPORT - String table index for file path
0x30: ATTRSET - Count and recursive flag
0x31: SELECT - Expression, attribute, optional default
0x32: WITH - Attribute set and body offsets
0x33: LIST - Count and element offsets
0x34: HAS_ATTR - Expression and attribute
0x40: IF - Condition, then, and else offsets
0x50: LET - Binding count and body offset
0x51: LETREC - Binding count and body offset
0x52: ASSERT - Condition and body offsets
0x60: THUNK - Expression offset
0x61: FORCE - Expression offset
0xFF: ERROR - Error marker
```

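A sketch of decoding the simplest constant nodes, assuming `CONST_INT` is stored
as eight little-endian bytes in two's complement and `CONST_BOOL` as a single
byte. The enum mirrors the table above; `Const` and `decodeConst` are
hypothetical, and the remaining node types are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Node type bytes, transcribed from the table above.
enum class NodeType : uint8_t {
  ConstInt = 0x01, ConstString = 0x02, ConstPath = 0x03, ConstBool = 0x04,
  ConstNull = 0x05, ConstFloat = 0x06, ConstUri = 0x07, ConstLookupPath = 0x08,
  Var = 0x10, Lambda = 0x20, App = 0x21, BinaryOp = 0x22, UnaryOp = 0x23,
  Import = 0x24, Attrset = 0x30, Select = 0x31, With = 0x32, List = 0x33,
  HasAttr = 0x34, If = 0x40, Let = 0x50, LetRec = 0x51, Assert = 0x52,
  Thunk = 0x60, Force = 0x61, Error = 0xFF,
};

// monostate represents null.
using Const = std::variant<std::monostate, bool, int64_t>;

// Decode one constant node at buf[pos] and advance pos past it.
Const decodeConst(const std::vector<uint8_t>& buf, std::size_t& pos) {
  if (pos >= buf.size()) throw std::runtime_error("truncated node");
  auto type = static_cast<NodeType>(buf[pos++]);
  switch (type) {
    case NodeType::ConstNull:
      return std::monostate{};
    case NodeType::ConstBool:
      if (pos >= buf.size()) throw std::runtime_error("truncated bool");
      return buf[pos++] != 0x00;
    case NodeType::ConstInt: {
      if (buf.size() - pos < 8) throw std::runtime_error("truncated int");
      uint64_t raw = 0;
      for (int i = 0; i < 8; ++i)
        raw |= static_cast<uint64_t>(buf[pos + i]) << (8 * i);
      pos += 8;
      // Two's complement reinterpretation (guaranteed from C++20 on).
      return static_cast<int64_t>(raw);
    }
    default:
      throw std::runtime_error("node type not handled by this sketch");
  }
}
```
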
Binary operations supported:

```plaintext
ADD, SUB, MUL, DIV - Integer arithmetic (ADD also accepts strings and paths)
CONCAT - List concatenation (++)
EQ, NE - Equality comparison (==, !=)
LT, GT, LE, GE - Ordering comparison (<, >, <=, >=)
AND, OR, IMPL - Boolean logic (&&, ||, ->)
MERGE - Attribute set override (//)
```

## Variable Representation

The compiler converts variable names to De Bruijn-style indices during IR
generation. Rather than storing strings like "x" in the output, each variable
reference encodes two numbers: the lexical depth and the position within that
scope.

The depth counts how many scope boundaries (such as lambda boundaries) separate
the reference from the scope that binds the variable. A reference to a variable
bound in the innermost enclosing scope has depth zero. A reference from inside a
lambda to a variable bound in the scope just outside it has depth one.

The index indicates the position in that scope's environment array. The first
bound variable in a scope has index zero, the second has index one, and so
forth.

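The compile-time half of this can be sketched as a stack of scopes searched from
the inside out. `ScopeStack` is hypothetical, and treating each lambda (and, by
assumption, each `let` block) as one scope is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// HYPOTHETICAL compile-time scope tracker. Each scope lists its bound names
// in binding order, so a name's position in the list is its index.
class ScopeStack {
 public:
  void push(std::vector<std::string> names) { scopes_.push_back(std::move(names)); }
  void pop() {
    assert(!scopes_.empty());
    scopes_.pop_back();
  }

  // Search from the innermost scope outward. Depth is the number of scopes
  // skipped; index is the position within the scope that binds the name.
  std::optional<std::pair<uint32_t, uint32_t>> resolve(const std::string& name) const {
    for (std::size_t d = 0; d < scopes_.size(); ++d) {
      const auto& scope = scopes_[scopes_.size() - 1 - d];
      for (std::size_t i = 0; i < scope.size(); ++i)
        if (scope[i] == name)
          return std::make_pair(static_cast<uint32_t>(d), static_cast<uint32_t>(i));
    }
    return std::nullopt;  // not lexically bound in any tracked scope
  }

 private:
  std::vector<std::vector<std::string>> scopes_;
};
```

For `x: y: x`, the compiler would push `{x}`, then `{y}`, and resolve the body's
`x` to depth 1, index 0.
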
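During evaluation, the VM combines these two numbers into a single 32-bit value
where the high 16 bits encode the depth and the low 16 bits encode the index,
which limits each to 65,535. Lookup traverses the environment chain depth
times, then indexes into the resulting scope's binding array. Resolution thus
costs one pointer hop per level of depth plus a single array access, with no
string comparison or hashing. Since depth is bounded by lexical nesting, this is
effectively constant time in practice.
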
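A sketch of the packing and the lookup it enables. `packVar`, `Env`, and
`lookup` are hypothetical names, and the `int64_t` slots stand in for whatever
the VM actually stores in each scope.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Pack (depth, index): depth in the high 16 bits, index in the low 16 bits.
uint32_t packVar(uint32_t depth, uint32_t index) {
  assert(depth <= 0xFFFF && index <= 0xFFFF);
  return (depth << 16) | index;
}
uint32_t varDepth(uint32_t packed) { return packed >> 16; }
uint32_t varIndex(uint32_t packed) { return packed & 0xFFFF; }

// HYPOTHETICAL environment frame: a parent link plus the scope's bindings.
// int64_t stands in for whatever the VM really stores (likely thunks).
struct Env {
  std::shared_ptr<Env> parent;
  std::vector<int64_t> values;
};

// Hop `depth` parent links, then index into that frame's value array.
int64_t lookup(const Env* env, uint32_t packed) {
  for (uint32_t d = varDepth(packed); d > 0; --d) {
    assert(env->parent && "depth exceeds the environment chain");
    env = env->parent.get();
  }
  return env->values.at(varIndex(packed));
}
```

The loop body runs `depth` times, which is why resolution cost tracks lexical
nesting rather than the number of names in scope.
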
## The Virtual Machine

The VM implements lazy evaluation using an explicit thunk mechanism. Every
unevaluated expression and function argument is wrapped in a Thunk structure
containing the expression's AST node and a pointer to the captured environment.

When the VM needs a value, it calls `force()` on the thunk. The force operation
checks whether the thunk is already being evaluated. If evaluation attempts to
force a thunk that is currently evaluating, the VM detects the cycle and raises
"infinite recursion encountered". This matches Nix's behavior for recursive
definitions.

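A sketch of `force()` with cycle detection. The three-state design and caching
the result after the first successful evaluation (call-by-need) are assumptions,
since the text above only describes the cycle check. This `Thunk` also wraps a
`std::function` instead of an AST node plus environment, purely to keep the
example small.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>

using Value = int64_t;  // HYPOTHETICAL stand-in for the VM's value type

// A thunk moves Unevaluated -> Evaluating -> Done, and is forced at most once.
class Thunk {
 public:
  explicit Thunk(std::function<Value()> compute) : compute_(std::move(compute)) {}

  Value force() {
    if (state_ == State::Done) return value_;  // memoized result
    if (state_ == State::Evaluating)
      throw std::runtime_error("infinite recursion encountered");
    state_ = State::Evaluating;  // mark before computing to catch re-entry
    try {
      value_ = compute_();
    } catch (...) {
      state_ = State::Unevaluated;  // a failed thunk may be forced again
      throw;
    }
    state_ = State::Done;
    compute_ = nullptr;  // drop the captured state once it is no longer needed
    return value_;
  }

 private:
  enum class State { Unevaluated, Evaluating, Done };
  State state_ = State::Unevaluated;
  std::function<Value()> compute_;
  Value value_ = 0;
};
```

Resetting the state on an exception is a design choice of this sketch: it lets
a thunk that failed once be forced again instead of reporting a bogus cycle.
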
The environment structure is an array-based chain. Each scope holds a pointer to
its parent scope and a vector of bound values. Looking up a variable traverses
parent pointers until reaching the scope at the correct depth, then indexes into
that scope's value array. This replaces string comparison with pointer traversal
and array indexing.

Function application is curried. When applying a function to an argument, the
VM checks whether the function's arity is satisfied. If so, it extends the
environment with the new bindings and evaluates the body. If not, it returns a
partial application awaiting additional arguments.

The evaluator handles binary operations with type-specific dispatch. Addition
supports integers, strings, and paths with the appropriate type coercion rules.
Comparison operators work on integers and strings. The merge operator (`//`)
combines two attribute sets, with attributes from the right-hand set taking
precedence.

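A sketch of type-directed dispatch for three of the operators, assuming a
`std::variant` value representation. `Value`, `Attrs`, `BinOp`, and
`evalBinary` are hypothetical; paths, coercions, floats, and overflow checks
are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>

// HYPOTHETICAL value type covering just enough to show the dispatch.
using Attrs = std::map<std::string, int64_t>;
using Value = std::variant<bool, int64_t, std::string, Attrs>;

enum class BinOp { Add, Lt, Merge };

// True when both operands hold the alternative T.
template <typename T>
bool both(const Value& l, const Value& r) {
  return std::holds_alternative<T>(l) && std::holds_alternative<T>(r);
}

Value evalBinary(BinOp op, const Value& l, const Value& r) {
  switch (op) {
    case BinOp::Add:
      if (both<int64_t>(l, r)) return std::get<int64_t>(l) + std::get<int64_t>(r);
      if (both<std::string>(l, r)) return std::get<std::string>(l) + std::get<std::string>(r);
      break;
    case BinOp::Lt:
      if (both<int64_t>(l, r)) return std::get<int64_t>(l) < std::get<int64_t>(r);
      if (both<std::string>(l, r)) return std::get<std::string>(l) < std::get<std::string>(r);
      break;
    case BinOp::Merge:
      if (both<Attrs>(l, r)) {
        Attrs out = std::get<Attrs>(r);  // start from the right-hand set
        const Attrs& left = std::get<Attrs>(l);
        out.insert(left.begin(), left.end());  // insert() never overwrites
        return out;
      }
      break;
  }
  throw std::runtime_error("type error in binary operation");
}
```

For `//`, starting from a copy of the right-hand set and adding the left-hand
set with `insert()`, which skips keys that already exist, gives right-side
precedence without any explicit comparison.
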
## Plugin Primops

The plugin registers three primops through Nix's `RegisterPrimOp` interface:

`__nixIR_loadIR` accepts a file path string, deserializes the `.nixir` bundle,
evaluates the entry expression, and returns the resulting value. The VM measures
deserialization time and evaluation time separately, printing timing data to
stderr.

`__nixIR_compile` accepts a string containing Nix source code, parses it
in-memory, generates IR, and evaluates the result. This enables runtime
compilation without external tooling.

`__nixIR_info` returns an attribute set containing the plugin name
"nix-ir-plugin", version "0.1.0", and status "runtime-active". This is a
development-only primop that will be removed eventually.

The primops are registered with a double-underscore prefix. As with Nix's own
builtins, the prefix is dropped inside the `builtins` attribute set, so users
access them as `builtins.nixIR_loadIR`, `builtins.nixIR_compile`, and
`builtins.nixIR_info` in their expressions.

## Import Handling

The compiler performs static import resolution only when the import path meets
specific conditions. The path must be a literal in the source, not an
interpolation or a variable. It must not use home directory expansion (`~`).
The resolved path must remain within the project root, for security. Finally,
the target file must exist and be readable at compile time.

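A sketch of the eligibility check using `std::filesystem`. `canInlineImport`
and its parameters are hypothetical; resolving relative paths against the
importing file's directory and treating a leading `~` as home expansion are
assumptions about how Nixir applies these rules.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// HYPOTHETICAL check mirroring the four conditions above. `isLiteral` comes
// from the parser: false for interpolations and variables.
bool canInlineImport(bool isLiteral, const std::string& rawPath,
                     const fs::path& sourceDir, const fs::path& projectRoot) {
  if (!isLiteral) return false;
  if (!rawPath.empty() && rawPath[0] == '~') return false;  // home expansion

  // Normalize "." and ".." (and symlinks where they exist) without
  // requiring the target itself to exist.
  fs::path resolved = fs::weakly_canonical(sourceDir / rawPath);
  fs::path root = fs::weakly_canonical(projectRoot);

  // Containment: every component of `root` must prefix `resolved`.
  auto mism = std::mismatch(root.begin(), root.end(),
                            resolved.begin(), resolved.end());
  if (mism.first != root.end()) return false;

  // Must be an existing regular file that we can open for reading.
  std::ifstream probe(resolved);
  return fs::is_regular_file(resolved) && probe.good();
}
```

Canonicalizing before the containment test matters: a purely textual prefix
check would let a path such as `./lib/../../outside.nix` escape the project
root.
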
When these conditions hold, the compiler reads the imported file, recursively
processes its imports, and embeds the resulting IR into the output bundle. The
final `.nixir` file is self-contained and requires no additional file lookups at
load time.

When conditions do not hold, the compiler records the import as dynamic and
emits an IMPORT node containing the string table index. At runtime, the VM
evaluates the import expression to obtain the actual file path, then uses Nix's
standard evaluator to load that file.

## What Works And What Does Not

The implementation covers a substantial subset of Nix's expression language.
Literals work across all types, including integers, floats, strings, paths,
URIs, booleans, and null. Lambda expressions, function application, and currying
are implemented. Attribute sets with both static and dynamic keys are supported.
The `let` and `letrec` forms work with proper recursive binding semantics. The
`if`, `assert`, and `with` expressions, as well as list literals, are all
functional.

The implementation does not cover derivations, builtins other than those
required for basic operation, or the full module system. These require
integration with Nix's store and download mechanisms that the VM does not
replicate.

## Building And Using

Configure a build directory with CMake (using the Ninja generator) and build:

```sh
cmake -B build -G Ninja
cmake --build build
```

This produces `nix-irc` in the build directory and `nix-ir-plugin.so` in the
project root.

Compile a Nix file to IR:

```sh
./build/nix-irc input.nix output.nixir
```

Load and evaluate the compiled bundle through Nix:

```sh
nix --plugin-files ./nix-ir-plugin.so eval --expr 'builtins.nixIR_loadIR "output.nixir"'
```

Compile and evaluate source at runtime:

```sh
nix --plugin-files ./nix-ir-plugin.so eval --expr 'builtins.nixIR_compile "1 + 2"'
```