The monolithic `lib.rs` had grown to 2600+ lines of
`#[cfg(target_arch)]` blocks interleaved inside every `sys_*` function.
Reading or modifying one arch's syscalls meant scrolling past thirteen
others. Pull each arch's asm into `<arch>.rs` and re-export it through a
single `#[cfg] #[path] mod arch;`, so `lib.rs` contains only wrappers,
structs, and module dispatch. Adding a new architecture now touches one
new file plus one three-line mod decl.
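
A minimal sketch of the dispatch pattern described above, not the
actual layout of this crate. Inline modules stand in for the per-arch
files so the snippet compiles on its own; the constant `SYS_WRITE`, the
`NAME` string, the fallback module, and the wrapper names are
assumptions made for illustration.

```rust
// In the real tree each arm would presumably be the three-line form:
//
//     #[cfg(target_arch = "x86_64")]
//     #[path = "x86_64.rs"]
//     mod arch;
//
// Inline modules are used here only to keep the sketch self-contained.

#[cfg(target_arch = "x86_64")]
mod arch {
    pub const NAME: &str = "x86_64";
    pub const SYS_WRITE: usize = 1; // Linux x86_64 write(2)
}

#[cfg(target_arch = "aarch64")]
mod arch {
    pub const NAME: &str = "aarch64";
    pub const SYS_WRITE: usize = 64; // Linux generic syscall table
}

#[cfg(target_arch = "riscv64")]
mod arch {
    pub const NAME: &str = "riscv64";
    pub const SYS_WRITE: usize = 64; // Linux generic syscall table
}

// Hypothetical fallback so the sketch builds on any host; a real crate
// would more likely fail the build with compile_error! here.
#[cfg(not(any(
    target_arch = "x86_64",
    target_arch = "aarch64",
    target_arch = "riscv64"
)))]
mod arch {
    pub const NAME: &str = "unknown";
    pub const SYS_WRITE: usize = 0; // placeholder, not a real number
}

// Arch-independent wrappers: no #[cfg] needed at this level.
pub fn arch_name() -> &'static str {
    arch::NAME
}

pub fn write_syscall_number() -> usize {
    arch::SYS_WRITE
}
```

The point of the shape is that everything above the `mod arch` arms
names `arch::...` without knowing which file was selected.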

riscv64 codegen emits calls to bcmp for byte comparisons, but that
symbol is undefined when linking with -static and no libc. Provide both
bcmp and memcmp implementations alongside the existing
memcpy/memset/strlen.
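
A rough sketch of what the two routines could look like, assuming
plain byte-at-a-time loops. The names `memcmp_sketch` and `bcmp_sketch`
are hypothetical; in the crate they would presumably be exported as
`#[no_mangle] pub unsafe extern "C" fn memcmp` / `bcmp` so the linker
can resolve the compiler-emitted calls. The sketch avoids those names
so it does not collide with a hosted libc.

```rust
/// memcmp semantics: negative, zero, or positive depending on the
/// first differing byte, compared as unsigned values.
///
/// Safety: `a` and `b` must each be valid for reads of `n` bytes.
pub unsafe fn memcmp_sketch(a: *const u8, b: *const u8, n: usize) -> i32 {
    let mut i = 0;
    while i < n {
        let x = unsafe { *a.add(i) };
        let y = unsafe { *b.add(i) };
        if x != y {
            return x as i32 - y as i32;
        }
        i += 1;
    }
    0
}

/// bcmp semantics: zero if the ranges are equal, nonzero otherwise.
/// The sign carries no meaning, so it can reuse memcmp.
///
/// Safety: same requirements as `memcmp_sketch`.
pub unsafe fn bcmp_sketch(a: *const u8, b: *const u8, n: usize) -> i32 {
    (unsafe { memcmp_sketch(a, b, n) } != 0) as i32
}
```

One caveat for a freestanding build: the optimizer may recognize such a
loop and lower it back to a memcmp call, so the real definitions may
need care to avoid calling themselves.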