Multi-input, multi-output backup service for Forgejo and derivatives

Konservejo

Declarative, pipeline-based backup orchestrator for Forgejo with a focus on backpressure tolerance, cryptographic verification, and fan-out concurrency.

About

Why?

Currently, my work outside of GitHub is scattered across various Forgejo instances. I do not wish to consolidate them into one, as I use each instance with different goals and intents, but I do want a safeguard that covers them all. So I decided to build a proper solution that scratches my itch.

Name Origin

The name is derived from Esperanto morphology, in the same way Forgejo's own name is:

  • konservi = to preserve
  • -ejo = place

combining into "preservation place," or "archive."

Usage

Prerequisites

  • A Forgejo personal access token with read access to repositories
  • Access to at least one Forgejo instance with API enabled

Configuration

Konservejo is driven entirely by configuration files; there are no CLI flags for modifying inputs or outputs. The service, your sources, and your sinks are all configured in said file.

[service]
name = "my-backup"
state_db_path = "/var/lib/konservejo/state.db"
temp_dir = "/var/tmp/konservejo"

# Optional: limit concurrent repository processing
concurrency_limit = 4

# Optional: retry settings
[service.retry]
max_retries = 3
initial_backoff_ms = 500
backoff_multiplier = 2.0
max_backoff_ms = 30000

[[source]]
type = "forgejo"
id = "primary"
api_url = "https://git.example.tld/api/v1"
token = "${FORGEJO_TOKEN}"

[source.scope]
organizations = ["my-org"]
exclude_repos = ["my-org/legacy-repo"]

[[sink]]
type = "filesystem"
id = "local"
path = "/backup/repos"
verify_on_write = true

Tip

Environment variable interpolation is supported with ${VAR} syntax. Secrets should be passed via environment variables.
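The interpolation step can be pictured as a simple substitution pass over the raw config text. This is a minimal sketch, not Konservejo's actual parser; `expand_env` is a hypothetical helper:

```rust
use std::env;

/// Replace every `${VAR}` occurrence with the value of the environment
/// variable VAR, leaving the placeholder untouched if VAR is unset.
/// Hypothetical sketch; the real implementation may behave differently.
fn expand_env(input: &str) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        if let Some(end) = rest[start..].find('}') {
            let var = &rest[start + 2..start + end];
            match env::var(var) {
                Ok(val) => out.push_str(&val),
                // Unset variable: keep the placeholder as-is.
                Err(_) => out.push_str(&rest[start..start + end + 1]),
            }
            rest = &rest[start + end + 1..];
        } else {
            // Unterminated "${" — emit the remainder verbatim.
            out.push_str(&rest[start..]);
            rest = "";
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    // With the variable unset, the placeholder is left untouched.
    println!("{}", expand_env("token = \"${KONSERVEJO_DEMO_UNSET}\""));
    // prints: token = "${KONSERVEJO_DEMO_UNSET}"
}
```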

Configuration Reference

[service]

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| name | string | required | Service identifier |
| state_db_path | string | required | SQLite database path |
| temp_dir | string | required | Temporary download directory |
| concurrency_limit | usize | 4 | Max concurrent repo backups |
| retry.max_retries | u32 | 3 | Retry attempts for network ops |
| retry.initial_backoff_ms | u64 | 500 | Initial backoff delay |
| retry.backoff_multiplier | f64 | 2.0 | Backoff scaling factor |
| retry.max_backoff_ms | u64 | 30000 | Maximum backoff delay |

[[source]] (type: "forgejo")

| Field | Type | Description |
|-------|------|-------------|
| id | string | Source identifier |
| api_url | string | Forgejo API base URL |
| token | string | Access token (use ${VAR}) |
| scope.organizations | [string] | Orgs to back up |
| scope.exclude_repos | [string] | Repos to skip (supports */name wildcards) |

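The */name wildcard form excludes a repository name in any organization. A plausible reading of that matching rule can be sketched as follows (an assumption about the semantics, not Konservejo's actual matcher; `matches_exclude` is a hypothetical helper):

```rust
/// Return true if `repo` (in "org/name" form) matches `pattern`.
/// A leading "*/" matches any organization; otherwise the pattern
/// must equal the full "org/name" string. Illustrative sketch only.
fn matches_exclude(pattern: &str, repo: &str) -> bool {
    if let Some(name) = pattern.strip_prefix("*/") {
        repo.split('/').nth(1) == Some(name)
    } else {
        pattern == repo
    }
}

fn main() {
    assert!(matches_exclude("*/legacy-repo", "my-org/legacy-repo"));
    assert!(matches_exclude("my-org/legacy-repo", "my-org/legacy-repo"));
    assert!(!matches_exclude("*/legacy-repo", "my-org/other-repo"));
    println!("ok");
}
```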
[[sink]] (type: "filesystem")

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| id | string | required | Sink identifier |
| path | string | required | Storage directory |
| verify_on_write | bool | true | Re-read and hash after write |

Commands

# Set credentials
$ export FORGEJO_TOKEN=your_token_here

# Validate configuration
$ konservejo validate-config

# Run backup
$ konservejo backup

# Verify manifest integrity
$ konservejo verify-manifest --run-id <uuid>

Workflow

Repositories are processed concurrently up to concurrency_limit. Each repository is downloaded as a tar.gz archive, hashed (Blake3), and written to all configured sinks. Storage is content-addressed. Artifacts are stored at {path}/{hash[0..2]}/{hash[2..4]}/{hash}.
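The content-addressed layout above can be sketched as a small path helper (a sketch only; `artifact_path` is a hypothetical name, and Konservejo hashes with Blake3 rather than taking a precomputed hex string):

```rust
use std::path::PathBuf;

/// Build the content-addressed storage path
/// {path}/{hash[0..2]}/{hash[2..4]}/{hash}
/// from a hex-encoded digest. Illustrative sketch only.
fn artifact_path(base: &str, hash_hex: &str) -> PathBuf {
    PathBuf::from(base)
        .join(&hash_hex[0..2]) // first fan-out level
        .join(&hash_hex[2..4]) // second fan-out level
        .join(hash_hex)        // full digest as the file name
}

fn main() {
    let p = artifact_path("/backup/repos", "deadbeefcafef00d");
    println!("{}", p.display()); // /backup/repos/de/ad/deadbeefcafef00d
}
```

The two-level fan-out keeps directories small even with many artifacts, since no single directory holds more than 256 subdirectories.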

A Merkle tree is then computed over all artifacts; the root hash is persisted to the database for integrity verification. You'd do well to protect your database.
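The Merkle construction can be sketched as repeated pairwise hashing of leaf digests. This is an illustrative sketch only: std's DefaultHasher stands in for Blake3, `u64` digests stand in for 256-bit ones, and the odd-leaf promotion rule is an assumption, not necessarily what Konservejo does:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash two child digests into a parent digest.
/// DefaultHasher is a stand-in for Blake3 in this sketch.
fn combine(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Compute a Merkle root over a list of leaf digests by repeatedly
/// pairing neighbors; an odd leaf at the end is promoted unchanged.
fn merkle_root(mut level: Vec<u64>) -> u64 {
    assert!(!level.is_empty(), "need at least one leaf");
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|p| if p.len() == 2 { combine(p[0], p[1]) } else { p[0] })
            .collect();
    }
    level[0]
}

fn main() {
    let leaves = vec![1, 2, 3, 4, 5];
    println!("root = {:x}", merkle_root(leaves));
}
```

Verification then only needs the stored root: recompute the tree from the artifacts on disk and compare roots; any flipped byte in any artifact changes the root.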

Retry logic handles transient failures such as network errors. Permanent failures (4xx) fail immediately, while 429 and 5xx responses are retried with backoff.
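The delay schedule implied by the [service.retry] settings can be sketched as capped exponential backoff (a sketch mirroring the config field names, not Konservejo's internal types; `backoff_schedule` is a hypothetical helper):

```rust
/// Compute the capped exponential backoff schedule, in milliseconds,
/// from the [service.retry] settings. Illustrative sketch only.
fn backoff_schedule(max_retries: u32, initial_ms: u64, multiplier: f64, max_ms: u64) -> Vec<u64> {
    let mut delays = Vec::new();
    let mut delay = initial_ms as f64;
    for _ in 0..max_retries {
        // Each retry waits the current delay, clamped to max_backoff_ms.
        delays.push((delay as u64).min(max_ms));
        delay *= multiplier;
    }
    delays
}

fn main() {
    // Defaults: 3 retries, 500 ms initial, 2.0 multiplier, 30 000 ms cap.
    println!("{:?}", backoff_schedule(3, 500, 2.0, 30_000)); // [500, 1000, 2000]
}
```

With the defaults, a repeatedly failing request waits 500 ms, then 1 s, then 2 s before giving up; the cap only kicks in for larger initial delays or multipliers.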

Current Limitations/TODO

  • S3 sink is not implemented (returns explicit error if configured)
  • Checkpoint/resume not yet supported
  • No retention policy enforcement yet