Multi-input, multi-output backup service for Forgejo and derivatives

Konservejo

Declarative, pipeline-based backup orchestrator for Forgejo with a focus on backpressure tolerance, cryptographic verification, and fan-out concurrency.

About

Why?

Currently, my work outside of GitHub is scattered across various Forgejo instances. I do not wish to consolidate them into one, as I use each instance with different goals and intents, but I do want a safeguard that covers them all. So I decided to build a proper solution that scratches my itch.

Name Origin

The name is derived from Esperanto morphology, in the same way Forgejo's own name is:

  • konservi = to preserve
  • -ejo = place

combining into "preservation place," or "archive."

Usage

Prerequisites

  • A Forgejo personal access token with read access to repositories
  • Access to at least one Forgejo instance with API enabled

Configuration

Konservejo is driven entirely by configuration files; there are no CLI flags for modifying inputs or outputs. The service, your sources, and your sinks are all configured in said file.

[service]
name = "my-backup"
state_db_path = "/var/lib/konservejo/state.db"
temp_dir = "/var/tmp/konservejo"

# Optional: limit concurrent repository processing
concurrency_limit = 4

# Optional: retry settings
[service.retry]
max_retries = 3
initial_backoff_ms = 500
backoff_multiplier = 2.0
max_backoff_ms = 30000

[[source]]
type = "forgejo"
id = "primary"
api_url = "https://git.example.tld/api/v1"
token = "${FORGEJO_TOKEN}"

[source.scope]
organizations = ["my-org"]
exclude_repos = ["my-org/legacy-repo"]

[[sink]]
type = "filesystem"
id = "local"
path = "/backup/repos"
verify_on_write = true

Tip

Environment variable interpolation is supported with ${VAR} syntax. Secrets should be passed via environment variables.
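The interpolation step can be pictured as a simple substitution pass over the raw config text. This is a minimal sketch, not Konservejo's actual parser; `expand_env` is a hypothetical helper:

```rust
use std::env;

/// Replace every `${VAR}` occurrence with the value of the environment
/// variable VAR, leaving the placeholder untouched if VAR is unset.
/// Hypothetical sketch; the real implementation may behave differently.
fn expand_env(input: &str) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        if let Some(end) = rest[start..].find('}') {
            let var = &rest[start + 2..start + end];
            match env::var(var) {
                Ok(val) => out.push_str(&val),
                // Unset variable: keep the placeholder as-is.
                Err(_) => out.push_str(&rest[start..start + end + 1]),
            }
            rest = &rest[start + end + 1..];
        } else {
            // Unterminated "${" — emit the remainder verbatim.
            out.push_str(&rest[start..]);
            rest = "";
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    // With the variable unset, the placeholder is left untouched.
    println!("{}", expand_env("token = \"${KONSERVEJO_DEMO_UNSET}\""));
    // prints: token = "${KONSERVEJO_DEMO_UNSET}"
}
```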

Configuration Reference

[service]

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| name | string | required | Service identifier |
| state_db_path | string | required | SQLite database path |
| temp_dir | string | required | Temporary download directory |
| concurrency_limit | usize | 4 | Max concurrent repo backups |
| retry.max_retries | u32 | 3 | Retry attempts for network ops |
| retry.initial_backoff_ms | u64 | 500 | Initial backoff delay |
| retry.backoff_multiplier | f64 | 2.0 | Backoff scaling factor |
| retry.max_backoff_ms | u64 | 30000 | Maximum backoff delay |

[[source]] (type: "forgejo")

| Field | Type | Description |
|-------|------|-------------|
| id | string | Source identifier |
| api_url | string | Forgejo API base URL |
| token | string | Access token (use ${VAR}) |
| scope.organizations | [string] | Orgs to back up |
| scope.exclude_repos | [string] | Repos to skip (supports */name wildcards) |

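The */name wildcard form excludes a repository name in any organization. A plausible reading of that matching rule can be sketched as follows (an assumption about the semantics, not Konservejo's actual matcher; `matches_exclude` is a hypothetical helper):

```rust
/// Return true if `repo` (in "org/name" form) matches `pattern`.
/// A leading "*/" matches any organization; otherwise the pattern
/// must equal the full "org/name" string. Illustrative sketch only.
fn matches_exclude(pattern: &str, repo: &str) -> bool {
    if let Some(name) = pattern.strip_prefix("*/") {
        repo.split('/').nth(1) == Some(name)
    } else {
        pattern == repo
    }
}

fn main() {
    assert!(matches_exclude("*/legacy-repo", "my-org/legacy-repo"));
    assert!(matches_exclude("my-org/legacy-repo", "my-org/legacy-repo"));
    assert!(!matches_exclude("*/legacy-repo", "my-org/other-repo"));
    println!("ok");
}
```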
[[sink]] (type: "filesystem")

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| id | string | required | Sink identifier |
| path | string | required | Storage directory |
| verify_on_write | bool | true | Re-read and hash after write |

Commands

# Set credentials
$ export FORGEJO_TOKEN=your_token_here

# Validate configuration
$ konservejo validate-config

# Run backup
$ konservejo backup

# Verify manifest integrity
$ konservejo verify-manifest --run-id <uuid>

Workflow

Repositories are processed concurrently up to concurrency_limit. Each repository is downloaded as a tar.gz archive, hashed (Blake3), and written to all configured sinks. Storage is content-addressed. Artifacts are stored at {path}/{hash[0..2]}/{hash[2..4]}/{hash}.
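The content-addressed layout above can be sketched as a small path helper (a sketch only; `artifact_path` is a hypothetical name, and Konservejo hashes with Blake3 rather than taking a precomputed hex string):

```rust
use std::path::PathBuf;

/// Build the content-addressed storage path
/// {path}/{hash[0..2]}/{hash[2..4]}/{hash}
/// from a hex-encoded digest. Illustrative sketch only.
fn artifact_path(base: &str, hash_hex: &str) -> PathBuf {
    PathBuf::from(base)
        .join(&hash_hex[0..2]) // first fan-out level
        .join(&hash_hex[2..4]) // second fan-out level
        .join(hash_hex)        // full digest as the file name
}

fn main() {
    let p = artifact_path("/backup/repos", "deadbeefcafef00d");
    println!("{}", p.display()); // /backup/repos/de/ad/deadbeefcafef00d
}
```

The two-level fan-out keeps directories small even with many artifacts, since no single directory holds more than 256 subdirectories.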

A Merkle tree is then computed over all artifacts; the root hash is persisted to the database for integrity verification. You'd do well to protect your database.
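The Merkle construction can be sketched as repeated pairwise hashing of leaf digests. This is an illustrative sketch only: std's DefaultHasher stands in for Blake3, `u64` digests stand in for 256-bit ones, and the odd-leaf promotion rule is an assumption, not necessarily what Konservejo does:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash two child digests into a parent digest.
/// DefaultHasher is a stand-in for Blake3 in this sketch.
fn combine(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Compute a Merkle root over a list of leaf digests by repeatedly
/// pairing neighbors; an odd leaf at the end is promoted unchanged.
fn merkle_root(mut level: Vec<u64>) -> u64 {
    assert!(!level.is_empty(), "need at least one leaf");
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|p| if p.len() == 2 { combine(p[0], p[1]) } else { p[0] })
            .collect();
    }
    level[0]
}

fn main() {
    let leaves = vec![1, 2, 3, 4, 5];
    println!("root = {:x}", merkle_root(leaves));
}
```

Verification then only needs the stored root: recompute the tree from the artifacts on disk and compare roots; any flipped byte in any artifact changes the root.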

Retry logic handles transient failures such as network errors. Permanent failures (4xx) fail immediately, while 429 and 5xx responses are retried with backoff.
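The delay schedule implied by the [service.retry] settings can be sketched as capped exponential backoff (a sketch mirroring the config field names, not Konservejo's internal types; `backoff_schedule` is a hypothetical helper):

```rust
/// Compute the capped exponential backoff schedule, in milliseconds,
/// from the [service.retry] settings. Illustrative sketch only.
fn backoff_schedule(max_retries: u32, initial_ms: u64, multiplier: f64, max_ms: u64) -> Vec<u64> {
    let mut delays = Vec::new();
    let mut delay = initial_ms as f64;
    for _ in 0..max_retries {
        // Each retry waits the current delay, clamped to max_backoff_ms.
        delays.push((delay as u64).min(max_ms));
        delay *= multiplier;
    }
    delays
}

fn main() {
    // Defaults: 3 retries, 500 ms initial, 2.0 multiplier, 30 000 ms cap.
    println!("{:?}", backoff_schedule(3, 500, 2.0, 30_000)); // [500, 1000, 2000]
}
```

With the defaults, a repeatedly failing request waits 500 ms, then 1 s, then 2 s before giving up; the cap only kicks in for larger initial delays or multipliers.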

Current Limitations/TODO

  • S3 sink is not implemented (returns explicit error if configured)
  • Checkpoint/resume not yet supported
  • No retention policy enforcement yet