

Konservejo

Declarative, pipeline-based backup orchestrator for Forgejo with a focus on backpressure tolerance, cryptographic verification, and fan-out concurrency.

About

Why?

Currently my work outside of GitHub is scattered across various Forgejo instances. I do not wish to consolidate them into one, as I use the different instances with different goals and intents, but I do want a single safeguard that covers them all. So I decided to build a proper solution that scratches that itch.

Name Origin

The name follows the same Esperanto morphology as Forgejo's own name:

  • konservi = to preserve
  • -ejo = place

morphing into "preservation place" or "archive."

Usage

Prerequisites

  • A Forgejo personal access token with read access to repositories
  • Access to at least one Forgejo instance with API enabled

Configuration

Konservejo is driven entirely by a configuration file; there are no CLI flags to override inputs or outputs. The service, your sources, and your sinks are all configured in that file.

[service]
name = "my-backup"
state_db_path = "/var/lib/konservejo/state.db"
temp_dir = "/var/tmp/konservejo"

# Optional: limit concurrent repository processing
concurrency_limit = 4

# Optional: retry settings
[service.retry]
max_retries = 3
initial_backoff_ms = 500
backoff_multiplier = 2.0
max_backoff_ms = 30000

[[source]]
type = "forgejo"
id = "primary"
api_url = "https://git.example.tld/api/v1"
token = "${FORGEJO_TOKEN}"

[source.scope]
organizations = ["my-org"]
exclude_repos = ["my-org/legacy-repo"]

[[sink]]
type = "filesystem"
id = "local"
path = "/backup/repos"
verify_on_write = true

Tip

Environment variable interpolation is supported with ${VAR} syntax. Secrets should be passed via environment variables.
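As a rough illustration of what ${VAR} interpolation does, here is a hypothetical Python sketch; the actual substitution rules (escaping, behaviour on missing variables) may differ:

```python
import os
import re

def interpolate(value: str) -> str:
    # Replace each ${VAR} occurrence with the value of environment
    # variable VAR. In this sketch a missing variable becomes an empty
    # string; that fallback is an assumption, not documented behaviour.
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: os.environ.get(m.group(1), ""),
        value,
    )
```

For example, with FORGEJO_TOKEN set in the environment, `interpolate('token = "${FORGEJO_TOKEN}"')` yields the config value with the secret filled in, so the secret never lives in the file itself.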

Configuration Reference

[service]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | required | Service identifier |
| state_db_path | string | required | SQLite database path |
| temp_dir | string | required | Temporary download directory |
| concurrency_limit | usize | 4 | Max concurrent repo backups |
| retry.max_retries | u32 | 3 | Retry attempts for network ops |
| retry.initial_backoff_ms | u64 | 500 | Initial backoff delay |
| retry.backoff_multiplier | f64 | 2.0 | Backoff scaling factor |
| retry.max_backoff_ms | u64 | 30000 | Maximum backoff delay |
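To illustrate how these retry fields combine, here is a hedged Python sketch of the resulting backoff schedule (the function name and the exact point at which the cap applies are assumptions, not the actual implementation):

```python
def backoff_schedule(max_retries: int = 3,
                     initial_ms: int = 500,
                     multiplier: float = 2.0,
                     max_ms: int = 30000) -> list[int]:
    # One delay per retry attempt: start at initial_ms, scale by
    # multiplier each time, never exceed max_ms.
    delays = []
    delay = float(initial_ms)
    for _ in range(max_retries):
        delays.append(int(min(delay, max_ms)))
        delay *= multiplier
    return delays
```

With the defaults above this yields delays of 500 ms, 1000 ms, and 2000 ms; with more retries the delay grows until it saturates at max_backoff_ms.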

[[source]] (type: "forgejo")

| Field | Type | Description |
| --- | --- | --- |
| id | string | Source identifier |
| api_url | string | Forgejo API base URL |
| token | string | Access token (use ${VAR}) |
| scope.organizations | [string] | Orgs to back up |
| scope.exclude_repos | [string] | Repos to skip (supports */name wildcards) |
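For illustration, */name wildcard exclusion could be approximated with shell-style pattern matching; this is a sketch, and the real matcher's semantics may differ:

```python
from fnmatch import fnmatchcase

def is_excluded(full_name: str, patterns: list[str]) -> bool:
    # full_name is "owner/repo"; patterns are literal names or
    # shell-style wildcards such as "*/legacy-repo".
    return any(fnmatchcase(full_name, p) for p in patterns)
```

So `"*/legacy-repo"` would skip a repo named legacy-repo under any owner, while an exact `"my-org/legacy-repo"` entry skips only that one.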

[[sink]] (type: "filesystem")

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| id | string | required | Sink identifier |
| path | string | required | Storage directory |
| verify_on_write | bool | true | Re-read and hash after write |
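The verify_on_write behaviour might look roughly like this sketch (a hypothetical helper, not Konservejo's code; SHA-256 stands in for Blake3, which is not in the Python standard library):

```python
import hashlib
import os

def write_verified(path: str, data: bytes) -> None:
    # Write the artifact, flush it to disk, then re-read the file and
    # compare digests to catch silent corruption at write time.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    with open(path, "rb") as f:
        readback = f.read()
    if hashlib.sha256(readback).digest() != hashlib.sha256(data).digest():
        raise IOError(f"verification failed for {path}")
```

Paying the extra read on every write is the trade-off for detecting a bad disk or filesystem immediately rather than at restore time.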

Commands

# Set credentials
$ export FORGEJO_TOKEN=your_token_here

# Validate configuration
$ konservejo validate-config

# Run backup
$ konservejo backup

# Verify manifest integrity
$ konservejo verify-manifest --run-id <uuid>

Workflow

Repositories are processed concurrently up to concurrency_limit. Each repository is downloaded as a tar.gz archive, hashed (Blake3), and written to all configured sinks. Storage is content-addressed. Artifacts are stored at {path}/{hash[0..2]}/{hash[2..4]}/{hash}.
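The content-addressed layout can be illustrated with a short sketch; SHA-256 stands in for Blake3 here (Blake3 is a third-party package in Python), but the path shape follows the {hash[0..2]}/{hash[2..4]}/{hash} scheme above:

```python
import hashlib
from pathlib import PurePosixPath

def artifact_path(root: str, data: bytes) -> PurePosixPath:
    # Hash the archive bytes and fan the store out over two levels of
    # two-character prefix directories to keep directory sizes small.
    h = hashlib.sha256(data).hexdigest()
    return PurePosixPath(root) / h[0:2] / h[2:4] / h
```

Because the path is derived from the content hash, identical archives deduplicate for free and any artifact can be verified by re-hashing it and comparing against its own filename.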

Then, a Merkle tree is computed over all artifacts; the root hash is persisted to the database for integrity verification. You'd do well to protect your database.
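A minimal sketch of one common Merkle construction (pairwise hashing, with an odd node promoted unchanged); Konservejo's actual tree shape and domain separation may differ, and SHA-256 again stands in for Blake3:

```python
import hashlib

def merkle_root(leaf_hashes: list[str]) -> str:
    # Hash adjacent pairs of hex digests level by level until one
    # root remains; a leftover odd node is carried up as-is.
    if not leaf_hashes:
        return hashlib.sha256(b"").hexdigest()
    level = list(leaf_hashes)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                pair = (level[i] + level[i + 1]).encode()
                nxt.append(hashlib.sha256(pair).hexdigest())
            else:
                nxt.append(level[i])
        level = nxt
    return level[0]
```

Persisting only the root is enough to later prove that no artifact in the run was altered: recompute the tree from the stored artifacts and compare roots.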

Retry logic handles transient failures such as network errors. Permanent failures (most 4xx responses) fail immediately, while 429 and 5xx responses are retried with backoff.
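That classification rule can be expressed as a one-liner (a sketch of the rule as stated, not the actual code):

```python
def is_retryable(status: int) -> bool:
    # 429 (rate-limited) and 5xx (server-side) responses are treated
    # as transient; every other 4xx is a permanent client error.
    return status == 429 or 500 <= status <= 599
```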

Current Limitations/TODO

  • S3 sink is not implemented (returns explicit error if configured)
  • Checkpoint/resume not yet supported
  • No retention policy enforcement yet