# Konservejo

Declarative, pipeline-based backup orchestrator for Forgejo with a focus on backpressure tolerance, cryptographic verification, and fan-out concurrency.
## About

### Why?
Currently, my work outside of GitHub is scattered across various Forgejo instances. I do not wish to consolidate them into one, since I use the different instances with different goals and intents, but I do want a safeguard that covers them all. So I decided to build a proper solution that scratches my itch.
### Name Origin

The name follows Esperanto morphology, the same way Forgejo's name does:

- *konservi* = to preserve
- *-ejo* = place

morphing into "preservation place" or "archive."
## Usage

### Prerequisites
- A Forgejo personal access token with read access to repositories
- Access to at least one Forgejo instance with API enabled
### Configuration

Konservejo is driven entirely by a configuration file; there is no CLI configuration to modify the inputs or outputs. The service, your sources, and your sinks are all configured in that file.
```toml
[service]
name = "my-backup"
state_db_path = "/var/lib/konservejo/state.db"
temp_dir = "/var/tmp/konservejo"
# Optional: limit concurrent repository processing
concurrency_limit = 4

# Optional: retry settings
[service.retry]
max_retries = 3
initial_backoff_ms = 500
backoff_multiplier = 2.0
max_backoff_ms = 30000

[[source]]
type = "forgejo"
id = "primary"
api_url = "https://git.example.tld/api/v1"
token = "${FORGEJO_TOKEN}"

[source.scope]
organizations = ["my-org"]
exclude_repos = ["my-org/legacy-repo"]

[[sink]]
type = "filesystem"
id = "local"
path = "/backup/repos"
verify_on_write = true
```
> [!TIP]
> Environment variable interpolation is supported with `${VAR}` syntax. Secrets should be passed via environment variables.
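The interpolation described above can be sketched roughly as follows. This is an illustrative helper, not Konservejo's actual implementation; the `lookup` parameter stands in for something like `std::env::var`, and unresolved variables are assumed to be left verbatim.

```rust
/// Expand every `${VAR}` occurrence in `input` using `lookup` (e.g. a
/// wrapper around `std::env::var`). Unresolved variables are left as-is.
/// Hypothetical sketch; the real interpolation rules may differ.
fn interpolate(input: &str, lookup: &dyn Fn(&str) -> Option<String>) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        match rest[start + 2..].find('}') {
            Some(end) => {
                let name = &rest[start + 2..start + 2 + end];
                match lookup(name) {
                    Some(val) => out.push_str(&val),
                    None => out.push_str(&rest[start..=start + 2 + end]),
                }
                rest = &rest[start + 3 + end..];
            }
            None => {
                // No closing brace: keep the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let env = |name: &str| (name == "FORGEJO_TOKEN").then(|| "s3cret".to_string());
    let expanded = interpolate("token = \"${FORGEJO_TOKEN}\"", &env);
    assert_eq!(expanded, "token = \"s3cret\"");
    println!("{expanded}");
}
```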
### Configuration Reference

#### `[service]`

| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | required | Service identifier |
| `state_db_path` | string | required | SQLite database path |
| `temp_dir` | string | required | Temporary download directory |
| `concurrency_limit` | usize | 4 | Max concurrent repo backups |
| `retry.max_retries` | u32 | 3 | Retry attempts for network ops |
| `retry.initial_backoff_ms` | u64 | 500 | Initial backoff delay |
| `retry.backoff_multiplier` | f64 | 2.0 | Backoff scaling factor |
| `retry.max_backoff_ms` | u64 | 30000 | Maximum backoff delay |
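With the defaults above, the exponential backoff schedule works out as sketched below. The helper function is hypothetical, not the crate's API; it only illustrates how the four retry settings combine.

```rust
/// Delay (ms) before retry `attempt` (0-based): initial * multiplier^attempt,
/// capped at `max_ms`. Illustrative sketch of the settings above.
fn backoff_ms(attempt: u32, initial_ms: u64, multiplier: f64, max_ms: u64) -> u64 {
    let delay = initial_ms as f64 * multiplier.powi(attempt as i32);
    (delay as u64).min(max_ms)
}

fn main() {
    // Defaults: initial = 500, multiplier = 2.0, max = 30000.
    let delays: Vec<u64> = (0..8).map(|a| backoff_ms(a, 500, 2.0, 30_000)).collect();
    assert_eq!(delays, vec![500, 1000, 2000, 4000, 8000, 16_000, 30_000, 30_000]);
    println!("{delays:?}");
}
```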
#### `[[source]]` (type: `"forgejo"`)

| Field | Type | Description |
|---|---|---|
| `id` | string | Source identifier |
| `api_url` | string | Forgejo API base URL |
| `token` | string | Access token (use `${VAR}`) |
| `scope.organizations` | [string] | Orgs to back up |
| `scope.exclude_repos` | [string] | Repos to skip (supports `*/name` wildcards) |
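The `*/name` wildcard matching for `scope.exclude_repos` could look roughly like this. A minimal sketch assuming either segment of an `owner/name` pattern may be `*`; Konservejo's actual matching rules may be broader.

```rust
/// Return true if `repo` ("owner/name") matches `pattern`, where either
/// segment may be the wildcard `*`. Sketch only; real rules may differ.
fn matches(pattern: &str, repo: &str) -> bool {
    match (pattern.split_once('/'), repo.split_once('/')) {
        (Some((po, pn)), Some((ro, rn))) => {
            (po == "*" || po == ro) && (pn == "*" || pn == rn)
        }
        _ => false,
    }
}

fn main() {
    assert!(matches("my-org/legacy-repo", "my-org/legacy-repo"));
    assert!(matches("*/legacy-repo", "other-org/legacy-repo"));
    assert!(!matches("*/legacy-repo", "other-org/active-repo"));
    println!("wildcard matching ok");
}
```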
#### `[[sink]]` (type: `"filesystem"`)

| Field | Type | Default | Description |
|---|---|---|---|
| `id` | string | required | Sink identifier |
| `path` | string | required | Storage directory |
| `verify_on_write` | bool | true | Re-read and hash after write |
### Commands

```sh
# Set credentials
$ export FORGEJO_TOKEN=your_token_here

# Validate configuration
$ konservejo validate-config

# Run backup
$ konservejo backup

# Verify manifest integrity
$ konservejo verify-manifest --run-id <uuid>
```
## Workflow
Repositories are processed concurrently, up to `concurrency_limit`. Each repository is downloaded as a `tar.gz` archive, hashed (BLAKE3), and written to all configured sinks. Storage is content-addressed: artifacts are stored at `{path}/{hash[0..2]}/{hash[2..4]}/{hash}`.
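The sharded, content-addressed layout can be sketched as follows, assuming the hash is a hex-encoded BLAKE3 digest string (the helper name and dummy digest are illustrative):

```rust
use std::path::{Path, PathBuf};

/// Build the content-addressed path {base}/{hash[0..2]}/{hash[2..4]}/{hash}.
/// Assumes `hash` is a hex-encoded digest string; illustrative sketch only.
fn artifact_path(base: &Path, hash: &str) -> PathBuf {
    base.join(&hash[0..2]).join(&hash[2..4]).join(hash)
}

fn main() {
    let hash = "deadbeef".repeat(8); // dummy 64-char hex digest
    let p = artifact_path(Path::new("/backup/repos"), &hash);
    assert!(p.to_str().unwrap().starts_with("/backup/repos/de/ad/deadbeef"));
    println!("{}", p.display());
}
```

Splitting on the first two byte pairs keeps any single directory from accumulating an unbounded number of entries.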
A Merkle tree is then computed over all artifacts, and the root hash is persisted to the database for integrity verification, so protect your database accordingly.
Retry logic handles transient failures such as network errors. Permanent client errors (4xx other than 429) fail immediately, while 429 and 5xx responses are retried.
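The failure classification described above can be sketched as a small predicate over HTTP status codes (a hypothetical helper, not Konservejo's actual API):

```rust
/// Whether an HTTP status warrants a retry, per the policy above:
/// 429 and 5xx are transient; other 4xx are permanent. Sketch only.
fn is_retryable(status: u16) -> bool {
    status == 429 || (500..=599).contains(&status)
}

fn main() {
    assert!(is_retryable(429)); // rate-limited: back off and retry
    assert!(is_retryable(503)); // server-side trouble: retry
    assert!(!is_retryable(404)); // permanent: fail immediately
    assert!(!is_retryable(401)); // bad credentials won't fix themselves
    println!("classification ok");
}
```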
## Current Limitations / TODO
- S3 sink is not implemented (returns explicit error if configured)
- Checkpoint/resume not yet supported
- No retention policy enforcement yet