diff --git a/README.md b/README.md
index fb32d96..7dfac73 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,17 @@
 Declarative, pipeline-based backup orchestrator for Forgejo with a focus on
 backpressure tolerance, cryptographic verification, and fan-out concurrency.
 
-## Name Origin
+## About
+
+### Why?
+
+Currently my work outside of GitHub is scattered across various Forgejo
+instances. I do not wish to consolidate those into one, as I use those various
+instances with different goals and intents, but I _do_ want a safeguard that
+covers them all. Thus, I've decided to create a proper solution that scratches
+my itch.
+
+### Name Origin
 
 The name is derived from similar Esperanto morphology the same way the original
 name does:
@@ -13,22 +23,131 @@ name does:
 morphing into "preservation place" or "archive."
 
-## Why?
+## Usage
 
-Currently my work outside of Github is scattered on various Forgejo instances. I
-do not wish to consolidate those into one, as I use those various instances with
-different goals and intents but I _do_ want a safeguard that encapsulates all.
-Thus, I've come up with a decision to create a proper solution that scratches my
-itch. Here's how it is meant to look like:
+### Prerequisites
 
-
+- A Forgejo personal access token with read access to repositories
+- Access to at least one Forgejo instance with the API enabled
 
-```plaintext
-[Forgejo A] ──┐
-[Forgejo B] ──┼──> [Source Adapters] --> [Artifact Stream] --> [Dispatcher] --> [Sink A] --> [Verifier]
-[Forgejo C] ──┘                                                            │
-                                                                           ├──> [Sink B] --> [Verifier]
-                                                                           └──> [Sink C] --> [Verifier]
+### Configuration
+
+Konservejo is driven by a configuration file; there are no CLI options to
+modify the inputs or outputs. You configure the service, your sources, and
+your sinks from that configuration file.
+
+```toml
+[service]
+name = "my-backup"
+state_db_path = "/var/lib/konservejo/state.db"
+temp_dir = "/var/tmp/konservejo"
+
+# Optional: limit concurrent repository processing
+concurrency_limit = 4
+
+# Optional: retry settings
+[service.retry]
+max_retries = 3
+initial_backoff_ms = 500
+backoff_multiplier = 2.0
+max_backoff_ms = 30000
+
+[[source]]
+type = "forgejo"
+id = "primary"
+api_url = "https://git.example.tld/api/v1"
+token = "${FORGEJO_TOKEN}"
+
+[source.scope]
+organizations = ["my-org"]
+exclude_repos = ["my-org/legacy-repo"]
+
+[[sink]]
+type = "filesystem"
+id = "local"
+path = "/backup/repos"
+verify_on_write = true
 ```
 
+> [!TIP]
+> Environment variable interpolation is supported with `${VAR}` syntax. Secrets
+> should be passed via environment variables.
+
+### Configuration Reference
+
+**`[service]`**
+
+
+
+| Field                      | Type   | Default  | Description                    |
+| -------------------------- | ------ | -------- | ------------------------------ |
+| `name`                     | string | required | Service identifier             |
+| `state_db_path`            | string | required | SQLite database path           |
+| `temp_dir`                 | string | required | Temporary download directory   |
+| `concurrency_limit`        | usize  | 4        | Max concurrent repo backups    |
+| `retry.max_retries`        | u32    | 3        | Retry attempts for network ops |
+| `retry.initial_backoff_ms` | u64    | 500      | Initial backoff delay          |
+| `retry.backoff_multiplier` | f64    | 2.0      | Backoff scaling factor         |
+| `retry.max_backoff_ms`     | u64    | 30000    | Maximum backoff delay          |
+
+
+**`[[source]]` (type: "forgejo")**
+
+
+
+| Field                 | Type     | Description                                 |
+| --------------------- | -------- | ------------------------------------------- |
+| `id`                  | string   | Source identifier                           |
+| `api_url`             | string   | Forgejo API base URL                        |
+| `token`               | string   | Access token (use `${VAR}`)                 |
+| `scope.organizations` | [string] | Orgs to back up                             |
+| `scope.exclude_repos` | [string] | Repos to skip (supports `*/name` wildcards) |
+
+
+
+**`[[sink]]` (type: "filesystem")**
+
+| Field             | Type   | Default  | Description                  |
+| ----------------- | ------ | -------- | ---------------------------- |
+| `id`              | string | required | Sink identifier              |
+| `path`            | string | required | Storage directory            |
+| `verify_on_write` | bool   | true     | Re-read and hash after write |
+
+### Commands
+
+```bash
+# Set credentials
+$ export FORGEJO_TOKEN=your_token_here
+
+# Validate configuration
+$ konservejo validate-config
+
+# Run backup
+$ konservejo backup
+
+# Verify manifest integrity
+$ konservejo verify-manifest --run-id
+```
+
+## Workflow
+
+[Merkle tree]: https://en.wikipedia.org/wiki/Merkle_tree
+
+Repositories are processed concurrently, up to `concurrency_limit` at a time.
+Each repository is downloaded as a `tar.gz` archive, hashed (BLAKE3), and
+written to all configured sinks. Storage is content-addressed: artifacts are
+stored at `{path}/{hash[0..2]}/{hash[2..4]}/{hash}`.
+
+A [Merkle tree] is then computed over all artifacts; the root hash is persisted
+to the database for integrity verification. You'd do well to protect it.
+
+Retry logic handles transient failures, such as network errors. Permanent
+failures (4xx other than 429) fail immediately, while 429 and 5xx responses are
+retried.
+
+## Current Limitations/TODO
+
+- S3 sink is not implemented (returns an explicit error if configured)
+- Checkpoint/resume not yet supported
+- No retention policy enforcement yet
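
The retry behaviour described under Workflow, together with the `retry.*` fields in the configuration reference, could be sketched roughly as follows. This is a minimal Python illustration, not Konservejo's actual implementation: the function names `is_retryable` and `backoff_delays_ms` are hypothetical, and the way the four retry settings combine (exponential growth capped at `max_backoff_ms`, no jitter) is an assumption based on their names and defaults.

```python
def is_retryable(status: int) -> bool:
    """Classify an HTTP status the way the README describes.

    429 and 5xx responses are treated as transient and retried;
    any other status (including the remaining 4xx codes) is not.
    """
    return status == 429 or 500 <= status <= 599


def backoff_delays_ms(
    max_retries: int = 3,
    initial_backoff_ms: int = 500,
    backoff_multiplier: float = 2.0,
    max_backoff_ms: int = 30000,
) -> list[int]:
    """Return the delay before each retry attempt, in milliseconds.

    Assumption: each delay is the previous one times the multiplier,
    capped at max_backoff_ms. The real schedule may differ.
    """
    delays = []
    delay = float(initial_backoff_ms)
    for _ in range(max_retries):
        delays.append(int(min(delay, max_backoff_ms)))
        delay *= backoff_multiplier
    return delays
```

Under these assumptions, the defaults from the configuration reference give three retries after 500, 1000, and 2000 ms, and the 30000 ms cap only comes into play once `max_retries` is raised to 7 or more.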