docs: document workflow and configuration fields

Signed-off-by: NotAShelf <raf@notashelf.dev>
Change-Id: I6a1b6adf568d7beae748cdee4ac851f16a6a6964
This commit is contained in:
raf 2026-03-17 16:16:26 +03:00
commit 03215df954
Signed by: NotAShelf
GPG key ID: 29D95B64378DB4BF

README.md

@@ -3,7 +3,17 @@
Declarative, pipeline-based backup orchestrator for Forgejo with a focus on
backpressure tolerance, cryptographic verification, and fan-out concurrency.

## About

### Why?

Currently my work outside of GitHub is scattered across various Forgejo
instances. I do not wish to consolidate those into one, as I use the various
instances with different goals and intents, but I _do_ want a safeguard that
encapsulates them all. Thus, I decided to create a proper solution that
scratches my itch.

### Name Origin

The name is derived from similar Esperanto morphology the same way the original
name does:
@@ -13,22 +23,131 @@ name does:
morphing into "preservation place" or "archive."

## Usage

### Prerequisites

- A Forgejo personal access token with read access to repositories
- Access to at least one Forgejo instance with the API enabled

### Configuration

Konservejo is driven entirely by configuration files; there are no CLI flags
for modifying inputs or outputs. The service, your sources, and your sinks are
all defined in the configuration file.
```toml
[service]
name = "my-backup"
state_db_path = "/var/lib/konservejo/state.db"
temp_dir = "/var/tmp/konservejo"
# Optional: limit concurrent repository processing
concurrency_limit = 4

# Optional: retry settings
[service.retry]
max_retries = 3
initial_backoff_ms = 500
backoff_multiplier = 2.0
max_backoff_ms = 30000

[[source]]
type = "forgejo"
id = "primary"
api_url = "https://git.example.tld/api/v1"
token = "${FORGEJO_TOKEN}"

[source.scope]
organizations = ["my-org"]
exclude_repos = ["my-org/legacy-repo"]

[[sink]]
type = "filesystem"
id = "local"
path = "/backup/repos"
verify_on_write = true
```
> [!TIP]
> Environment variable interpolation is supported with `${VAR}` syntax. Secrets
> should be passed via environment variables.
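The `${VAR}` expansion can be sketched as a small scanner over the raw config text. This is a stand-alone illustration, not Konservejo's actual code; `expand_env` is a hypothetical helper name, and the behavior for unset variables (leaving the placeholder intact) is an assumption.

```rust
use std::env;

/// Replace every `${VAR}` occurrence with the value of environment
/// variable VAR; unset variables leave the placeholder untouched.
fn expand_env(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        match rest[start + 2..].find('}') {
            Some(end) => {
                let name = &rest[start + 2..start + 2 + end];
                match env::var(name) {
                    Ok(val) => out.push_str(&val),
                    // Unset: keep the literal `${NAME}` placeholder.
                    Err(_) => out.push_str(&rest[start..start + 2 + end + 1]),
                }
                rest = &rest[start + 2 + end + 1..];
            }
            None => {
                // No closing brace: emit the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    env::set_var("FORGEJO_TOKEN", "s3cret");
    assert_eq!(expand_env("token = \"${FORGEJO_TOKEN}\""), "token = \"s3cret\"");
    assert_eq!(expand_env("${UNSET_VAR_XYZ}"), "${UNSET_VAR_XYZ}");
    println!("ok");
}
```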
### Configuration Reference
**`[service]`**
<!--markdownlint-disable MD013-->
| Field | Type | Default | Description |
| -------------------------- | ------ | -------- | ------------------------------ |
| `name` | string | required | Service identifier |
| `state_db_path` | string | required | SQLite database path |
| `temp_dir` | string | required | Temporary download directory |
| `concurrency_limit` | usize | 4 | Max concurrent repo backups |
| `retry.max_retries` | u32 | 3 | Retry attempts for network ops |
| `retry.initial_backoff_ms` | u64 | 500 | Initial backoff delay |
| `retry.backoff_multiplier` | f64 | 2.0 | Backoff scaling factor |
| `retry.max_backoff_ms` | u64 | 30000 | Maximum backoff delay |
<!--markdownlint-enable MD013-->
**`[[source]]` (type: "forgejo")**
<!--markdownlint-disable MD013-->
| Field | Type | Description |
| --------------------- | -------- | ------------------------------------------- |
| `id` | string | Source identifier |
| `api_url` | string | Forgejo API base URL |
| `token` | string | Access token (use `${VAR}`) |
| `scope.organizations` | [string] | Orgs to back up |
| `scope.exclude_repos` | [string] | Repos to skip (supports `*/name` wildcards) |
<!--markdownlint-enable MD013-->
**`[[sink]]` (type: "filesystem")**
| Field | Type | Default | Description |
| ----------------- | ------ | -------- | ---------------------------- |
| `id` | string | required | Sink identifier |
| `path` | string | required | Storage directory |
| `verify_on_write` | bool | true | Re-read and hash after write |
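What `verify_on_write = true` implies is a write-then-re-read round trip. A minimal sketch of that idea, using Rust's std hasher as a dependency-free stand-in for the Blake3 digest the pipeline actually uses; `write_artifact` is a hypothetical name:

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;

// Stand-in digest; the real pipeline hashes with Blake3.
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Write an artifact, then optionally re-read it and compare digests.
fn write_artifact(path: &Path, data: &[u8], verify_on_write: bool) -> io::Result<()> {
    fs::write(path, data)?;
    if verify_on_write {
        let reread = fs::read(path)?;
        if digest(&reread) != digest(data) {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "verification failed"));
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("konservejo-demo-artifact");
    write_artifact(&path, b"repo archive bytes", true)?;
    fs::remove_file(&path)?;
    println!("write verified");
    Ok(())
}
```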
### Commands
```bash
# Set credentials
$ export FORGEJO_TOKEN=your_token_here
# Validate configuration
$ konservejo validate-config
# Run backup
$ konservejo backup
# Verify manifest integrity
$ konservejo verify-manifest --run-id <uuid>
```
## Workflow
[Merkle tree]: https://en.wikipedia.org/wiki/Merkle_tree
Repositories are processed concurrently up to `concurrency_limit`. Each
repository is downloaded as a `tar.gz` archive, hashed (Blake3), and written to
all configured sinks. Storage is content-addressed. Artifacts are stored at
`{path}/{hash[0..2]}/{hash[2..4]}/{hash}`.
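The layout rule above maps directly to path construction; a small sketch (`artifact_path` is a hypothetical helper name):

```rust
use std::path::PathBuf;

/// Build the content-addressed location
/// `{path}/{hash[0..2]}/{hash[2..4]}/{hash}` for a hex-encoded hash.
fn artifact_path(root: &str, hex_hash: &str) -> PathBuf {
    PathBuf::from(root)
        .join(&hex_hash[0..2])
        .join(&hex_hash[2..4])
        .join(hex_hash)
}

fn main() {
    // Truncated sample hash; real Blake3 digests are 64 hex characters.
    let p = artifact_path("/backup/repos", "af1349b9");
    assert_eq!(p.to_str().unwrap(), "/backup/repos/af/13/af1349b9");
    println!("{}", p.display());
}
```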
Then, a [Merkle tree] is computed over all artifacts; the root hash is persisted
to the database for integrity verification. You'd do well to protect that
database.
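A pairwise-hashing Merkle fold can be sketched like this. The std hasher below is a dependency-free stand-in for Blake3, `merkle_root` is a hypothetical name, and the real tree shape may differ (e.g. how an odd leaf is handled):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in digest; the real pipeline uses Blake3.
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Fold a list of leaf hashes into a single root by hashing pairs
/// level by level; an odd leaf is promoted unchanged.
fn merkle_root(mut level: Vec<u64>) -> Option<u64> {
    if level.is_empty() {
        return None;
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [a, b] => digest(&[a.to_le_bytes(), b.to_le_bytes()].concat()),
                [a] => *a,
                _ => unreachable!(),
            })
            .collect();
    }
    Some(level[0])
}

fn main() {
    let leaves = vec![digest(b"artifact-a"), digest(b"artifact-b"), digest(b"artifact-c")];
    // The same leaves always produce the same root.
    assert_eq!(merkle_root(leaves.clone()), merkle_root(leaves));
    println!("merkle root computed");
}
```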
Retry logic handles transient failures such as network errors. Permanent
failures (4xx responses other than 429) fail immediately, while 429 and 5xx
responses are retried.
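The retry classification and the backoff schedule implied by the `[service.retry]` fields can be sketched as follows (hypothetical helper names; the defaults are taken from the configuration reference):

```rust
/// Transient statuses worth retrying: 429 and any 5xx.
/// All other 4xx responses are treated as permanent failures.
fn is_retryable(status: u16) -> bool {
    status == 429 || (500..600).contains(&status)
}

/// Exponential backoff: each attempt multiplies the initial delay by
/// backoff_multiplier, capped at max_backoff_ms.
fn backoff_ms(attempt: u32, initial_ms: u64, multiplier: f64, max_ms: u64) -> u64 {
    let delay = initial_ms as f64 * multiplier.powi(attempt as i32);
    (delay as u64).min(max_ms)
}

fn main() {
    assert!(is_retryable(429));
    assert!(is_retryable(503));
    assert!(!is_retryable(404));
    // Defaults: initial 500ms, multiplier 2.0, cap 30000ms.
    assert_eq!(backoff_ms(0, 500, 2.0, 30_000), 500);
    assert_eq!(backoff_ms(1, 500, 2.0, 30_000), 1_000);
    assert_eq!(backoff_ms(10, 500, 2.0, 30_000), 30_000); // capped
    println!("ok");
}
```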
## Current Limitations/TODO
- S3 sink is not implemented (returns explicit error if configured)
- Checkpoint/resume not yet supported
- No retention policy enforcement yet