docs: document workflow and configuration fields
Signed-off-by: NotAShelf <raf@notashelf.dev>
Change-Id: I6a1b6adf568d7beae748cdee4ac851f16a6a6964
parent f62d75140f
commit 03215df954
1 changed file with 133 additions and 14 deletions
147
README.md
|
|
@ -3,7 +3,17 @@
|
||||||
Declarative, pipeline-based backup orchestrator for Forgejo with a focus on
|
Declarative, pipeline-based backup orchestrator for Forgejo with a focus on
|
||||||
backpressure tolerance, cryptographic verification, and fan-out concurrency.
|
backpressure tolerance, cryptographic verification, and fan-out concurrency.
|
||||||
|
|
||||||
## Name Origin
|
## About
|
||||||
|
|
||||||
|
### Why?
|
||||||
|
|
||||||
|
Currently, my work outside of GitHub is scattered across various Forgejo instances. I
|
||||||
|
do not wish to consolidate those into one, as I use those various instances with
|
||||||
|
different goals and intents, but I _do_ want a safeguard that covers them all.
|
||||||
|
Thus, I decided to create a proper solution that scratches my
|
||||||
|
itch.
|
||||||
|
|
||||||
|
### Name Origin
|
||||||
|
|
||||||
The name is derived from similar Esperanto morphology the same way the original
|
The name is derived from similar Esperanto morphology the same way the original
|
||||||
name does:
|
name does:
|
||||||
|
|
@ -13,22 +23,131 @@ name does:
|
||||||
|
|
||||||
morphing into "preservation place" or "archive."
|
morphing into "preservation place" or "archive."
|
||||||
|
|
||||||
## Why?
|
## Usage
|
||||||
|
|
||||||
Currently, my work outside of GitHub is scattered across various Forgejo instances. I
|
### Prerequisites
|
||||||
do not wish to consolidate those into one, as I use those various instances with
|
|
||||||
different goals and intents, but I _do_ want a safeguard that covers them all.
|
|
||||||
Thus, I decided to create a proper solution that scratches my
|
|
||||||
itch. Here's how it is meant to look:
|
|
||||||
|
|
||||||
<!--markdownlint-disable MD013 -->
|
- A Forgejo personal access token with read access to repositories
|
||||||
|
- Access to at least one Forgejo instance with API enabled
|
||||||
|
|
||||||
```plaintext
|
### Configuration
|
||||||
[Forgejo A] ──┐
|
|
||||||
[Forgejo B] ──┼──> [Source Adapters] --> [Artifact Stream] --> [Dispatcher] --> [Sink A] --> [Verifier]
|
Konservejo is driven entirely by a configuration file; there are no CLI options to
|
||||||
[Forgejo C] ──┘ │
|
modify the inputs or outputs. You can configure the service, your sources and
|
||||||
├──> [Sink B] --> [Verifier]
|
your sinks from said configuration file.
|
||||||
└──> [Sink C] --> [Verifier]
|
|
||||||
|
```toml
|
||||||
|
[service]
|
||||||
|
name = "my-backup"
|
||||||
|
state_db_path = "/var/lib/konservejo/state.db"
|
||||||
|
temp_dir = "/var/tmp/konservejo"
|
||||||
|
|
||||||
|
# Optional: limit concurrent repository processing
|
||||||
|
concurrency_limit = 4
|
||||||
|
|
||||||
|
# Optional: retry settings
|
||||||
|
[service.retry]
|
||||||
|
max_retries = 3
|
||||||
|
initial_backoff_ms = 500
|
||||||
|
backoff_multiplier = 2.0
|
||||||
|
max_backoff_ms = 30000
|
||||||
|
|
||||||
|
[[source]]
|
||||||
|
type = "forgejo"
|
||||||
|
id = "primary"
|
||||||
|
api_url = "https://git.example.tld/api/v1"
|
||||||
|
token = "${FORGEJO_TOKEN}"
|
||||||
|
|
||||||
|
[source.scope]
|
||||||
|
organizations = ["my-org"]
|
||||||
|
exclude_repos = ["my-org/legacy-repo"]
|
||||||
|
|
||||||
|
[[sink]]
|
||||||
|
type = "filesystem"
|
||||||
|
id = "local"
|
||||||
|
path = "/backup/repos"
|
||||||
|
verify_on_write = true
|
||||||
```
|
```
|
||||||
|
|
||||||
|
> [!TIP]
|
||||||
|
> Environment variable interpolation is supported with `${VAR}` syntax. Secrets
|
||||||
|
> should be passed via environment variables.
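
The `${VAR}` interpolation described in the tip can be pictured with a small sketch. This is not Konservejo's implementation: the function name `interpolate`, the accepted variable-name pattern, and the choice to treat an unset variable as an error are all assumptions made for illustration.

```python
import os
import re

# Matches ${NAME}, where NAME looks like a conventional environment variable.
# The exact grammar Konservejo accepts is not documented; this is an assumption.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(text, env=None):
    """Replace every ${VAR} in `text` with its value from `env`.

    Assumption: an unset variable raises an error instead of expanding to "".
    """
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR_PATTERN.sub(substitute, text)
```

Failing loudly on a missing variable is a deliberate choice in this sketch: an empty token would otherwise only surface later as an authentication error.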
|
||||||
|
|
||||||
|
### Configuration Reference
|
||||||
|
|
||||||
|
**`[service]`**
|
||||||
|
|
||||||
|
<!--markdownlint-disable MD013-->
|
||||||
|
|
||||||
|
| Field | Type | Default | Description |
|
||||||
|
| -------------------------- | ------ | -------- | ------------------------------ |
|
||||||
|
| `name` | string | required | Service identifier |
|
||||||
|
| `state_db_path` | string | required | SQLite database path |
|
||||||
|
| `temp_dir` | string | required | Temporary download directory |
|
||||||
|
| `concurrency_limit` | usize | 4 | Max concurrent repo backups |
|
||||||
|
| `retry.max_retries` | u32 | 3 | Retry attempts for network ops |
|
||||||
|
| `retry.initial_backoff_ms` | u64 | 500 | Initial backoff delay |
|
||||||
|
| `retry.backoff_multiplier` | f64 | 2.0 | Backoff scaling factor |
|
||||||
|
| `retry.max_backoff_ms` | u64 | 30000 | Maximum backoff delay |
|
||||||
|
|
||||||
<!--markdownlint-enable MD013-->
|
<!--markdownlint-enable MD013-->
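
To see how the four `retry.*` fields might interact, here is a minimal sketch of a capped exponential backoff schedule using the defaults from the table. It assumes the conventional formula `min(initial * multiplier^n, max)` with no jitter; the README does not spell out the exact formula, and `backoff_schedule` is a hypothetical helper.

```python
def backoff_schedule(max_retries=3, initial_backoff_ms=500,
                     backoff_multiplier=2.0, max_backoff_ms=30000):
    """Return the delay in milliseconds before each retry attempt.

    Assumption: plain capped exponential backoff, without jitter.
    """
    delays = []
    delay = float(initial_backoff_ms)
    for _ in range(max_retries):
        # Never wait longer than the configured ceiling.
        delays.append(int(min(delay, max_backoff_ms)))
        delay *= backoff_multiplier
    return delays


print(backoff_schedule())               # [500, 1000, 2000]
print(backoff_schedule(max_retries=8))  # [500, 1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

Under these assumptions the defaults add at most 3.5 seconds of waiting per operation, and `max_backoff_ms` only comes into play from the seventh retry onward.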
|
||||||
|
|
||||||
|
**`[[source]]` (type: "forgejo")**
|
||||||
|
|
||||||
|
<!--markdownlint-disable MD013-->
|
||||||
|
|
||||||
|
| Field | Type | Description |
|
||||||
|
| --------------------- | -------- | ------------------------------------------- |
|
||||||
|
| `id` | string | Source identifier |
|
||||||
|
| `api_url` | string | Forgejo API base URL |
|
||||||
|
| `token` | string | Access token (use `${VAR}`) |
|
||||||
|
| `scope.organizations` | [string] | Orgs to back up |
|
||||||
|
| `scope.exclude_repos` | [string] | Repos to skip (supports `*/name` wildcards) |
|
||||||
|
|
||||||
|
<!--markdownlint-enable MD013-->
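
The `*/name` wildcard in `scope.exclude_repos` can be illustrated with shell-style globbing. This sketch assumes glob semantics via Python's `fnmatch`, which also gives `?` and `[...]` special meaning; Konservejo's actual matcher may be narrower. `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_excluded(full_name, exclude_repos):
    """Return True if an `owner/name` string matches any exclusion pattern.

    Assumption: patterns follow shell-style globbing and are case-sensitive.
    """
    return any(fnmatchcase(full_name, pattern) for pattern in exclude_repos)
```

With this reading, `"my-org/legacy-repo"` excludes a single repository, while `"*/legacy-repo"` excludes a repository of that name under any owner.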
|
||||||
|
|
||||||
|
**`[[sink]]` (type: "filesystem")**
|
||||||
|
|
||||||
|
| Field | Type | Default | Description |
|
||||||
|
| ----------------- | ------ | -------- | ---------------------------- |
|
||||||
|
| `id` | string | required | Sink identifier |
|
||||||
|
| `path` | string | required | Storage directory |
|
||||||
|
| `verify_on_write` | bool | true | Re-read and hash after write |
|
||||||
|
|
||||||
|
### Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Set credentials
|
||||||
|
$ export FORGEJO_TOKEN=your_token_here
|
||||||
|
|
||||||
|
# Validate configuration
|
||||||
|
$ konservejo validate-config
|
||||||
|
|
||||||
|
# Run backup
|
||||||
|
$ konservejo backup
|
||||||
|
|
||||||
|
# Verify manifest integrity
|
||||||
|
$ konservejo verify-manifest --run-id <uuid>
|
||||||
|
```
|
||||||
|
|
||||||
|
## Workflow
|
||||||
|
|
||||||
|
[Merkle tree]: https://en.wikipedia.org/wiki/Merkle_tree
|
||||||
|
|
||||||
|
Repositories are processed concurrently up to `concurrency_limit`. Each
|
||||||
|
repository is downloaded as a `tar.gz` archive, hashed (BLAKE3), and written to
|
||||||
|
all configured sinks. Storage is content-addressed: artifacts are stored at
|
||||||
|
`{path}/{hash[0..2]}/{hash[2..4]}/{hash}`.
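
The path layout above can be sketched as a short function. It takes the hex digest as input rather than computing it, and the name `artifact_path` is hypothetical.

```python
from pathlib import PurePosixPath


def artifact_path(sink_path, hex_hash):
    """Build `{path}/{hash[0..2]}/{hash[2..4]}/{hash}` for a hex digest."""
    if len(hex_hash) < 4:
        raise ValueError("digest is too short to shard into two directory levels")
    # The first two byte-pairs of the digest become nested directories,
    # which keeps any single directory from growing too large.
    return PurePosixPath(sink_path) / hex_hash[0:2] / hex_hash[2:4] / hex_hash


print(artifact_path("/backup/repos", "a1b2c3d4"))  # /backup/repos/a1/b2/a1b2c3d4
```

Real digests are much longer than the shortened `a1b2c3d4` used here; only the first four characters affect the directory structure.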
|
||||||
|
|
||||||
|
Then, a [Merkle tree] is computed over all artifacts; the root hash is persisted
|
||||||
|
to the database for integrity verification. You'd do well to protect your database.
|
||||||
|
|
||||||
|
Retry logic handles transient failures, such as network errors.
|
||||||
|
Permanent failures (4xx, except 429) fail immediately, while 429 and 5xx errors are covered by
|
||||||
|
retries.
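
The retry rule can be summarised as a small predicate. This is a sketch of the classification as described, not the project's code; `is_retryable` is a hypothetical name, and representing a network error as `None` (no HTTP status received) is an assumption.

```python
def is_retryable(status):
    """Decide whether a failed request should be retried.

    `status` is the HTTP status code, or None when the request failed
    before any response arrived (for example, a connection error).
    """
    if status is None:
        return True  # network-level failure: treated as transient
    if status == 429:
        return True  # rate limited: worth retrying after a backoff
    return 500 <= status <= 599  # server errors retry; other 4xx do not
```

A 404 or 401 therefore fails on the first attempt, while a 503 or a dropped connection goes through the configured backoff.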
|
||||||
|
|
||||||
|
## Current Limitations/TODO
|
||||||
|
|
||||||
|
- S3 sink is not implemented (returns an explicit error if configured)
|
||||||
|
- Checkpoint/resume not yet supported
|
||||||
|
- No retention policy enforcement yet
|
||||||
|
|
|
||||||