docs: add project README

Signed-off-by: NotAShelf <raf@notashelf.dev>
Change-Id: I4aa3f5ba0d5f4967e52ce96514312e776a6a6964
2026-01-30 23:58:54 +03:00 · 2026-01-30 23:58:54 +03:00 · 978cddb862
commit 978cddb862
parent f8db097ba9
1 changed files with 464 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,464 @@
+# Troutbot
+
+Troutbot is the ultimate solution for protecting the trout population. It's
+environmental protection incarnate!
+
+Well, in reality, it's a GitHub webhook bot that analyzes issues and pull
+requests using real signals, such as CI check results, diff quality, and body
+structure, and then posts trout-themed comments about the findings. Now you
+know whether your changes hurt or help the trout population.
+
+## Quick Start
+
+```bash
+# Install dependencies
+npm install
+
+# Create the environment file
+cp .env.example .env
+
+# Create the application config
+cp config.example.ts config.ts
+
+# Edit .env and config.ts, then build and start:
+npm run build && npm start
+```
+
+## How It Works
+
+Troutbot runs three analysis backends against each incoming webhook event.
+Together they form the decision-making logic that determines whether your
+changes affect the trout population positively or negatively.
+
+### `checks`
+
+Queries the GitHub Checks API for the PR's head commit. Looks at check run
+conclusions (ESLint, Clippy, Jest, cargo test, GitHub Actions, etc.) and scores
+based on pass/fail ratio. Any CI failure is a negative signal. Requires a
+`GITHUB_TOKEN`.
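To make pass/fail-ratio scoring concrete, here is a minimal sketch. The `status` and `conclusion` fields mirror what the Checks API returns for a check run. The function name `scoreChecks`, ignoring `skipped`/`neutral` runs, and the confidence formula are assumptions, not troutbot's actual code.

```typescript
type Impact = "positive" | "negative" | "neutral";

// Subset of a GitHub check run; `conclusion` is null until the run completes.
interface CheckRun {
  name: string;
  status: "queued" | "in_progress" | "completed";
  conclusion: string | null;
}

interface BackendResult {
  impact: Impact;
  confidence: number; // 0..1
}

// Hypothetical scorer: any failure is negative, all-green is positive.
function scoreChecks(runs: CheckRun[]): BackendResult {
  const completed = runs.filter((r) => r.status === "completed");
  // Skipped and neutral runs say nothing about quality, so ignore them (assumption).
  const decisive = completed.filter(
    (r) => r.conclusion !== "skipped" && r.conclusion !== "neutral",
  );
  if (decisive.length === 0) {
    // No CI results yet: zero confidence so the engine excludes this backend.
    return { impact: "neutral", confidence: 0 };
  }
  // failure, cancelled, timed_out, action_required, etc. all count as failures.
  const failed = decisive.filter((r) => r.conclusion !== "success").length;
  if (failed > 0) {
    // Any failure is negative; a larger failing share means more confidence (0.5..1).
    return { impact: "negative", confidence: 0.5 + 0.5 * (failed / decisive.length) };
  }
  // Every decisive run passed.
  return { impact: "positive", confidence: 1 };
}
```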
+
+### `diff`
+
+Fetches the PR's changed files via the GitHub API. Evaluates:
+
+- **Size**: Small PRs (< 200 lines) are positive; large PRs (above `maxChanges`)
+  are negative
+- **Focus**: Few files changed is positive; 30+ files is negative
+- **Tests**: Presence of test file changes is positive; absence when
+  `requireTests` is set is negative
+- **Net deletion**: Removing more code than you add is positive. Less code is
+  more good.
+
+Requires a `GITHUB_TOKEN`.
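The heuristics above can be sketched as a small scoring function over the changed-file list. The `filename`, `additions`, and `deletions` fields match GitHub's "list pull request files" response. Several details are assumptions for illustration: counting lines as additions plus deletions, the "few files" cutoff of 5, the test-file pattern, and the +1/-1 scoring.

```typescript
// Subset of a file entry from GitHub's "list pull request files" endpoint.
interface ChangedFile {
  filename: string;
  additions: number;
  deletions: number;
}

interface DiffOptions {
  maxChanges: number;
  requireTests: boolean;
}

// Hypothetical test-file pattern; the real backend may match differently.
const TEST_FILE = /(^|\/)(tests?|__tests__|spec)\/|\.(test|spec)\.[jt]sx?$|_test\.(go|rs|py)$/;

// Returns a signed score: each heuristic contributes +1 or -1.
function scoreDiff(files: ChangedFile[], opts: DiffOptions): { score: number; reasons: string[] } {
  const added = files.reduce((n, f) => n + f.additions, 0);
  const removed = files.reduce((n, f) => n + f.deletions, 0);
  const total = added + removed; // "lines changed" (assumption)
  const reasons: string[] = [];
  let score = 0;

  // Size: small PRs are positive, PRs above maxChanges are negative.
  if (total < 200) { score++; reasons.push("small"); }
  else if (total > opts.maxChanges) { score--; reasons.push("large"); }

  // Focus: 30+ files is negative; "few" (<= 5 here) is positive.
  if (files.length >= 30) { score--; reasons.push("unfocused"); }
  else if (files.length <= 5) { score++; reasons.push("focused"); }

  // Tests: touching test files is positive; missing tests only hurts if required.
  const hasTests = files.some((f) => TEST_FILE.test(f.filename));
  if (hasTests) { score++; reasons.push("tests"); }
  else if (opts.requireTests) { score--; reasons.push("no-tests"); }

  // Net deletion: removing more than you add is positive.
  if (removed > added) { score++; reasons.push("net-deletion"); }

  return { score, reasons };
}
```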
+
+### `quality`
+
+Pure text analysis of the issue/PR description. No API calls needed. Checks for:
+
+- **Issues**: Adequate description length, code blocks, reproduction steps,
+  expected/actual behavior sections, environment info
+- **PRs**: Description length, linked issues (`Fixes #123`), test plan sections,
+  code blocks
+- **Both**: Markdown structure/headers, references to other issues, screenshots
+
+Empty or minimal descriptions are flagged as negative.
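Here is a rough sketch of this kind of text analysis, using regular expressions over the Markdown body. The patterns and signal names are assumptions. The `minBodyLength` default of 50 is taken from the example config. The real backend may detect and weigh these signals differently.

```typescript
type Kind = "issue" | "pull_request";

// Hypothetical signal patterns; the real backend's rules may differ.
const PATTERNS = {
  header: /^#{1,6}\s+\S/m,
  issueRef: /#\d+/,
  screenshot: /!\[[^\]]*\]\([^)]+\)|<img\s/i,
  codeBlock: /`{3}/,
  repro: /\breproduc(e|ible|tion)/i,
  expectedActual: /expected[\s\S]*actual/i,
  environment: /\b(environment|version|os)\b/i,
  linkedIssue: /\b(close[sd]?|fix(e[sd])?|resolve[sd]?)\s+#\d+/i,
  testPlan: /test plan|testing/i,
};

function qualitySignals(body: string | null, kind: Kind, minBodyLength = 50): string[] {
  const text = (body ?? "").trim();
  // Empty or minimal descriptions are a negative signal on their own.
  if (text.length < minBodyLength) return ["too-short"];

  const found: string[] = ["adequate-length"];
  const check = (name: keyof typeof PATTERNS) => {
    if (PATTERNS[name].test(text)) found.push(name);
  };
  // Signals shared by issues and PRs.
  (["header", "issueRef", "screenshot", "codeBlock"] as const).forEach(check);
  if (kind === "issue") {
    (["repro", "expectedActual", "environment"] as const).forEach(check);
  } else {
    (["linkedIssue", "testPlan"] as const).forEach(check);
  }
  return found;
}
```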
+
+### Combining Results
+
+Each backend returns an impact (`positive` / `negative` / `neutral`) and a
+confidence score. The engine combines them using configurable weights (default:
+checks 0.4, diff 0.3, quality 0.3). Backends that return zero confidence (e.g.,
+no CI checks found yet) are excluded from the average. If combined confidence
+falls below `confidenceThreshold`, the result is forced to neutral.
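One way to implement this combination is a confidence-weighted vote. This README specifies the weights, the exclusion of zero-confidence backends, and the threshold, but not the exact arithmetic. The mapping of impacts to +1/0/-1 and the choice of `|score|` as the combined confidence are therefore assumptions.

```typescript
type Impact = "positive" | "negative" | "neutral";

interface BackendResult {
  impact: Impact;
  confidence: number; // 0..1
}

// Map each impact to a direction so results can be averaged.
const SIGN: Record<Impact, number> = { positive: 1, neutral: 0, negative: -1 };

// Hypothetical combiner: confidence-weighted vote across backends.
function combine(
  results: Record<string, BackendResult>,
  weights: Record<string, number>,
  confidenceThreshold: number,
): BackendResult {
  let weightSum = 0;
  let signed = 0;
  for (const [name, r] of Object.entries(results)) {
    const w = weights[name] ?? 0;
    // Zero-confidence backends (e.g. no CI checks yet) drop out of the average.
    if (r.confidence <= 0 || w <= 0) continue;
    weightSum += w;
    signed += w * r.confidence * SIGN[r.impact];
  }
  if (weightSum === 0) return { impact: "neutral", confidence: 0 };

  // score is in [-1, 1]; conflicting backends pull it toward 0.
  const score = signed / weightSum;
  const confidence = Math.abs(score);
  if (score === 0 || confidence < confidenceThreshold) {
    return { impact: "neutral", confidence };
  }
  return { impact: score > 0 ? "positive" : "negative", confidence };
}
```

With the default weights, a PR with no CI results yet is judged on `diff` and `quality` alone, because `checks` reports zero confidence and is skipped.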
+
+## GitHub Account & Token Setup
+
+Troutbot is designed to run as a dedicated bot account on GitHub. Create a
+separate GitHub account for the bot (e.g., `troutbot`) so that comments are
+clearly attributed to it rather than to a personal account.
+
+### 1. Create the bot account
+
+Sign up for a new GitHub account at <https://github.com/signup>. Use a dedicated
+email address for the bot. Give it a recognizable username and avatar.
+
+### 2. Grant repository access
+
+The bot account needs access to every repository it will comment on:
+
+- **For organization repos**: Invite the bot account as a collaborator with
+  **Write** access, or add it to a team with write permissions.
+- **For personal repos**: Add the bot account as a collaborator under
+  **Settings > Collaborators**.
+
+The bot needs write access to post comments. Read access alone is not enough.
+
+### 3. Generate a Personal Access Token
+
+Log in as the bot account and create a fine-grained PAT:
+
+1. Go to **Settings > Developer settings > Personal access tokens > Fine-grained
+   tokens**
+2. Click **Generate new token**
+3. Set a descriptive name (e.g., `troutbot-webhook`)
+4. Set **Expiration** - pick a long-lived duration or no expiration, since this
+   runs unattended
+5. Under **Repository access**, select the specific repositories the bot will
+   operate on (or **All repositories** if it should cover everything the account
+   can see)
+6. Under **Permissions > Repository permissions**, grant:
+   - **Checks**: Read (for the `checks` backend to query CI results)
+   - **Contents**: Read (for the `diff` backend to fetch changed files)
+   - **Issues**: Read and Write (to read issue bodies and post comments)
+   - **Pull requests**: Read and Write (to read PR bodies and post comments)
+   - **Metadata**: Read (required by all fine-grained tokens)
+7. Click **Generate token** and copy the value
+
+Set this as the `GITHUB_TOKEN` environment variable.
+
+> **Classic tokens**: If you prefer a classic PAT instead, create one with the
+> `repo` scope. Fine-grained tokens are recommended because they follow the
+> principle of least privilege.
+
+### 4. Generate a webhook secret
+
+Generate a random secret to verify webhook payloads:
+
+```bash
+openssl rand -hex 32
+```
+
+Set this as the `WEBHOOK_SECRET` environment variable, and use the same value
+when configuring the webhook in GitHub (see
+[GitHub Webhook Setup](#github-webhook-setup)).
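For context, GitHub signs each delivery with an HMAC-SHA256 of the raw request body, keyed by this secret. It sends the hex digest in the `X-Hub-Signature-256` header as `sha256=<digest>`. Below is a minimal verification sketch using Node's built-in `crypto` module. It illustrates the scheme and is not necessarily how troutbot implements it internally (`verifySignature` is a hypothetical name). The HMAC must be computed over the raw body bytes, not over re-serialized JSON.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true if `signatureHeader` (the X-Hub-Signature-256 value) matches the body.
function verifySignature(
  secret: string,
  rawBody: Buffer | string,
  signatureHeader: string | undefined,
): boolean {
  if (!signatureHeader?.startsWith("sha256=")) return false;
  const expected = Buffer.from(
    "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex"),
  );
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```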
+
+## Configuration
+
+### Environment Variables
+
+<!--markdownlint-disable MD013 -->
+
+| Variable         | Description                                           | Required                     |
+| ---------------- | ----------------------------------------------------- | ---------------------------- |
+| `GITHUB_TOKEN`   | Fine-grained PAT from the bot account (see above)     | No (dry-run without it)      |
+| `WEBHOOK_SECRET` | Secret for verifying webhook signatures               | No (skips verification)      |
+| `PORT`           | Server port (overrides `server.port` in config)       | No                           |
+| `CONFIG_PATH`    | Path to config file                                   | No (defaults to `config.ts`) |
+| `LOG_LEVEL`      | Log level override (`debug`, `info`, `warn`, `error`) | No                           |
+
+<!--markdownlint-enable MD013 -->
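The rules in this table can be summarized as a small resolver. The function name `resolveSettings` and the `RuntimeSettings` shape are hypothetical. The default port of 3000 comes from the example config below.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface RuntimeSettings {
  port: number;
  configPath: string;
  dryRun: boolean; // no GITHUB_TOKEN: log comments instead of posting
  verifySignatures: boolean; // no WEBHOOK_SECRET: skip signature checks
  logLevel?: LogLevel;
}

const LEVELS: readonly string[] = ["debug", "info", "warn", "error"];

// Hypothetical resolver; `configPort` stands in for server.port from config.ts.
function resolveSettings(env: Record<string, string | undefined>, configPort = 3000): RuntimeSettings {
  const envPort = Number.parseInt(env.PORT ?? "", 10);
  const level = env.LOG_LEVEL?.toLowerCase();
  return {
    // PORT overrides server.port when it parses to a positive integer.
    port: Number.isInteger(envPort) && envPort > 0 ? envPort : configPort,
    configPath: env.CONFIG_PATH || "config.ts",
    dryRun: !env.GITHUB_TOKEN,
    verifySignatures: Boolean(env.WEBHOOK_SECRET),
    logLevel: level && LEVELS.includes(level) ? (level as LogLevel) : undefined,
  };
}
```

Usage: `resolveSettings(process.env, config.server.port)`.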
+
+### Config File
+
+Copy `config.example.ts` to `config.ts`. The config is a TypeScript module that
+default-exports a `Config` object, so you get full type checking and
+autocompletion in your editor.
+
+```typescript
+import type { Config } from "./src/types";
+
+const config: Config = {
+  server: { port: 3000 },
+  engine: {
+    backends: {
+      checks: { enabled: true },
+      diff: { enabled: true, maxChanges: 1000, requireTests: false },
+      quality: { enabled: true, minBodyLength: 50 },
+    },
+    weights: { checks: 0.4, diff: 0.3, quality: 0.3 },
+    confidenceThreshold: 0.1,
+  },
+  // ...
+};
+
+export default config;
+```
+
+The config is loaded at runtime via [jiti](https://github.com/unjs/jiti) - no
+pre-compilation needed.
+
+See `config.example.ts` for the full annotated reference.
+
+## GitHub Webhook Setup
+
+1. Go to your repository's **Settings > Webhooks > Add webhook**
+2. **Payload URL**: `https://your-host/webhook`
+3. **Content type**: `application/json`
+4. **Secret**: Must match your `WEBHOOK_SECRET` env var
+5. **Events**: Select **Issues**, **Pull requests**, and optionally **Check
+   suites** (for re-analysis when CI finishes)
+
+If you enable **Check suites** and set `response.allowUpdates: true` in your
+config, troutbot will update its comment on a PR once CI results are available.
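The webhook flow can be pictured as a dispatcher keyed on the `X-GitHub-Event` header (`issues`, `pull_request`, `check_suite`). This README does not list which actions troutbot reacts to. The action sets below, the `route` function, and the `Decision` type are therefore assumptions.

```typescript
type Decision =
  | { kind: "analyze"; target: "issue" | "pull_request"; number: number }
  | { kind: "reanalyze-prs"; prNumbers: number[] }
  | { kind: "ignore"; reason: string };

// Minimal payload shapes; real GitHub payloads carry many more fields.
interface Payload {
  action?: string;
  issue?: { number: number };
  pull_request?: { number: number };
  check_suite?: { pull_requests: { number: number }[] };
}

// Hypothetical action filters.
const ISSUE_ACTIONS = new Set(["opened", "edited"]);
const PR_ACTIONS = new Set(["opened", "edited", "synchronize", "reopened"]);

function route(event: string, payload: Payload, allowUpdates: boolean): Decision {
  switch (event) {
    case "issues":
      if (payload.issue && ISSUE_ACTIONS.has(payload.action ?? "")) {
        return { kind: "analyze", target: "issue", number: payload.issue.number };
      }
      break;
    case "pull_request":
      if (payload.pull_request && PR_ACTIONS.has(payload.action ?? "")) {
        return { kind: "analyze", target: "pull_request", number: payload.pull_request.number };
      }
      break;
    case "check_suite":
      // Re-analysis after CI only makes sense if the bot may edit its comment.
      if (payload.action === "completed" && allowUpdates && payload.check_suite) {
        const prNumbers = payload.check_suite.pull_requests.map((p) => p.number);
        if (prNumbers.length > 0) return { kind: "reanalyze-prs", prNumbers };
      }
      break;
  }
  return { kind: "ignore", reason: `${event}/${payload.action ?? "none"}` };
}
```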
+
+## Production Configuration
+
+When deploying troutbot to production, keep the following in mind:
+
+- **`WEBHOOK_SECRET` is strongly recommended.** Without it, anyone who can reach
+  the `/webhook` endpoint can trigger analysis and post comments. Always set a
+  secret and configure the same value in your GitHub webhook settings.
+- **Use a reverse proxy with TLS.** Use an HTTPS payload URL so GitHub delivers
+  webhook payloads encrypted in transit. Put nginx, Caddy, or a cloud load
+  balancer in front of troutbot and terminate TLS there.
+- **Set `NODE_ENV=production`.** This is set automatically in the Docker image.
+  For standalone deployments, export it in your environment. Express uses this
+  to enable performance optimizations.
+- **Rate limiting** is enabled by default at 120 requests/minute on the
+  `/webhook` endpoint. Override via `server.rateLimit` in your config file.
+- **Request body size** is capped at 1 MB. GitHub webhook payloads are well
+  under this limit.
+- **Graceful shutdown** is built in. The server handles `SIGTERM` and `SIGINT`,
+  stops accepting new connections, and waits up to 10 seconds for in-flight
+  requests to finish before exiting.
+- **Dashboard access control.** The `/dashboard` and `/api/*` endpoints have no
+  built-in authentication. Restrict access via reverse proxy rules, firewall, or
+  binding to localhost. See [Securing the Dashboard](#securing-the-dashboard).
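To make the default rate limit concrete, here is a fixed-window limiter that allows 120 requests per minute per client key. Troutbot may use a library or a sliding window instead. The `FixedWindowLimiter` class and its injectable clock are hypothetical, and exist so the logic is easy to test.

```typescript
// Hypothetical fixed-window limiter: at most `limit` hits per key per window.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit = 120, windowMs = 60_000, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  /** Returns true if the request is allowed, false if it should get HTTP 429. */
  hit(key: string): boolean {
    const t = this.now();
    const w = this.windows.get(key);
    if (!w || t - w.start >= this.windowMs) {
      // First request, or the previous window expired: start a new window.
      // (A long-running limiter should also evict stale keys; omitted here.)
      this.windows.set(key, { start: t, count: 1 });
      return true;
    }
    w.count++;
    return w.count <= this.limit;
  }
}
```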
+
+## Deployment
+
+<details>
+<summary>Standalone (Node.js)</summary>
+
+```bash
+npm ci
+npm run build
+export NODE_ENV=production
+export GITHUB_TOKEN="ghp_..."
+export WEBHOOK_SECRET="your-secret"
+npm start
+```
+
+</details>
+
+<details>
+<summary>Docker</summary>
+
+```bash
+docker build -t troutbot .
+docker run -d \
+  --name troutbot \
+  -p 127.0.0.1:3000:3000 \
+  -e GITHUB_TOKEN="ghp_..." \
+  -e WEBHOOK_SECRET="your-secret" \
+  -v "$(pwd)/config.ts:/app/config.ts:ro" \
+  --restart unless-stopped \
+  troutbot
+```
+
+The image uses a multi-stage build, runs as a non-root user, and includes a
+built-in health check and `STOPSIGNAL SIGTERM`.
+
+</details>
+
+<details>
+<summary>Docker Compose</summary>
+
+```yaml
+services:
+  troutbot:
+    build: .
+    ports:
+      - "127.0.0.1:3000:3000"
+    env_file: .env
+    volumes:
+      - ./config.ts:/app/config.ts:ro
+    restart: unless-stopped
+    deploy:
+      resources:
+        limits:
+          memory: 256M
+    logging:
+      driver: json-file
+      options:
+        max-size: "10m"
+        max-file: "3"
+```
+
+</details>
+
+<details>
+<summary>systemd</summary>
+
+Create `/etc/systemd/system/troutbot.service`:
+
+```ini
+[Unit]
+Description=Troutbot GitHub Webhook Bot
+After=network.target
+
+[Service]
+Type=simple
+User=troutbot
+WorkingDirectory=/opt/troutbot
+ExecStart=/usr/bin/node dist/index.js
+EnvironmentFile=/opt/troutbot/.env
+Restart=on-failure
+RestartSec=5
+TimeoutStopSec=15
+NoNewPrivileges=true
+ProtectSystem=strict
+ProtectHome=true
+ReadWritePaths=/opt/troutbot
+PrivateTmp=true
+
+[Install]
+WantedBy=multi-user.target
+```
+
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable --now troutbot
+```
+
+</details>
+
+<details>
+<summary>Reverse Proxy (nginx)</summary>
+
+```nginx
+server {
+    listen 443 ssl;
+    server_name troutbot.example.com;
+
+    ssl_certificate /etc/letsencrypt/live/troutbot.example.com/fullchain.pem;
+    ssl_certificate_key /etc/letsencrypt/live/troutbot.example.com/privkey.pem;
+
+    client_max_body_size 1m;
+    proxy_read_timeout 60s;
+
+    location / {
+        proxy_pass http://127.0.0.1:3000;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+    }
+
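+    # Optional: nginx-level rate limiting. limit_req_zone is only valid in the
+    # http {} context, so declare it there (outside this server block):
+    #   limit_req_zone $binary_remote_addr zone=webhook:10m rate=10r/s;
+    # Then add this inside the server block (repeat the proxy_set_header
+    # lines from location /, since they are not inherited by siblings):
+    # location /webhook {
+    #     limit_req zone=webhook burst=20 nodelay;
+    #     proxy_pass http://127.0.0.1:3000;
+    # }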
+}
+```
+
+</details>
+
+## API Endpoints
+
+| Method   | Path          | Description                                                                              |
+| -------- | ------------- | ---------------------------------------------------------------------------------------- |
+| `GET`    | `/health`     | Health check - returns `status`, `uptime` (seconds), `version`, `dryRun`, and `backends` |
+| `POST`   | `/webhook`    | GitHub webhook receiver (rate limited)                                                   |
+| `GET`    | `/dashboard`  | Web UI dashboard with status, events, and config editor                                  |
+| `GET`    | `/api/status` | JSON status: uptime, version, dry-run, backends, repo count                              |
+| `GET`    | `/api/events` | Recent webhook events from the in-memory ring buffer                                     |
+| `DELETE` | `/api/events` | Clear the event ring buffer                                                              |
+| `GET`    | `/api/config` | Current runtime configuration as JSON                                                    |
+| `PUT`    | `/api/config` | Partial config update: deep-merges, validates, and applies in-place                      |
+
+## Dashboard & Runtime API
+
+Troutbot ships with a built-in web dashboard and JSON API for monitoring and
+runtime configuration. No separate frontend build is required.
+
+### Web Dashboard
+
+Navigate to `http://localhost:3000/dashboard` (or wherever your instance is
+running). The dashboard provides:
+
+- **Status card** - uptime, version, dry-run state, active backends, and repo
+  count. Auto-refreshes every 30 seconds.
+- **Event log** - table of recent webhook events showing repo, PR/issue number,
+  action, impact rating, and confidence score. Keeps the last 100 events in
+  memory.
+- **Config editor** - read-only JSON view of the current runtime config with an
+  "Edit" toggle that lets you modify and save changes without restarting.
+
+The dashboard is a single HTML page with inline CSS and vanilla JS - no
+frameworks, no build step, no external assets.
+
+### Runtime Config API
+
+You can inspect and modify the running configuration via the REST API. Changes
+are applied in-place without restarting the server. The update endpoint
+deep-merges your partial config onto the current one and validates before
+applying.
+
+```bash
+# Read current config
+curl http://localhost:3000/api/config
+
+# Update a single setting (partial merge)
+curl -X PUT http://localhost:3000/api/config \
+  -H 'Content-Type: application/json' \
+  -d '{"response": {"allowUpdates": true}}'
+
+# Change engine weights at runtime
+curl -X PUT http://localhost:3000/api/config \
+  -H 'Content-Type: application/json' \
+  -d '{"engine": {"weights": {"checks": 0.5, "diff": 0.25, "quality": 0.25}}}'
+```
+
+Invalid configs are rejected with a 400 status and an error message. The
+original config remains unchanged if validation fails.
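The merge-then-validate flow can be sketched as follows. The `deepMerge` semantics are assumptions: objects merge recursively, while arrays and scalars replace. The `validate` rules are also assumptions; the real validator presumably checks much more than weights and the threshold. `applyUpdate` is a hypothetical name.

```typescript
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };
type JsonObject = { [k: string]: Json };

const isObject = (v: unknown): v is JsonObject =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Keys that could rewrite an object's prototype if copied blindly.
const UNSAFE_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// Objects merge recursively; arrays and scalars in `patch` replace the base value.
function deepMerge(base: JsonObject, patch: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    if (UNSAFE_KEYS.has(key)) continue;
    const current = out[key];
    out[key] = isObject(current) && isObject(value) ? deepMerge(current, value) : value;
  }
  return out;
}

// Hypothetical validation: returns an error message, or null if the config is OK.
function validate(cfg: JsonObject): string | null {
  const engine = cfg.engine;
  if (!isObject(engine)) return "engine must be an object";
  const weights = engine.weights;
  if (isObject(weights)) {
    for (const [name, w] of Object.entries(weights)) {
      if (typeof w !== "number" || w < 0) return `engine.weights.${name} must be a non-negative number`;
    }
  }
  const t = engine.confidenceThreshold;
  if (t !== undefined && (typeof t !== "number" || t < 0 || t > 1)) {
    return "engine.confidenceThreshold must be between 0 and 1";
  }
  return null;
}

// Handle a PUT body: merge, validate, and only swap in the result if it is valid.
function applyUpdate(current: JsonObject, patch: unknown): { status: 200 | 400; config: JsonObject; error?: string } {
  if (!isObject(patch)) return { status: 400, config: current, error: "body must be a JSON object" };
  const next = deepMerge(current, patch);
  const error = validate(next);
  return error ? { status: 400, config: current, error } : { status: 200, config: next };
}
```

Because `deepMerge` builds a new object instead of mutating the current one, a rejected update leaves the running config untouched.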
+
+### Event Buffer API
+
+The event buffer stores the last 100 processed webhook events in memory. Events
+are lost on restart.
+
+```bash
+# List recent events
+curl http://localhost:3000/api/events
+
+# Clear the buffer
+curl -X DELETE http://localhost:3000/api/events
+```
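A fixed-capacity ring buffer is a natural fit for "the last 100 events". The sketch below is an assumption about the data structure, not troutbot's actual class. The `WebhookEvent` fields are hypothetical and mirror the columns of the dashboard event log. Snapshots are returned oldest first.

```typescript
interface WebhookEvent {
  repo: string;
  number: number;
  action: string;
  impact: "positive" | "negative" | "neutral";
  confidence: number;
}

// Hypothetical ring buffer: once full, each push overwrites the oldest entry.
class RingBuffer<T> {
  private readonly items: (T | undefined)[];
  private readonly capacity: number;
  private next = 0; // slot the next push writes to
  private size = 0; // number of filled slots (<= capacity)

  constructor(capacity = 100) {
    this.capacity = capacity;
    this.items = new Array(capacity);
  }

  push(item: T): void {
    this.items[this.next] = item;
    this.next = (this.next + 1) % this.capacity;
    this.size = Math.min(this.size + 1, this.capacity);
  }

  /** Oldest-first snapshot of the buffered items. */
  toArray(): T[] {
    const start = (this.next - this.size + this.capacity) % this.capacity;
    const out: T[] = [];
    for (let i = 0; i < this.size; i++) {
      out.push(this.items[(start + i) % this.capacity] as T);
    }
    return out;
  }

  clear(): void {
    this.items.fill(undefined);
    this.next = 0;
    this.size = 0;
  }
}
```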
+
+### Securing the Dashboard
+
+The dashboard and API endpoints have no authentication by default. In
+production, restrict access using one of:
+
+- **Reverse proxy rules** - limit `/dashboard` and `/api/*` to internal IPs or
+  require basic auth at the nginx/Caddy layer
+- **Firewall rules** - only expose port 3000 to trusted networks
+- **Bind to localhost** - publish port 3000 only on `127.0.0.1` (the Docker
+  examples already do this with `-p 127.0.0.1:3000:3000`), then access it via an
+  SSH tunnel or VPN
+
+Do not expose the dashboard to the public internet without authentication, as
+the config API allows modifying runtime behavior.
+
+## Dry-Run Mode
+
+Without a `GITHUB_TOKEN`, the bot runs in dry-run mode. The quality backend
+still works (it only analyzes text), but the checks and diff backends return
+neutral because they need API access. Comments are logged instead of posted.
+
+## Customizing Messages
+
+Edit `response.messages` in your config. Each impact category takes an array of
+strings, and one is picked at random for each event.
+
+```typescript
+messages: {
+  positive: [
+    "The trout approve of this {type}!",
+    "Upstream looks clear for this {type}.",
+  ],
+  negative: [
+    "The trout are worried about this {type}.",
+  ],
+  neutral: [
+    "The trout have no opinion on this {type}.",
+  ],
+},
+```
+
+Placeholders:
+
+- `{type}` - `issue` or `pull request`
+- `{impact}` - `positive`, `negative`, or `neutral`
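Message selection and placeholder substitution could look roughly like this. Only `{type}` and `{impact}` are documented. The function name `renderMessage`, leaving unknown placeholders untouched, and the injectable random source are assumptions.

```typescript
type Impact = "positive" | "negative" | "neutral";
type Kind = "issue" | "pull_request";
type Messages = Record<Impact, string[]>;

// Hypothetical renderer: pick one template at random, then fill placeholders.
function renderMessage(
  messages: Messages,
  impact: Impact,
  kind: Kind,
  random: () => number = Math.random,
): string {
  const templates = messages[impact];
  if (templates.length === 0) throw new Error(`no messages configured for ${impact}`);
  const template = templates[Math.floor(random() * templates.length)];
  const values = new Map<string, string>([
    ["type", kind === "issue" ? "issue" : "pull request"],
    ["impact", impact],
  ]);
  // Replace known placeholders; leave unknown ones like {foo} as written.
  return template.replace(/\{(\w+)\}/g, (match, name: string) => values.get(name) ?? match);
}
```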