diff --git a/README.md b/README.md new file mode 100644 index 0000000..200f9d1 --- /dev/null +++ b/README.md @@ -0,0 +1,464 @@ +# Troutbot + +Troutbot is the final solution to protecting the trout population. It's +environmental protection incarnate! + +Well, in reality, it's a GitHub webhook bot that analyzes issues and pull +requests using real signals such as CI check results, diff quality, and body +structure, and then posts trout-themed comments about the findings. Now you know +whether your changes hurt or help the trout population. + +## Quick Start + +```bash +# Install dependencies +npm install + +# Populate the environment config +cp .env.example .env + +# Set up application config +cp config.example.ts config.ts + +# Edit .env and config.ts, then build and start: +npm run build && npm start +``` + +## How It Works + +Troutbot runs three analysis backends against each incoming webhook event. +They are the primary decision-making logic behind whether your changes affect the +trout population negatively or positively. + +### `checks` + +Queries the GitHub Checks API for the PR's head commit. Looks at check run +conclusions (ESLint, Clippy, Jest, cargo test, GitHub Actions, etc.) and scores +based on pass/fail ratio. Any CI failure is a negative signal. Requires a +`GITHUB_TOKEN`. + +### `diff` + +Fetches the PR's changed files via the GitHub API. Evaluates: + +- **Size**: Small PRs (< 200 lines) are positive; large PRs (above `maxChanges`) + are negative +- **Focus**: Few files changed is positive; 30+ files is negative +- **Tests**: Presence of test file changes is positive; absence when + `requireTests` is set is negative +- **Net deletion**: Removing more code than you add is positive. Less code is + more good. + +Requires a `GITHUB_TOKEN`. + +### `quality` + +Pure text analysis of the issue/PR description. No API calls needed. 
Checks for: + +- **Issues**: Adequate description length, code blocks, reproduction steps, + expected/actual behavior sections, environment info +- **PRs**: Description length, linked issues (`Fixes #123`), test plan sections, + code blocks +- **Both**: Markdown structure/headers, references to other issues, screenshots + +Empty or minimal descriptions are flagged as negative. + +### Combining Results + +Each backend returns an impact (`positive` / `negative` / `neutral`) and a +confidence score. The engine combines them using configurable weights (default: +checks 0.4, diff 0.3, quality 0.3). Backends that return zero confidence (e.g., +no CI checks found yet) are excluded from the average. If combined confidence +falls below `confidenceThreshold`, the result is forced to neutral. + +## GitHub Account & Token Setup + +Troutbot is designed to run as a dedicated bot account on GitHub. Create a +separate GitHub account for the bot (e.g., `troutbot`) so that comments are +clearly attributed to it rather than to a personal account. + +### 1. Create the bot account + +Sign up for a new GitHub account. Use a dedicated +email address for the bot. Give it a recognizable username and avatar. + +### 2. Grant repository access + +The bot account needs access to every repository it will comment on: + +- **For organization repos**: Invite the bot account as a collaborator with + **Write** access, or add it to a team with write permissions. +- **For personal repos**: Add the bot account as a collaborator under + **Settings > Collaborators**. + +The bot needs write access to post comments. Read access alone is not enough. + +### 3. Generate a Personal Access Token + +Log in as the bot account and create a fine-grained PAT: + +1. Go to **Settings > Developer settings > Personal access tokens > Fine-grained + tokens** +2. Click **Generate new token** +3. Set a descriptive name (e.g., `troutbot-webhook`) +4. 
Set **Expiration** - pick a long-lived duration or no expiration, since this + runs unattended +5. Under **Repository access**, select the specific repositories the bot will + operate on (or **All repositories** if it should cover everything the account + can see) +6. Under **Permissions > Repository permissions**, grant: + - **Checks**: Read (for the `checks` backend to query CI results) + - **Contents**: Read (for the `diff` backend to fetch changed files) + - **Issues**: Read and Write (to read issue bodies and post comments) + - **Pull requests**: Read and Write (to read PR bodies and post comments) + - **Metadata**: Read (required by all fine-grained tokens) +7. Click **Generate token** and copy the value + +Set this as the `GITHUB_TOKEN` environment variable. + +> **Classic tokens**: If you prefer a classic PAT instead, create one with the +> `repo` scope. Fine-grained tokens are recommended because they follow the +> principle of least privilege. + +### 4. Generate a webhook secret + +Generate a random secret to verify webhook payloads: + +```bash +openssl rand -hex 32 +``` + +Set this as the `WEBHOOK_SECRET` environment variable, and use the same value +when configuring the webhook in GitHub (see +[GitHub Webhook Setup](#github-webhook-setup)). + +## Configuration + +### Environment Variables + + + +| Variable | Description | Required | +| ---------------- | ----------------------------------------------------- | ---------------------------- | +| `GITHUB_TOKEN` | Fine-grained PAT from the bot account (see above) | No (dry-run without it) | +| `WEBHOOK_SECRET` | Secret for verifying webhook signatures | No (skips verification) | +| `PORT` | Server port (overrides `server.port` in config) | No | +| `CONFIG_PATH` | Path to config file | No (defaults to `config.ts`) | +| `LOG_LEVEL` | Log level override (`debug`, `info`, `warn`, `error`) | No | + + + +### Config File + +Copy `config.example.ts` to `config.ts`. 
The config is a TypeScript module that +default-exports a `Config` object - full type checking and autocompletion in +your editor. + +```typescript +import type { Config } from "./src/types"; + +const config: Config = { + server: { port: 3000 }, + engine: { + backends: { + checks: { enabled: true }, + diff: { enabled: true, maxChanges: 1000, requireTests: false }, + quality: { enabled: true, minBodyLength: 50 }, + }, + weights: { checks: 0.4, diff: 0.3, quality: 0.3 }, + confidenceThreshold: 0.1, + }, + // ... +}; + +export default config; +``` + +The config is loaded at runtime via [jiti](https://github.com/unjs/jiti) - no +pre-compilation needed. + +See `config.example.ts` for the full annotated reference. + +## GitHub Webhook Setup + +1. Go to your repository's **Settings > Webhooks > Add webhook** +2. **Payload URL**: `https://your-host/webhook` +3. **Content type**: `application/json` +4. **Secret**: Must match your `WEBHOOK_SECRET` env var +5. **Events**: Select **Issues**, **Pull requests**, and optionally **Check + suites** (for re-analysis when CI finishes) + +If you enable **Check suites** and set `response.allowUpdates: true` in your +config, troutbot will update its comment on a PR once CI results are available. + +## Production Configuration + +When deploying troutbot to production, keep the following in mind: + +- **`WEBHOOK_SECRET` is strongly recommended.** Without it, anyone who can reach + the `/webhook` endpoint can trigger analysis and post comments. Always set a + secret and configure the same value in your GitHub webhook settings. +- **Use a reverse proxy with TLS.** GitHub sends webhook payloads over HTTPS. + Put nginx, Caddy, or a cloud load balancer in front of troutbot and terminate + TLS there. +- **Set `NODE_ENV=production`.** This is set automatically in the Docker image. + For standalone deployments, export it in your environment. Express uses this + to enable performance optimizations. 
+- **Rate limiting** is enabled by default at 120 requests/minute on the + `/webhook` endpoint. Override via `server.rateLimit` in your config file. +- **Request body size** is capped at 1 MB. GitHub webhook payloads are well + under this limit. +- **Graceful shutdown** is built in. The server handles `SIGTERM` and `SIGINT`, + stops accepting new connections, and waits up to 10 seconds for in-flight + requests to finish before exiting. +- **Dashboard access control.** The `/dashboard` and `/api/*` endpoints have no + built-in authentication. Restrict access via reverse proxy rules, firewall, or + binding to localhost. See [Securing the Dashboard](#securing-the-dashboard). + +## Deployment + +
+Standalone (Node.js) + +```bash +npm ci +npm run build +export NODE_ENV=production +export GITHUB_TOKEN="ghp_..." +export WEBHOOK_SECRET="your-secret" +npm start +``` + +
+ +
+Docker + +```bash +docker build -t troutbot . +docker run -d \ + --name troutbot \ + -p 127.0.0.1:3000:3000 \ + -e GITHUB_TOKEN="ghp_..." \ + -e WEBHOOK_SECRET="your-secret" \ + -v $(pwd)/config.ts:/app/config.ts:ro \ + --restart unless-stopped \ + troutbot +``` + +Multi-stage build, non-root user, built-in health check, `STOPSIGNAL SIGTERM`. + +
+ +
+Docker Compose + +```yaml +services: + troutbot: + build: . + ports: + - "127.0.0.1:3000:3000" + env_file: .env + volumes: + - ./config.ts:/app/config.ts:ro + restart: unless-stopped + deploy: + resources: + limits: + memory: 256M + logging: + driver: json-file + options: + max-size: "10m" + max-file: "3" +``` + +
+ +
+systemd + +Create `/etc/systemd/system/troutbot.service`: + +```ini +[Unit] +Description=Troutbot GitHub Webhook Bot +After=network.target + +[Service] +Type=simple +User=troutbot +WorkingDirectory=/opt/troutbot +ExecStart=/usr/bin/node dist/index.js +EnvironmentFile=/opt/troutbot/.env +Restart=on-failure +RestartSec=5 +TimeoutStopSec=15 +NoNewPrivileges=true +ProtectSystem=strict +ProtectHome=true +ReadWritePaths=/opt/troutbot +PrivateTmp=true + +[Install] +WantedBy=multi-user.target +``` + +```bash +sudo systemctl daemon-reload +sudo systemctl enable --now troutbot +``` + +
+ +
+Reverse Proxy (nginx) + +```nginx +server { + listen 443 ssl; + server_name troutbot.example.com; + + ssl_certificate /etc/letsencrypt/live/troutbot.example.com/fullchain.pem; + ssl_certificate_key /etc/letsencrypt/live/troutbot.example.com/privkey.pem; + + client_max_body_size 1m; + proxy_read_timeout 60s; + + location / { + proxy_pass http://127.0.0.1:3000; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + # Optional: nginx-level rate limiting + # limit_req_zone $binary_remote_addr zone=webhook:10m rate=10r/s; + # location /webhook { + # limit_req zone=webhook burst=20 nodelay; + # proxy_pass http://127.0.0.1:3000; + # } +} +``` + +
+ +## API Endpoints + +| Method | Path | Description | +| -------- | ------------- | ---------------------------------------------------------------------------------------- | +| `GET` | `/health` | Health check - returns `status`, `uptime` (seconds), `version`, `dryRun`, and `backends` | +| `POST` | `/webhook` | GitHub webhook receiver (rate limited) | +| `GET` | `/dashboard` | Web UI dashboard with status, events, and config editor | +| `GET` | `/api/status` | JSON status: uptime, version, dry-run, backends, repo count | +| `GET` | `/api/events` | Recent webhook events from the in-memory ring buffer | +| `DELETE` | `/api/events` | Clear the event ring buffer | +| `GET` | `/api/config` | Current runtime configuration as JSON | +| `PUT` | `/api/config` | Partial config update: deep-merges, validates, and applies in-place | + +## Dashboard & Runtime API + +Troutbot ships with a built-in web dashboard and JSON API for monitoring and +runtime configuration. No separate frontend build is required. + +### Web Dashboard + +Navigate to `http://localhost:3000/dashboard` (or wherever your instance is +running). The dashboard provides: + +- **Status card** - uptime, version, dry-run state, active backends, and repo + count. Auto-refreshes every 30 seconds. +- **Event log** - table of recent webhook events showing repo, PR/issue number, + action, impact rating, and confidence score. Keeps the last 100 events in + memory. +- **Config editor** - read-only JSON view of the current runtime config with an + "Edit" toggle that lets you modify and save changes without restarting. + +The dashboard is a single HTML page with inline CSS and vanilla JS - no +frameworks, no build step, no external assets. + +### Runtime Config API + +You can inspect and modify the running configuration via the REST API. Changes +are applied in-place without restarting the server. The update endpoint +deep-merges your partial config onto the current one and validates before +applying. 
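The merge-then-validate behavior described above can be pictured with a small sketch. It is illustrative only and not troutbot's actual code: `deepMerge`, `validate`, and `applyPatch` are hypothetical names, and the single validation rule (weights must be non-negative numbers) is an assumption made for the example.

```typescript
// Hypothetical sketch of "deep-merge a partial config, validate, then apply".
// None of these names come from troutbot itself.
type Plain = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Plain {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merge `patch` onto a copy of `base`. Nested objects are merged key by key;
// arrays and primitives in `patch` replace the value in `base`.
function deepMerge(base: Plain, patch: Plain): Plain {
  const result: Plain = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    // Skip keys that could tamper with object prototypes.
    if (key === "__proto__" || key === "constructor" || key === "prototype") {
      continue;
    }
    const current = result[key];
    result[key] =
      isPlainObject(current) && isPlainObject(value)
        ? deepMerge(current, value)
        : value;
  }
  return result;
}

// Assumed rule for illustration: every engine weight is a non-negative number.
function validate(config: Plain): string[] {
  const errors: string[] = [];
  const engine = config.engine;
  const weights = isPlainObject(engine) ? engine.weights : undefined;
  if (isPlainObject(weights)) {
    for (const [name, weight] of Object.entries(weights)) {
      if (typeof weight !== "number" || !Number.isFinite(weight) || weight < 0) {
        errors.push(`engine.weights.${name} must be a non-negative number`);
      }
    }
  }
  return errors;
}

// Returns the merged config, or throws and leaves `current` untouched.
function applyPatch(current: Plain, patch: Plain): Plain {
  const candidate = deepMerge(current, patch);
  const errors = validate(candidate);
  if (errors.length > 0) {
    throw new Error(errors.join("; "));
  }
  return candidate;
}

const running: Plain = {
  server: { port: 3000 },
  engine: {
    weights: { checks: 0.4, diff: 0.3, quality: 0.3 },
    confidenceThreshold: 0.1,
  },
};

// Only `checks` changes; `diff`, `quality`, and `server` are carried over.
const updated = applyPatch(running, { engine: { weights: { checks: 0.5 } } });
console.log(JSON.stringify(updated.engine));
```

Building a candidate object first and swapping it in only after validation passes is one simple way to keep the running config unchanged when an update is rejected.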
+ +```bash +# Read current config +curl http://localhost:3000/api/config + +# Update a single setting (partial merge) +curl -X PUT http://localhost:3000/api/config \ + -H 'Content-Type: application/json' \ + -d '{"response": {"allowUpdates": true}}' + +# Change engine weights at runtime +curl -X PUT http://localhost:3000/api/config \ + -H 'Content-Type: application/json' \ + -d '{"engine": {"weights": {"checks": 0.5, "diff": 0.25, "quality": 0.25}}}' +``` + +Invalid configs are rejected with a 400 status and an error message. The +original config remains unchanged if validation fails. + +### Event Buffer API + +The event buffer stores the last 100 processed webhook events in memory. Events +are lost on restart. + +```bash +# List recent events +curl http://localhost:3000/api/events + +# Clear the buffer +curl -X DELETE http://localhost:3000/api/events +``` + +### Securing the Dashboard + +The dashboard and API endpoints have no authentication by default. In +production, restrict access using one of: + +- **Reverse proxy rules** - limit `/dashboard` and `/api/*` to internal IPs or + require basic auth at the nginx/Caddy layer +- **Firewall rules** - only expose port 3000 to trusted networks +- **Bind to localhost** - set `server.port` and bind to `127.0.0.1` (the Docker + examples already do this), then access via SSH tunnel or VPN + +Do not expose the dashboard to the public internet without authentication, as +the config API allows modifying runtime behavior. + +## Dry-Run Mode + +Without a `GITHUB_TOKEN`, the bot runs in dry-run mode. The quality backend +still works (text analysis), but checks and diff backends return neutral (they +need API access). Comments are logged instead of posted. + +## Customizing Messages + +Edit `response.messages` in your config. Each impact category takes an array of +strings. One is picked randomly per event. 
+ +```typescript +messages: { + positive: [ + "The trout approve of this {type}!", + "Upstream looks clear for this {type}.", + ], + negative: [ + "The trout are worried about this {type}.", + ], + neutral: [ + "The trout have no opinion on this {type}.", + ], +}, +``` + +Placeholders: + +- `{type}` - `issue` or `pull request` +- `{impact}` - `positive`, `negative`, or `neutral`
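
As a rough illustration of how a message could be chosen and its placeholders filled, here is a minimal sketch. It is not troutbot's actual implementation: `renderMessage` and `pickMessage` are hypothetical names, and the injectable `random` parameter is an assumption added so the choice can be made deterministic in tests.

```typescript
// Hypothetical sketch of random message selection plus placeholder filling.
type Impact = "positive" | "negative" | "neutral";
type Messages = Record<Impact, string[]>;

// Replace `{name}` tokens with values; unknown placeholders are left as-is.
function renderMessage(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(values, name)
      ? values[name]
      : undefined;
    return value ?? match;
  });
}

// Pick one template for the impact category and fill in {type} and {impact}.
function pickMessage(
  messages: Messages,
  impact: Impact,
  type: "issue" | "pull request",
  random: () => number = Math.random,
): string {
  const pool = messages[impact];
  const template = pool[Math.floor(random() * pool.length)];
  if (template === undefined) {
    throw new Error(`no messages configured for impact "${impact}"`);
  }
  return renderMessage(template, { type, impact });
}

const messages: Messages = {
  positive: [
    "The trout approve of this {type}!",
    "Upstream looks clear for this {type}.",
  ],
  negative: ["The trout are worried about this {type}."],
  neutral: ["The trout have no opinion on this {type}."],
};

console.log(pickMessage(messages, "positive", "pull request"));
```

Leaving unknown placeholders untouched makes a typo such as `{typo}` visible in the posted comment instead of silently producing an empty string.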