troutbot/README.md
NotAShelf 978cddb862
docs: add project README
Signed-off-by: NotAShelf <raf@notashelf.dev>
Change-Id: I4aa3f5ba0d5f4967e52ce96514312e776a6a6964
2026-01-31 02:59:54 +03:00


# Troutbot
Troutbot is the final solution to protecting the trout population. It's
environmental protection incarnate!
Well, in reality, it's a GitHub webhook bot that analyzes issues and pull
requests using real signals, such as CI check results, diff quality, and body
structure, and then posts trout-themed comments about the findings. Now you know
whether your changes hurt or help the trout population.
## Quick Start
```bash
# Install dependencies
npm install
# Populate the environment config
cp .env.example .env
# Set up the application config
cp config.example.ts config.ts
# Edit .env and config.ts, then build and start:
npm run build && npm start
```
## How It Works
Troutbot has three analysis backends that run against each incoming webhook
event. They are the primary decision-making logic behind whether your changes
affect the trout population negatively or positively.
### `checks`
Queries the GitHub Checks API for the PR's head commit. Looks at check run
conclusions (ESLint, Clippy, Jest, cargo test, GitHub Actions, etc.) and scores
based on pass/fail ratio. Any CI failure is a negative signal. Requires a
`GITHUB_TOKEN`.
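To make the pass/fail scoring concrete, here is a minimal sketch of how check run conclusions could be turned into an impact and a confidence value. The `CheckRun` shape, the `scoreChecks` name, and the exact confidence formula are assumptions for illustration, not troutbot's actual implementation.

```typescript
// Hypothetical sketch: names and formula are illustrative, not troutbot's code.
type Impact = "positive" | "negative" | "neutral";

interface CheckRun {
  name: string;
  // A subset of the conclusions the GitHub Checks API can report.
  conclusion: "success" | "failure" | "neutral" | "skipped" | "cancelled" | "timed_out" | null;
}

interface BackendResult {
  impact: Impact;
  confidence: number; // 0..1
}

function scoreChecks(runs: CheckRun[]): BackendResult {
  // Only decisive runs count toward the pass/fail ratio.
  const passed = runs.filter((r) => r.conclusion === "success").length;
  const failed = runs.filter(
    (r) => r.conclusion === "failure" || r.conclusion === "timed_out",
  ).length;
  const decided = passed + failed;

  // No decisive checks yet: report zero confidence so the result can be ignored.
  if (decided === 0) return { impact: "neutral", confidence: 0 };

  // Any failure is a negative signal; confidence grows with the failure ratio.
  if (failed > 0) return { impact: "negative", confidence: failed / decided };

  return { impact: "positive", confidence: 1 };
}
```

With this sketch, one passing and one failing check yields a negative result with confidence 0.5, while an empty list yields a neutral result with zero confidence.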
### `diff`
Fetches the PR's changed files via the GitHub API. Evaluates:
- **Size**: Small PRs (< 200 lines) are positive; large PRs (above `maxChanges`)
are negative
- **Focus**: Few files changed is positive; 30+ files is negative
- **Tests**: Presence of test file changes is positive; absence when
`requireTests` is set is negative
- **Net deletion**: Removing more code than you add is positive. Less code is
more good.
Requires a `GITHUB_TOKEN`.
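The four signals above can be sketched as a single function over the changed-file list. This is an illustration only: the `diffSignals` name, the "few files" cutoff of 5, and the test-file pattern are assumptions. The `filename`, `additions`, and `deletions` fields mirror what the GitHub pull request files API returns.

```typescript
// Hypothetical sketch: thresholds not stated in the README are assumptions.
interface ChangedFile {
  filename: string;
  additions: number;
  deletions: number;
}

interface DiffOptions {
  maxChanges: number;
  requireTests: boolean;
}

interface Signal {
  name: "size" | "focus" | "tests" | "net-deletion";
  positive: boolean;
}

// Assumed heuristic for spotting test files by path.
const TEST_FILE = /(^|\/)(tests?|__tests__)\/|\.(test|spec)\.[a-z]+$/i;

function diffSignals(files: ChangedFile[], opts: DiffOptions): Signal[] {
  const signals: Signal[] = [];
  const additions = files.reduce((n, f) => n + f.additions, 0);
  const deletions = files.reduce((n, f) => n + f.deletions, 0);
  const total = additions + deletions;

  // Size: small PRs are positive, PRs above maxChanges are negative.
  if (total < 200) signals.push({ name: "size", positive: true });
  else if (total > opts.maxChanges) signals.push({ name: "size", positive: false });

  // Focus: "few" is assumed to mean 5 files or fewer; 30+ is negative.
  if (files.length <= 5) signals.push({ name: "focus", positive: true });
  else if (files.length >= 30) signals.push({ name: "focus", positive: false });

  // Tests: touching tests is positive; missing tests only count when required.
  if (files.some((f) => TEST_FILE.test(f.filename))) signals.push({ name: "tests", positive: true });
  else if (opts.requireTests) signals.push({ name: "tests", positive: false });

  // Net deletion: removing more than you add is positive.
  if (deletions > additions) signals.push({ name: "net-deletion", positive: true });

  return signals;
}
```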
### `quality`
Pure text analysis of the issue/PR description. No API calls needed. Checks for:
- **Issues**: Adequate description length, code blocks, reproduction steps,
expected/actual behavior sections, environment info
- **PRs**: Description length, linked issues (`Fixes #123`), test plan sections,
code blocks
- **Both**: Markdown structure/headers, references to other issues, screenshots
Empty or minimal descriptions are flagged as negative.
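As a rough illustration of this kind of text analysis, the sketch below inspects a PR description for a few of the listed traits. The `inspectPrBody` name and the regular expressions are assumptions. Only the `minBodyLength` option (default `50` in the example config) comes from this README.

```typescript
// Hypothetical sketch: heuristics are illustrative, not troutbot's actual rules.
interface QualityReport {
  longEnough: boolean;
  hasCodeBlock: boolean;
  linksIssue: boolean;
  hasHeaders: boolean;
}

function inspectPrBody(body: string, minBodyLength = 50): QualityReport {
  const text = body.trim();
  return {
    // Empty or minimal descriptions fail this check.
    longEnough: text.length >= minBodyLength,
    // A fenced code block: three backticks, any content, three backticks.
    hasCodeBlock: /`{3}[\s\S]*?`{3}/.test(text),
    // Closing keywords followed by an issue number, e.g. "Fixes #123".
    linksIssue: /\b(fix(es|ed)?|close[sd]?|resolve[sd]?)\s+#\d+/i.test(text),
    // At least one Markdown ATX header.
    hasHeaders: /^#{1,6}\s+\S/m.test(text),
  };
}
```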
### Combining Results
Each backend returns an impact (`positive` / `negative` / `neutral`) and a
confidence score. The engine combines them using configurable weights (default:
checks 0.4, diff 0.3, quality 0.3). Backends that return zero confidence (e.g.,
no CI checks found yet) are excluded from the average. If combined confidence
falls below `confidenceThreshold`, the result is forced to neutral.
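The combination step can be sketched as a weighted average. This is one plausible reading of the description above, not the engine's actual code: the `combine` name and the signed-score approach are assumptions.

```typescript
// Hypothetical sketch of weighted combination; not troutbot's actual engine.
type Impact = "positive" | "negative" | "neutral";

interface BackendResult {
  impact: Impact;
  confidence: number; // 0..1
}

function combine(
  results: Record<string, BackendResult>,
  weights: Record<string, number>,
  confidenceThreshold: number,
): BackendResult {
  let score = 0; // signed: positive pushes up, negative pushes down
  let confidence = 0;
  let totalWeight = 0;

  for (const [name, result] of Object.entries(results)) {
    // Backends with zero confidence are excluded from the average.
    if (result.confidence === 0) continue;
    const weight = weights[name] ?? 0;
    const sign = result.impact === "positive" ? 1 : result.impact === "negative" ? -1 : 0;
    score += weight * sign * result.confidence;
    confidence += weight * result.confidence;
    totalWeight += weight;
  }

  if (totalWeight === 0) return { impact: "neutral", confidence: 0 };

  const combinedConfidence = confidence / totalWeight;
  // Low combined confidence forces a neutral verdict.
  if (combinedConfidence < confidenceThreshold) {
    return { impact: "neutral", confidence: combinedConfidence };
  }

  const impact: Impact = score > 0 ? "positive" : score < 0 ? "negative" : "neutral";
  return { impact, confidence: combinedConfidence };
}
```

For example, with the default weights, a fully confident positive `checks` result outweighs a half-confident negative `quality` result, and a `diff` result with zero confidence does not dilute the average.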
## GitHub Account & Token Setup
Troutbot is designed to run as a dedicated bot account on GitHub. Create a
separate GitHub account for the bot (e.g., `troutbot`) so that comments are
clearly attributed to it rather than to a personal account.
### 1. Create the bot account
Sign up for a new GitHub account at <https://github.com/signup>. Use a dedicated
email address for the bot. Give it a recognizable username and avatar.
### 2. Grant repository access
The bot account needs access to every repository it will comment on:
- **For organization repos**: Invite the bot account as a collaborator with
**Write** access, or add it to a team with write permissions.
- **For personal repos**: Add the bot account as a collaborator under
**Settings > Collaborators**.
The bot needs write access to post comments. Read access alone is not enough.
### 3. Generate a Personal Access Token
Log in as the bot account and create a fine-grained PAT:
1. Go to **Settings > Developer settings > Personal access tokens > Fine-grained
tokens**
2. Click **Generate new token**
3. Set a descriptive name (e.g., `troutbot-webhook`)
4. Set **Expiration** - pick a long-lived duration or no expiration, since this
runs unattended
5. Under **Repository access**, select the specific repositories the bot will
operate on (or **All repositories** if it should cover everything the account
can see)
6. Under **Permissions > Repository permissions**, grant:
   - **Checks**: Read (for the `checks` backend to query CI results)
   - **Contents**: Read (for the `diff` backend to fetch changed files)
   - **Issues**: Read and Write (to read issue bodies and post comments)
   - **Pull requests**: Read and Write (to read PR bodies and post comments)
   - **Metadata**: Read (required by all fine-grained tokens)
7. Click **Generate token** and copy the value
Set this as the `GITHUB_TOKEN` environment variable.
> **Classic tokens**: If you prefer a classic PAT instead, create one with the
> `repo` scope. Fine-grained tokens are recommended because they follow the
> principle of least privilege.
### 4. Generate a webhook secret
Generate a random secret to verify webhook payloads:
```bash
openssl rand -hex 32
```
Set this as the `WEBHOOK_SECRET` environment variable, and use the same value
when configuring the webhook in GitHub (see
[GitHub Webhook Setup](#github-webhook-setup)).
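For context on what the secret is used for: GitHub signs each delivery with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch below shows how such a signature can be checked with Node's `crypto` module. The `verifySignature` name is hypothetical, and troutbot's own verification code may differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: checks a GitHub-style "sha256=<hex>" signature header
// against the raw (unparsed) request body.
function verifySignature(
  secret: string,
  rawBody: string | Buffer,
  signatureHeader: string | undefined,
): boolean {
  if (!signatureHeader) return false;

  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const expectedBytes = Buffer.from(expected, "utf8");
  const receivedBytes = Buffer.from(signatureHeader, "utf8");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return (
    expectedBytes.length === receivedBytes.length &&
    timingSafeEqual(expectedBytes, receivedBytes)
  );
}
```

The comparison must run over the exact bytes GitHub sent, so the raw body has to be captured before any JSON parsing.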
## Configuration
### Environment Variables
<!--markdownlint-disable MD013 -->
| Variable | Description | Required |
| ---------------- | ----------------------------------------------------- | ---------------------------- |
| `GITHUB_TOKEN` | Fine-grained PAT from the bot account (see above) | No (dry-run without it) |
| `WEBHOOK_SECRET` | Secret for verifying webhook signatures | No (skips verification) |
| `PORT` | Server port (overrides `server.port` in config) | No |
| `CONFIG_PATH` | Path to config file | No (defaults to `config.ts`) |
| `LOG_LEVEL` | Log level override (`debug`, `info`, `warn`, `error`) | No |
<!--markdownlint-enable MD013 -->
### Config File
Copy `config.example.ts` to `config.ts`. The config is a TypeScript module that
default-exports a `Config` object, so you get full type checking and
autocompletion in your editor.
```typescript
import type { Config } from "./src/types";

const config: Config = {
  server: { port: 3000 },
  engine: {
    backends: {
      checks: { enabled: true },
      diff: { enabled: true, maxChanges: 1000, requireTests: false },
      quality: { enabled: true, minBodyLength: 50 },
    },
    weights: { checks: 0.4, diff: 0.3, quality: 0.3 },
    confidenceThreshold: 0.1,
  },
  // ...
};

export default config;
```
The config is loaded at runtime via [jiti](https://github.com/unjs/jiti) - no
pre-compilation needed.
See `config.example.ts` for the full annotated reference.
## GitHub Webhook Setup
1. Go to your repository's **Settings > Webhooks > Add webhook**
2. **Payload URL**: `https://your-host/webhook`
3. **Content type**: `application/json`
4. **Secret**: Must match your `WEBHOOK_SECRET` env var
5. **Events**: Select **Issues**, **Pull requests**, and optionally **Check
suites** (for re-analysis when CI finishes)
If you enable **Check suites** and set `response.allowUpdates: true` in your
config, troutbot will update its comment on a PR once CI results are available.
## Production Configuration
When deploying troutbot to production, keep the following in mind:
- **`WEBHOOK_SECRET` is strongly recommended.** Without it, anyone who can reach
the `/webhook` endpoint can trigger analysis and post comments. Always set a
secret and configure the same value in your GitHub webhook settings.
- **Use a reverse proxy with TLS.** GitHub sends webhook payloads over HTTPS.
Put nginx, Caddy, or a cloud load balancer in front of troutbot and terminate
TLS there.
- **Set `NODE_ENV=production`.** This is set automatically in the Docker image.
For standalone deployments, export it in your environment. Express uses this
to enable performance optimizations.
- **Rate limiting** is enabled by default at 120 requests/minute on the
`/webhook` endpoint. Override via `server.rateLimit` in your config file.
- **Request body size** is capped at 1 MB. GitHub webhook payloads are well
under this limit.
- **Graceful shutdown** is built in. The server handles `SIGTERM` and `SIGINT`,
stops accepting new connections, and waits up to 10 seconds for in-flight
requests to finish before exiting.
- **Dashboard access control.** The `/dashboard` and `/api/*` endpoints have no
built-in authentication. Restrict access via reverse proxy rules, firewall, or
binding to localhost. See [Securing the Dashboard](#securing-the-dashboard).
## Deployment
<details>
<summary>Standalone (Node.js)</summary>
```bash
npm ci
npm run build
export NODE_ENV=production
export GITHUB_TOKEN="ghp_..."
export WEBHOOK_SECRET="your-secret"
npm start
```
</details>
<details>
<summary>Docker</summary>
```bash
docker build -t troutbot .
docker run -d \
  --name troutbot \
  -p 127.0.0.1:3000:3000 \
  -e GITHUB_TOKEN="ghp_..." \
  -e WEBHOOK_SECRET="your-secret" \
  -v "$(pwd)/config.ts:/app/config.ts:ro" \
  --restart unless-stopped \
  troutbot
```
The image uses a multi-stage build, runs as a non-root user, includes a built-in
health check, and sets `STOPSIGNAL SIGTERM`.
</details>
<details>
<summary>Docker Compose</summary>
```yaml
services:
  troutbot:
    build: .
    ports:
      - "127.0.0.1:3000:3000"
    env_file: .env
    volumes:
      - ./config.ts:/app/config.ts:ro
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 256M
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
```
</details>
<details>
<summary>systemd</summary>
Create `/etc/systemd/system/troutbot.service`:
```ini
[Unit]
Description=Troutbot GitHub Webhook Bot
After=network.target
[Service]
Type=simple
User=troutbot
WorkingDirectory=/opt/troutbot
ExecStart=/usr/bin/node dist/index.js
EnvironmentFile=/opt/troutbot/.env
Restart=on-failure
RestartSec=5
TimeoutStopSec=15
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/troutbot
PrivateTmp=true
[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now troutbot
```
</details>
<details>
<summary>Reverse Proxy (nginx)</summary>
```nginx
server {
    listen 443 ssl;
    server_name troutbot.example.com;
    ssl_certificate /etc/letsencrypt/live/troutbot.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/troutbot.example.com/privkey.pem;
    client_max_body_size 1m;
    proxy_read_timeout 60s;
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
    # Optional: nginx-level rate limiting
    # Note: limit_req_zone must be declared in the http context, outside this server block.
    # limit_req_zone $binary_remote_addr zone=webhook:10m rate=10r/s;
    # location /webhook {
    #     limit_req zone=webhook burst=20 nodelay;
    #     proxy_pass http://127.0.0.1:3000;
    # }
}
```
</details>
## API Endpoints
| Method | Path | Description |
| -------- | ------------- | ---------------------------------------------------------------------------------------- |
| `GET` | `/health` | Health check - returns `status`, `uptime` (seconds), `version`, `dryRun`, and `backends` |
| `POST` | `/webhook` | GitHub webhook receiver (rate limited) |
| `GET` | `/dashboard` | Web UI dashboard with status, events, and config editor |
| `GET` | `/api/status` | JSON status: uptime, version, dry-run, backends, repo count |
| `GET` | `/api/events` | Recent webhook events from the in-memory ring buffer |
| `DELETE` | `/api/events` | Clear the event ring buffer |
| `GET` | `/api/config` | Current runtime configuration as JSON |
| `PUT` | `/api/config` | Partial config update: deep-merges, validates, and applies in-place |
## Dashboard & Runtime API
Troutbot ships with a built-in web dashboard and JSON API for monitoring and
runtime configuration. No separate frontend build is required.
### Web Dashboard
Navigate to `http://localhost:3000/dashboard` (or wherever your instance is
running). The dashboard provides:
- **Status card** - uptime, version, dry-run state, active backends, and repo
count. Auto-refreshes every 30 seconds.
- **Event log** - table of recent webhook events showing repo, PR/issue number,
action, impact rating, and confidence score. Keeps the last 100 events in
memory.
- **Config editor** - read-only JSON view of the current runtime config with an
"Edit" toggle that lets you modify and save changes without restarting.
The dashboard is a single HTML page with inline CSS and vanilla JS - no
frameworks, no build step, no external assets.
### Runtime Config API
You can inspect and modify the running configuration via the REST API. Changes
are applied in-place without restarting the server. The update endpoint
deep-merges your partial config onto the current one and validates before
applying.
```bash
# Read current config
curl http://localhost:3000/api/config
# Update a single setting (partial merge)
curl -X PUT http://localhost:3000/api/config \
-H 'Content-Type: application/json' \
-d '{"response": {"allowUpdates": true}}'
# Change engine weights at runtime
curl -X PUT http://localhost:3000/api/config \
-H 'Content-Type: application/json' \
-d '{"engine": {"weights": {"checks": 0.5, "diff": 0.25, "quality": 0.25}}}'
```
Invalid configs are rejected with a 400 status and an error message. The
original config remains unchanged if validation fails.
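The merge-then-validate behavior described above can be sketched as follows. The `deepMerge` and `applyPatch` names and the `validate` callback are hypothetical, and the real endpoint may merge arrays or report errors differently.

```typescript
// Hypothetical sketch of a partial config update; not troutbot's actual code.
type Plain = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Plain {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merge `patch` onto a copy of `base`; arrays and scalars are replaced.
function deepMerge(base: Plain, patch: Plain): Plain {
  const out: Plain = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    // Skip keys that could tamper with object prototypes.
    if (key === "__proto__" || key === "constructor" || key === "prototype") continue;
    const current = out[key];
    out[key] = isPlainObject(current) && isPlainObject(value) ? deepMerge(current, value) : value;
  }
  return out;
}

type PatchResult = { ok: true; config: Plain } | { ok: false; error: string };

// `validate` is an assumed hook that returns an error message, or null when valid.
function applyPatch(
  current: Plain,
  patch: Plain,
  validate: (candidate: Plain) => string | null,
): PatchResult {
  const candidate = deepMerge(current, patch);
  const error = validate(candidate);
  // On failure the caller keeps `current`, which was never mutated.
  return error === null ? { ok: true, config: candidate } : { ok: false, error };
}
```

Because the merge builds a new object instead of mutating the running config, a failed validation leaves the original untouched, which matches the behavior described above.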
### Event Buffer API
The event buffer stores the last 100 processed webhook events in memory. Events
are lost on restart.
```bash
# List recent events
curl http://localhost:3000/api/events
# Clear the buffer
curl -X DELETE http://localhost:3000/api/events
```
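For readers unfamiliar with the data structure, a fixed-capacity ring buffer overwrites its oldest entry once it is full. The sketch below is a generic illustration with a hypothetical `RingBuffer` class, not troutbot's own event store.

```typescript
// Hypothetical sketch of a fixed-capacity ring buffer.
class RingBuffer<T> {
  private readonly slots: (T | undefined)[];
  private readonly capacity: number;
  private next = 0; // index where the next item will be written
  private count = 0; // number of items currently stored

  constructor(capacity = 100) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity).fill(undefined);
  }

  // Store an item, overwriting the oldest one when the buffer is full.
  push(item: T): void {
    this.slots[this.next] = item;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  // Return the stored items from oldest to newest.
  toArray(): T[] {
    const start = (this.next - this.count + this.capacity) % this.capacity;
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.capacity] as T);
    }
    return out;
  }

  // Drop every stored item.
  clear(): void {
    this.slots.fill(undefined);
    this.next = 0;
    this.count = 0;
  }
}
```

Memory use stays bounded no matter how many events arrive, at the cost of losing older entries and, since nothing is persisted, everything on restart.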
### Securing the Dashboard
The dashboard and API endpoints have no authentication by default. In
production, restrict access using one of:
- **Reverse proxy rules** - limit `/dashboard` and `/api/*` to internal IPs or
require basic auth at the nginx/Caddy layer
- **Firewall rules** - only expose port 3000 to trusted networks
- **Bind to localhost** - make the server reachable only on `127.0.0.1` (the
Docker examples already publish the port this way), then access it via an SSH
tunnel or VPN
Do not expose the dashboard to the public internet without authentication, as
the config API allows modifying runtime behavior.
## Dry-Run Mode
Without a `GITHUB_TOKEN`, the bot runs in dry-run mode. The quality backend
still works (text analysis), but checks and diff backends return neutral (they
need API access). Comments are logged instead of posted.
## Customizing Messages
Edit `response.messages` in your config. Each impact category takes an array of
strings. One is picked randomly per event.
```typescript
messages: {
  positive: [
    "The trout approve of this {type}!",
    "Upstream looks clear for this {type}.",
  ],
  negative: [
    "The trout are worried about this {type}.",
  ],
  neutral: [
    "The trout have no opinion on this {type}.",
  ],
},
```
Placeholders:
- `{type}` - `issue` or `pull request`
- `{impact}` - `positive`, `negative`, or `neutral`
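A minimal sketch of how a message might be picked and its placeholders filled in is shown below. The `renderMessage` name and the injectable `random` parameter are assumptions made for illustration and testability, not part of troutbot's API.

```typescript
// Hypothetical sketch of message selection and placeholder substitution.
type Impact = "positive" | "negative" | "neutral";
type Messages = Record<Impact, string[]>;

function renderMessage(
  messages: Messages,
  impact: Impact,
  type: "issue" | "pull request",
  random: () => number = Math.random,
): string {
  const pool = messages[impact];
  if (pool.length === 0) return "";

  // Pick one template at random from the matching category.
  const template = pool[Math.floor(random() * pool.length)] ?? "";

  // Replace every {type} and {impact} placeholder.
  return template.replace(/\{(type|impact)\}/g, (_match, key: string) =>
    key === "type" ? type : impact,
  );
}
```

Passing a fixed `random` function makes the selection deterministic, which is handy when previewing templates.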