# Troutbot

Troutbot is the ultimate solution to protecting the trout population. It's
environmental protection incarnate!

Well, in reality, it is a GitHub bot that analyzes issues and pull requests
using real signals, such as CI check results, diff quality, and body structure,
and then posts trout-themed comments about its findings. Now you know whether
your changes hurt or help the trout population.

## Operation Modes

Troutbot supports two operation modes:

### Webhook Mode (Real-time)

GitHub sends webhook events to troutbot when issues/PRs are opened or updated,
and troutbot responds immediately. Best for:

- A single repository or a few repositories
- Setups where you have admin access to configure webhooks
- Setups where you can expose a public endpoint

### Polling Mode (Periodic)

Troutbot periodically polls configured repositories for `@troutbot` mentions in
comments. Best for:

- Monitoring dozens of repositories without setting up webhooks
- Running behind a firewall or on a dynamic IP
- Simplified deployment without webhook secrets

Both modes use the same analysis engine and produce the same results.

## Quick Start

```bash
# Install dependencies
npm install

# Populate the environment config
cp .env.example .env

# Set up the application config
cp config.example.ts config.ts

# Edit .env and config.ts, then build and start.
# If GITHUB_TOKEN is not set in .env, Troutbot starts in dry-run mode.
npm run build && npm start
```

## How It Works

Troutbot analyzes issues and PRs with three backends:

### `checks`

Queries the GitHub Checks API for the PR's head commit, looks at the check run
conclusions (ESLint, Clippy, Jest, cargo test, GitHub Actions, etc.), and scores
the PR based on the ratio of passed to failed runs. Any CI failure is a negative
signal. Requires a `GITHUB_TOKEN`.

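As a rough illustration of pass/fail scoring, the sketch below turns a list of check run conclusions into an impact and a confidence. It is a hypothetical sketch, not troutbot's actual code: the `BackendResult` shape, which conclusions count as a verdict, and using the failure ratio as confidence are all assumptions. The conclusion strings are values the GitHub Checks API reports (`null` while a run is still in progress).

```typescript
// Conclusion values as reported by the GitHub Checks API; null means the run
// has not completed yet.
type CheckConclusion =
  | "success"
  | "failure"
  | "neutral"
  | "cancelled"
  | "skipped"
  | "timed_out"
  | "action_required"
  | "stale"
  | null;

// Hypothetical result shape shared by the backends in these sketches.
interface BackendResult {
  impact: "positive" | "negative" | "neutral";
  confidence: number; // 0..1
}

function scoreChecks(conclusions: CheckConclusion[]): BackendResult {
  // Assumption: only runs with a clear pass/fail verdict count toward the ratio.
  const decided = conclusions.filter(
    (c) => c === "success" || c === "failure" || c === "timed_out",
  );
  if (decided.length === 0) {
    // No finished CI yet: zero confidence, so the engine can skip this backend.
    return { impact: "neutral", confidence: 0 };
  }
  const failed = decided.filter((c) => c !== "success").length;
  const passRatio = (decided.length - failed) / decided.length;
  if (failed > 0) {
    // Any failure makes the result negative; more failures, more confidence.
    return { impact: "negative", confidence: 1 - passRatio };
  }
  return { impact: "positive", confidence: passRatio };
}

// Three passing jobs and one failing job.
const sample = scoreChecks(["success", "success", "failure", "success"]);
console.log(sample.impact, sample.confidence); // negative 0.25
```
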
### `diff`

Fetches the PR's changed files via the GitHub API and evaluates:

- **Size**: Small PRs (fewer than 200 changed lines) are positive; large PRs
  (above `maxChanges`) are negative
- **Focus**: Changing only a few files is positive; changing 30 or more files is
  negative
- **Tests**: Changes to test files are positive; no test changes is negative
  when `requireTests` is set
- **Net deletion**: Removing more code than you add is positive. Less code is
  more good.

Requires a `GITHUB_TOKEN`.

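The four heuristics can be sketched as independent +1/-1 signals over the file list returned by GitHub's "list pull request files" endpoint, which reports `filename`, `additions`, and `deletions` per file. This is a hypothetical sketch: the test-file regex, the "few files" cutoff of five, and representing results as named signals are assumptions; only the 200-line, `maxChanges`, 30-file, and `requireTests` rules come from the description above.

```typescript
// Subset of the fields GitHub returns per file when listing a PR's files.
interface ChangedFile {
  filename: string;
  additions: number;
  deletions: number;
}

interface DiffOptions {
  maxChanges: number;
  requireTests: boolean;
}

interface Signal {
  name: string;
  delta: 1 | -1;
}

// Assumed test-file conventions: tests/, __tests__/ or spec/ dirs, *.test.ts, foo_test.go, ...
const TEST_FILE = /(^|\/)(tests?|__tests__|spec)\/|\.(test|spec)\.[jt]sx?$|_test\.(go|rs|py)$/;

function diffSignals(files: ChangedFile[], opts: DiffOptions): Signal[] {
  const signals: Signal[] = [];
  const added = files.reduce((sum, f) => sum + f.additions, 0);
  const deleted = files.reduce((sum, f) => sum + f.deletions, 0);
  const total = added + deleted;

  // Size: small is good, anything above maxChanges is bad.
  if (total < 200) signals.push({ name: "small", delta: 1 });
  else if (total > opts.maxChanges) signals.push({ name: "large", delta: -1 });

  // Focus: 30+ files is bad; "a few" is assumed to mean five or fewer.
  if (files.length >= 30) signals.push({ name: "unfocused", delta: -1 });
  else if (files.length <= 5) signals.push({ name: "focused", delta: 1 });

  // Tests: touching tests is good; missing tests only hurts when required.
  if (files.some((f) => TEST_FILE.test(f.filename))) {
    signals.push({ name: "tests", delta: 1 });
  } else if (opts.requireTests) {
    signals.push({ name: "no-tests", delta: -1 });
  }

  // Net deletion: less code is more good.
  if (deleted > added) signals.push({ name: "net-deletion", delta: 1 });

  return signals;
}

const signals = diffSignals(
  [
    { filename: "src/parser.ts", additions: 10, deletions: 50 },
    { filename: "src/parser.test.ts", additions: 20, deletions: 0 },
  ],
  { maxChanges: 1000, requireTests: false },
);
console.log(signals.map((s) => s.name).join(", ")); // small, focused, tests, net-deletion
```
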
### `quality`

Pure text analysis of the issue/PR description. No API calls needed. Checks for:

- **Issues**: Adequate description length, code blocks, reproduction steps,
  expected/actual behavior sections, environment info
- **PRs**: Description length, linked issues (`Fixes #123`), test plan sections,
  code blocks
- **Both**: Markdown structure/headers, references to other issues, screenshots

Empty or minimal descriptions are flagged as negative.

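A minimal sketch of this kind of text analysis, assuming simple regular expressions stand in for each check. The patterns and signal names are illustrative, the `minBodyLength` default of 50 is taken from the example config later in this README, and the environment-info check is omitted for brevity; the real backend's heuristics may differ.

```typescript
type Kind = "issue" | "pull_request";

// Hypothetical patterns for each check.
const PATTERNS = {
  codeBlock: /`{3}/,
  header: /^#{1,6}\s+\S/m,
  issueRef: /#\d+\b/,
  screenshot: /!\[[^\]]*\]\([^)]+\)/,
  reproSteps: /steps to reproduce|reproduction/i,
  expectedActual: /expected[\s\S]*actual/i,
  linkedIssue: /\b(close[sd]?|fix(e[sd])?|resolve[sd]?)\s+#\d+/i,
  testPlan: /test(ing)? plan/i,
};

type Check = keyof typeof PATTERNS;

function qualitySignals(body: string | null, kind: Kind, minBodyLength = 50): string[] {
  const text = (body ?? "").trim();
  // Empty or minimal descriptions short-circuit to a single negative signal.
  if (text.length < minBodyLength) return ["too-short"];

  const shared: Check[] = ["header", "issueRef", "screenshot", "codeBlock"];
  const specific: Check[] =
    kind === "issue" ? ["reproSteps", "expectedActual"] : ["linkedIssue", "testPlan"];

  // Every pattern that matches becomes a positive signal.
  const found = [...shared, ...specific].filter((name) => PATTERNS[name].test(text));
  return ["adequate-length", ...found];
}

const pr = "## Summary\n\nFixes #123 by handling null configs.\n\n## Test plan\n\nRan the unit tests.";
console.log(qualitySignals(pr, "pull_request").join(", "));
// adequate-length, header, issueRef, linkedIssue, testPlan
```
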
### Combining Results

Each backend returns an impact (`positive` / `negative` / `neutral`) and a
confidence score. The engine combines them using configurable weights (default:
checks 0.4, diff 0.3, quality 0.3). Backends that return zero confidence (e.g.,
no CI checks found yet) are excluded from the average. If the combined
confidence falls below `confidenceThreshold`, the result is forced to neutral.

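The combination step can be sketched as a weighted average in which each backend's impact is mapped to +1, -1, or 0. This is a hypothetical sketch, not the engine's actual formula: the `combine` function, the sign mapping, and averaging confidence with the same weights are assumptions. The default weights, the zero-confidence exclusion, and the threshold rule come from the paragraph above.

```typescript
type Impact = "positive" | "negative" | "neutral";

interface BackendResult {
  impact: Impact;
  confidence: number; // 0..1
}

function combine(
  results: Record<string, BackendResult>,
  weights: Record<string, number>,
  confidenceThreshold: number,
): BackendResult {
  let weightSum = 0;
  let score = 0; // signed: > 0 leans positive, < 0 leans negative
  let confidence = 0;
  for (const [name, r] of Object.entries(results)) {
    const w = weights[name] ?? 0;
    // Zero-confidence backends (e.g. no CI checks yet) drop out of the average.
    if (r.confidence <= 0 || w <= 0) continue;
    const sign = r.impact === "positive" ? 1 : r.impact === "negative" ? -1 : 0;
    score += w * sign * r.confidence;
    confidence += w * r.confidence;
    weightSum += w;
  }
  if (weightSum === 0) return { impact: "neutral", confidence: 0 };
  score /= weightSum;
  confidence /= weightSum;
  // Too little confidence (or an exact tie) forces a neutral verdict.
  if (confidence < confidenceThreshold || score === 0) {
    return { impact: "neutral", confidence };
  }
  return { impact: score > 0 ? "positive" : "negative", confidence };
}

const verdict = combine(
  {
    checks: { impact: "neutral", confidence: 0 }, // CI not finished yet
    diff: { impact: "positive", confidence: 0.8 },
    quality: { impact: "negative", confidence: 0.4 },
  },
  { checks: 0.4, diff: 0.3, quality: 0.3 },
  0.1,
);
console.log(verdict.impact, verdict.confidence.toFixed(2)); // positive 0.60
```

Because `checks` is excluded, only the diff and quality weights (0.3 + 0.3) form the denominator, so a still-running CI pipeline does not drag the result toward neutral.
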
## Webhook Mode

In webhook mode, troutbot receives real-time events from GitHub.

### GitHub Webhook Setup

1. Go to your repository's **Settings > Webhooks > Add webhook**
2. **Payload URL**: `https://your-host/webhook`
3. **Content type**: `application/json`
4. **Secret**: Generate one with `openssl rand -hex 32` and set the same value
   as `WEBHOOK_SECRET` in troutbot's environment
5. **Events**: Select **Issues**, **Pull requests**, and optionally **Check
   suites** (for re-analysis when CI finishes)

If you enable **Check suites** and set `response.allowUpdates: true` in your
config, troutbot will update its comment on a PR once CI results are available.

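To make the event selection concrete, here is a hypothetical routing function showing which deliveries would lead to a fresh analysis, a comment update, or nothing. The event names (`issues`, `pull_request`, `check_suite`, sent in the `X-GitHub-Event` header) and actions (`opened`, `edited`, `reopened`, `synchronize`, `completed`) are GitHub's; the `route` function and its exact action lists are assumptions about troutbot's behavior, not its source.

```typescript
type Route = "analyze" | "reanalyze" | "ignore";

// eventName comes from the X-GitHub-Event header; action from the JSON payload.
function route(eventName: string, action: string, allowUpdates: boolean): Route {
  switch (eventName) {
    case "issues":
      return ["opened", "edited", "reopened"].includes(action) ? "analyze" : "ignore";
    case "pull_request":
      // "synchronize" fires when new commits are pushed to the PR branch.
      return ["opened", "edited", "reopened", "synchronize"].includes(action)
        ? "analyze"
        : "ignore";
    case "check_suite":
      // Only revisit once CI has finished, and only if comment edits are allowed.
      return action === "completed" && allowUpdates ? "reanalyze" : "ignore";
    default:
      return "ignore";
  }
}

console.log(route("pull_request", "synchronize", false)); // analyze
console.log(route("check_suite", "completed", false)); // ignore
```
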
### Webhook Security

- **`WEBHOOK_SECRET` is strongly recommended.** Without it, anyone who can reach
  the `/webhook` endpoint can trigger analysis and post comments. Always set a
  secret and configure the same value in your GitHub webhook settings.

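When a secret is configured, GitHub signs each delivery with an HMAC-SHA256 of the raw request body and sends it in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. Below is a minimal verification sketch using Node's built-in `crypto` module; the `verifySignature` name is hypothetical and this is not necessarily how troutbot implements the check. The digest must be computed over the raw body bytes, before any JSON parsing.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only if `header` is a valid sha256 signature of `rawBody`.
function verifySignature(
  secret: string,
  rawBody: string | Buffer,
  header: string | undefined,
): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const received = Buffer.from(header);
  const computed = Buffer.from(expected);
  // timingSafeEqual throws on a length mismatch, so compare lengths first;
  // the constant-time comparison avoids leaking how many bytes matched.
  return received.length === computed.length && timingSafeEqual(received, computed);
}
```

In an Express app, `express.raw({ type: "application/json" })` or the `verify` option of `express.json()` gives access to the raw bytes needed here.
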
## Polling Mode

In polling mode, troutbot periodically checks configured repositories for
`@troutbot` mentions in comments.

### Configuration

Enable polling in your `config.ts`:

```typescript
polling: {
  enabled: true,
  intervalMinutes: 5, // Check every 5 minutes
  lookbackMinutes: 10, // Look back 10 minutes for new comments
}
```

### How It Works

1. On startup, troutbot fetches comments posted within the last
   `lookbackMinutes` from all configured repositories
2. It scans each comment for `@troutbot` mentions
3. When it finds one, it analyzes the associated issue/PR and posts a response
4. Processed comments are tracked to avoid duplicate responses
5. The cycle repeats every `intervalMinutes`

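With the default settings, `lookbackMinutes` (10) is longer than `intervalMinutes` (5), so consecutive polling windows overlap and the same comment can appear in two cycles; tracking processed comments prevents a second reply. The hypothetical sketch below shows one cycle's filtering step. The `IssueComment` shape is simplified from what the GitHub API returns, the mention regex is an assumption, and the processed-ID set lives only in memory here.

```typescript
// Simplified comment shape; the GitHub API returns many more fields.
interface IssueComment {
  id: number;
  body: string;
  created_at: string; // ISO 8601 timestamp
}

// Assumed mention syntax: "@troutbot" not preceded by a word character or
// hyphen, so an address like "me@troutbot.dev" does not count.
const MENTION = /(^|[^\w-])@troutbot\b/i;

function newMentions(
  comments: IssueComment[],
  processed: Set<number>,
  now: Date,
  lookbackMinutes: number,
): IssueComment[] {
  const cutoff = now.getTime() - lookbackMinutes * 60_000;
  const fresh: IssueComment[] = [];
  for (const c of comments) {
    if (processed.has(c.id)) continue; // answered in an earlier, overlapping cycle
    if (Date.parse(c.created_at) < cutoff) continue; // older than the lookback window
    if (!MENTION.test(c.body)) continue;
    processed.add(c.id);
    fresh.push(c);
  }
  return fresh;
}

const now = new Date("2024-01-01T12:00:00Z");
const comments: IssueComment[] = [
  { id: 1, body: "Hey @troutbot, can you take a look at this?", created_at: "2024-01-01T11:55:00Z" },
  { id: 2, body: "@troutbot ping", created_at: "2024-01-01T11:40:00Z" }, // too old
  { id: 3, body: "mail me@troutbot.dev", created_at: "2024-01-01T11:58:00Z" }, // not a mention
];
const seen = new Set<number>();
console.log(newMentions(comments, seen, now, 10).map((c) => c.id)); // [ 1 ]
console.log(newMentions(comments, seen, now, 10).length); // 0 (already processed)
```
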
### On-Demand Analysis

Users can trigger analysis by mentioning `@troutbot` in any comment:

```plaintext
Hey @troutbot, can you take a look at this?
```

The bot will analyze the issue/PR and respond with a trout-themed assessment.

### Rate Limiting

Polling uses the GitHub REST API and respects its rate limits. The default
settings (5-minute interval, 10-minute lookback) are conservative and stay well
within GitHub's limit of 5,000 requests per hour for personal access tokens.

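For a rough budget, polling requests per hour scale with the number of repositories and the polling frequency. The sketch below assumes a fixed number of list requests per repository per cycle (one by default); that figure is an assumption, since pagination and the API calls made while analyzing each mention add to the real total.

```typescript
const HOURLY_LIMIT = 5000; // GitHub REST limit for requests made with a personal access token

// Estimated polling requests per hour, assuming a fixed cost per repo per cycle.
function hourlyPollingRequests(
  repoCount: number,
  intervalMinutes: number,
  requestsPerRepoPerCycle = 1,
): number {
  const cyclesPerHour = 60 / intervalMinutes;
  return Math.ceil(cyclesPerHour * repoCount * requestsPerRepoPerCycle);
}

// 50 repositories polled every 5 minutes: 12 cycles/hour x 50 repos.
const used = hourlyPollingRequests(50, 5);
console.log(used, `${((used / HOURLY_LIMIT) * 100).toFixed(0)}% of the hourly limit`);
// 600 12% of the hourly limit
```
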
### Requirements

- `GITHUB_TOKEN` with read access to all watched repositories
- Repositories configured in `config.repositories`
- Write access to post comments

## GitHub Account & Token Setup

Troutbot is designed to run as a dedicated bot account on GitHub. Create a
separate GitHub account for the bot (e.g., `troutbot`) so that comments are
clearly attributed to it rather than to a personal account.

### 1. Create the bot account

Sign up for a new GitHub account at <https://github.com/signup>. Use a dedicated
email address for the bot. Give it a recognizable username and avatar.

### 2. Grant repository access

The bot account needs access to every repository it will comment on:

- **For organization repos**: Invite the bot account as a collaborator with
  **Write** access, or add it to a team with write permissions.
- **For personal repos**: Add the bot account as a collaborator under
  `Settings > Collaborators`.

The bot needs write access to post comments. Read access alone is not enough.

### 3. Generate a Personal Access Token

Log in as the bot account and create a fine-grained PAT:

1. Go to
   `Settings > Developer settings > Personal access tokens > Fine-grained tokens`
2. Click **Generate new token**
3. Set a descriptive name (e.g., `troutbot-production`)
4. Set **Expiration** - pick a long-lived duration or no expiration, since this
   runs unattended
5. Under **Repository access**, select the specific repositories the bot will
   operate on (or **All repositories** if it should cover everything the account
   can see)
6. Under **Permissions > Repository permissions**, grant:
   - **Checks**: Read (for the `checks` backend to query CI results)
   - **Contents**: Read (for the `diff` backend to fetch changed files)
   - **Issues**: Read and Write (to read issue bodies and post comments)
   - **Pull requests**: Read and Write (to read PR bodies and post comments)
   - **Metadata**: Read (required by all fine-grained tokens)
7. Click **Generate token** and copy the value

Set this as the `GITHUB_TOKEN` environment variable.

> **Classic tokens**: If you prefer a classic PAT instead, create one with the
> `repo` scope. Fine-grained tokens are recommended because they follow the
> principle of least privilege.

## Configuring Troutbot

### Environment Variables

<!--markdownlint-disable MD013 -->

| Variable         | Description                                           | Required                     |
| ---------------- | ----------------------------------------------------- | ---------------------------- |
| `GITHUB_TOKEN`   | Fine-grained PAT from the bot account (see above)     | No (dry-run without it)      |
| `WEBHOOK_SECRET` | Secret for verifying webhook signatures               | No (only for webhook mode)   |
| `PORT`           | Server port (overrides `server.port` in config)       | No                           |
| `CONFIG_PATH`    | Path to config file                                   | No (defaults to `config.ts`) |
| `LOG_LEVEL`      | Log level override (`debug`, `info`, `warn`, `error`) | No                           |

<!--markdownlint-enable MD013 -->

### Config File

Copy `config.example.ts` to `config.ts`. The config is a TypeScript module that
default-exports a `Config` object, so you get full type checking and
autocompletion in your editor.

```typescript
import type { Config } from './src/types';

const config: Config = {
  server: { port: 3000 },
  repositories: [{ owner: 'myorg', repo: 'myrepo' }],
  engine: {
    backends: {
      checks: { enabled: true },
      diff: { enabled: true, maxChanges: 1000, requireTests: false },
      quality: { enabled: true, minBodyLength: 50 },
    },
    weights: { checks: 0.4, diff: 0.3, quality: 0.3 },
    confidenceThreshold: 0.1,
  },
  polling: {
    enabled: true,
    intervalMinutes: 5,
    lookbackMinutes: 10,
  },
  // ...
};

export default config;
```

The config is loaded at runtime via [jiti](https://github.com/unjs/jiti), so no
pre-compilation is needed.

See `config.example.ts` for the full annotated reference.

## Production Configuration

When deploying troutbot to production, keep the following in mind:

- **Use a reverse proxy with TLS.** In webhook mode, GitHub should deliver
  payloads over HTTPS. Put nginx, Caddy, or a cloud load balancer in front of
  troutbot and terminate TLS there. Polling mode doesn't require a public
  endpoint.
- **Set `NODE_ENV=production`.** This is set automatically in the Docker image.
  For standalone deployments, export it in your environment. Express uses this
  to enable performance optimizations.
- **Rate limiting** is enabled by default at 120 requests/minute on the
  `/webhook` endpoint. Override it via `server.rateLimit` in your config file.
- **Request body size** is capped at 1 MB. Typical issue and pull request
  webhook payloads are well under this limit.
- **Graceful shutdown** is built in. The server handles `SIGTERM` and `SIGINT`,
  stops accepting new connections, and waits up to 10 seconds for in-flight
  requests to finish before exiting.
- **Dashboard access control.** The `/dashboard` and `/api/*` endpoints have no
  built-in authentication. Restrict access via reverse proxy rules, a firewall,
  or binding to localhost. See [Securing the Dashboard](#securing-the-dashboard).

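The shutdown behavior described above follows a common Node.js pattern. Below is a hypothetical version using only `node:http`, not troutbot's own code: `server.close()` stops accepting new connections and invokes its callback once the remaining connections have ended, and an unref'd timer forces an exit if requests are still running after 10 seconds.

```typescript
import { createServer, type Server } from "node:http";

// Hypothetical helper: drain in-flight requests on SIGTERM/SIGINT, then exit.
function installGracefulShutdown(server: Server, timeoutMs = 10_000): void {
  let shuttingDown = false;
  const shutdown = (signal: string): void => {
    if (shuttingDown) return; // ignore a second Ctrl-C or repeated SIGTERM
    shuttingDown = true;
    console.log(`${signal} received, waiting for in-flight requests`);
    // Stop accepting connections; the callback runs once all of them have closed.
    server.close(() => process.exit(0));
    // Hard deadline; unref() keeps this timer from holding the process open.
    setTimeout(() => process.exit(1), timeoutMs).unref();
  };
  process.on("SIGTERM", shutdown);
  process.on("SIGINT", shutdown);
}

// Usage (an Express app's app.listen() also returns an http.Server):
const server = createServer((_req, res) => res.end("ok"));
installGracefulShutdown(server);
// server.listen(3000);
```
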
## Deployment

### Standalone (Node.js)

```bash
npm ci
npm run build
export NODE_ENV=production
export GITHUB_TOKEN="ghp_..."
# Only needed for webhook mode:
# export WEBHOOK_SECRET="your-secret"
npm start
```

### Nix

**Flake** (NixOS or flake-enabled systems):

```nix
{
  inputs.troutbot.url = "github:notashelf/troutbot";

  outputs = { self, nixpkgs, troutbot }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        troutbot.nixosModules.troutbot
        {
          services.troutbot = {
            enable = true;
            environmentFile = "/path/to/.env";
            configPath = "/path/to/config.ts";
          };
        }
      ];
    };
  };
}
```

**Run directly**:

```bash
nix run github:notashelf/troutbot
```

### Docker

```bash
docker build -t troutbot .
docker run -d \
  --name troutbot \
  -p 127.0.0.1:3000:3000 \
  -e GITHUB_TOKEN="ghp_..." \
  -v "$(pwd)/config.ts:/app/config.ts:ro" \
  --restart unless-stopped \
  troutbot
```

The image uses a multi-stage build, runs as a non-root user, and includes a
built-in health check and `STOPSIGNAL SIGTERM`.

### Docker Compose

```yaml
services:
  troutbot:
    build: .
    ports:
      - '127.0.0.1:3000:3000'
    env_file: .env
    volumes:
      - ./config.ts:/app/config.ts:ro
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 256M
    logging:
      driver: json-file
      options:
        max-size: '10m'
        max-file: '3'
```

### Systemd

Create `/etc/systemd/system/troutbot.service`:

```ini
[Unit]
Description=Troutbot GitHub Bot
After=network.target

[Service]
Type=simple
User=troutbot
WorkingDirectory=/opt/troutbot
ExecStart=/usr/bin/node dist/index.js
Environment=NODE_ENV=production
EnvironmentFile=/opt/troutbot/.env
Restart=on-failure
RestartSec=5
TimeoutStopSec=15
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/troutbot
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now troutbot
```

### Reverse Proxy (nginx)

Only needed for webhook mode:

```nginx
server {
    listen 443 ssl;
    server_name troutbot.example.com;

    ssl_certificate /etc/letsencrypt/live/troutbot.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/troutbot.example.com/privkey.pem;

    client_max_body_size 1m;
    proxy_read_timeout 60s;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Optional: nginx-level rate limiting for webhooks.
    # limit_req_zone is only valid in the http {} context, so declare it there:
    #   limit_req_zone $binary_remote_addr zone=webhook:10m rate=10r/s;
    # location /webhook {
    #     limit_req zone=webhook burst=20 nodelay;
    #     proxy_pass http://127.0.0.1:3000;
    # }
}
```

## API Endpoints

<!--markdownlint-disable MD013-->

| Method   | Path          | Description                                                                              |
| -------- | ------------- | ---------------------------------------------------------------------------------------- |
| `GET`    | `/health`     | Health check - returns `status`, `uptime` (seconds), `version`, `dryRun`, and `backends` |
| `POST`   | `/webhook`    | GitHub webhook receiver (rate limited, webhook mode only)                                |
| `GET`    | `/dashboard`  | Web UI dashboard with status, events, and config editor                                  |
| `GET`    | `/api/status` | JSON status: uptime, version, dry-run, backends, repo count                              |
| `GET`    | `/api/events` | Recent events from the in-memory ring buffer                                             |
| `DELETE` | `/api/events` | Clear the event ring buffer                                                              |
| `GET`    | `/api/config` | Current runtime configuration as JSON                                                    |
| `PUT`    | `/api/config` | Partial config update: deep-merges, validates, and applies in-place                      |

<!--markdownlint-enable MD013-->

## Dashboard & Runtime API

Troutbot ships with a built-in web dashboard and JSON API for monitoring and
runtime configuration. No separate frontend build is required.

### Web Dashboard

Navigate to `http://localhost:3000/dashboard` (or wherever your instance is
running). The dashboard provides:

- **Status card** - uptime, version, dry-run state, active backends, and repo
  count. Auto-refreshes every 30 seconds.
- **Event log** - table of recent events showing repo, PR/issue number, action,
  impact rating, and confidence score. Keeps the last 100 events in memory.
- **Config editor** - read-only JSON view of the current runtime config with an
  "Edit" toggle that lets you modify and save changes without restarting.

The dashboard is a single HTML page with inline CSS and vanilla JS - no
frameworks, no build step, no external assets.

### Runtime Config API

You can inspect and modify the running configuration via the REST API. Changes
are applied in place without restarting the server. The update endpoint
deep-merges your partial config onto the current one and validates the result
before applying it.

```bash
# Read current config
curl http://localhost:3000/api/config

# Update a single setting (partial merge)
curl -X PUT http://localhost:3000/api/config \
  -H 'Content-Type: application/json' \
  -d '{"response": {"allowUpdates": true}}'

# Change engine weights at runtime
curl -X PUT http://localhost:3000/api/config \
  -H 'Content-Type: application/json' \
  -d '{"engine": {"weights": {"checks": 0.5, "diff": 0.25, "quality": 0.25}}}'
```

Invalid configs are rejected with a 400 status and an error message. The
original config remains unchanged if validation fails.

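The merge-then-validate behavior can be sketched as follows. This is a hypothetical illustration, not troutbot's implementation: `deepMerge`, `applyPatch`, and the weights-only `validateWeights` check are made up for the example. Plain objects are merged recursively, while arrays and scalars in the patch replace the old value wholesale (an assumption about how arrays such as `repositories` would be treated).

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merges `patch` onto `base` without mutating either input.
function deepMerge(base: JsonObject, patch: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    const current = out[key];
    out[key] = isPlainObject(current) && isPlainObject(value) ? deepMerge(current, value) : value;
  }
  return out;
}

// Example validator: every engine weight must be a non-negative number.
function validateWeights(config: JsonObject): string | null {
  const engine = config.engine;
  const weights = isPlainObject(engine) ? engine.weights : undefined;
  if (!isPlainObject(weights)) return "engine.weights must be an object";
  for (const [name, w] of Object.entries(weights)) {
    if (typeof w !== "number" || w < 0) return `engine.weights.${name} must be a non-negative number`;
  }
  return null;
}

// Returns the merged config, or the untouched current one plus an error (HTTP 400).
function applyPatch(
  current: JsonObject,
  patch: JsonObject,
  validate: (config: JsonObject) => string | null,
): { config: JsonObject; error: string | null } {
  const candidate = deepMerge(current, patch);
  const error = validate(candidate);
  return error === null ? { config: candidate, error: null } : { config: current, error };
}

const current: JsonObject = {
  response: { allowUpdates: false },
  engine: { weights: { checks: 0.4, diff: 0.3, quality: 0.3 }, confidenceThreshold: 0.1 },
};
console.log(applyPatch(current, { engine: { weights: { checks: 0.5 } } }, validateWeights).error); // null
console.log(applyPatch(current, { engine: { weights: { diff: -1 } } }, validateWeights).error);
// engine.weights.diff must be a non-negative number
```
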
### Event Buffer API

The event buffer stores the last 100 processed events in memory (from both
webhooks and polling). Events are lost on restart.

```bash
# List recent events
curl http://localhost:3000/api/events

# Clear the buffer
curl -X DELETE http://localhost:3000/api/events
```

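A fixed-capacity ring buffer is one straightforward way to keep only the last 100 events: once it is full, each new event overwrites the oldest one, so memory use stays constant. The class below is a hypothetical sketch, not troutbot's source; the names and the oldest-first ordering of `toArray()` are assumptions.

```typescript
// Fixed-size buffer that keeps only the most recent `capacity` items.
class RingBuffer<T> {
  private items: (T | undefined)[];
  private next = 0; // slot the next push writes to
  private size = 0;
  private readonly capacity: number;

  constructor(capacity = 100) {
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    this.items[this.next] = item; // overwrites the oldest item once full
    this.next = (this.next + 1) % this.capacity;
    this.size = Math.min(this.size + 1, this.capacity);
  }

  // Items from oldest to newest.
  toArray(): T[] {
    const start = (this.next - this.size + this.capacity) % this.capacity;
    const out: T[] = [];
    for (let i = 0; i < this.size; i++) {
      out.push(this.items[(start + i) % this.capacity] as T);
    }
    return out;
  }

  clear(): void {
    this.items = new Array<T | undefined>(this.capacity);
    this.next = 0;
    this.size = 0;
  }
}

const events = new RingBuffer<string>(3);
for (const e of ["a", "b", "c", "d", "e"]) events.push(e);
console.log(events.toArray()); // [ 'c', 'd', 'e' ]
```
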
### Securing the Dashboard

The dashboard and API endpoints have no authentication by default. In
production, restrict access using one of:

- **Reverse proxy rules** - limit `/dashboard` and `/api/*` to internal IPs or
  require basic auth at the nginx/Caddy layer
- **Firewall rules** - only expose port 3000 to trusted networks
- **Bind to localhost** - make port 3000 reachable only on `127.0.0.1` (the
  Docker examples already publish it as `127.0.0.1:3000:3000`), then access it
  via an SSH tunnel or VPN

Do not expose the dashboard to the public internet without authentication, as
the config API allows modifying runtime behavior.

## Dry-Run Mode

Without a `GITHUB_TOKEN`, the bot runs in dry-run mode. The quality backend
still works because it only analyzes text, but the checks and diff backends
return neutral because they need API access. Comments are logged instead of
posted.

## Customizing Messages

Edit `response.messages` in your config. Each impact category takes an array of
strings, and one is picked at random for each event.

```typescript
messages: {
  positive: [
    "The trout approve of this {type}!",
    "Upstream looks clear for this {type}.",
  ],
  negative: [
    "The trout are worried about this {type}.",
  ],
  neutral: [
    "The trout have no opinion on this {type}.",
  ],
},
```

Placeholders:

- `{type}` - `issue` or `pull request`
- `{impact}` - `positive`, `negative`, or `neutral`

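Template selection and expansion of this kind can be sketched in a few lines. The `renderMessage` function and its injectable `random` parameter (handy for deterministic tests) are hypothetical; the sketch assumes placeholders are literal `{name}` tokens and leaves unknown ones untouched, which may differ from troutbot's behavior.

```typescript
type Impact = "positive" | "negative" | "neutral";
type Messages = Record<Impact, string[]>;

function renderMessage(
  messages: Messages,
  impact: Impact,
  kind: "issue" | "pull_request",
  random: () => number = Math.random,
): string {
  const pool = messages[impact];
  if (pool.length === 0) throw new Error(`no messages configured for ${impact}`);
  // Pick one template uniformly at random.
  const template = pool[Math.floor(random() * pool.length)];
  const vars: Record<string, string> = {
    type: kind === "issue" ? "issue" : "pull request",
    impact,
  };
  // Replace known {placeholders}; unknown ones are left as-is.
  return template.replace(/\{(\w+)\}/g, (match, name: string) => vars[name] ?? match);
}

const messages: Messages = {
  positive: ["The trout approve of this {type}!", "Upstream looks clear for this {type}."],
  negative: ["The trout are worried about this {type}."],
  neutral: ["The trout have no opinion on this {type}."],
};
console.log(renderMessage(messages, "positive", "pull_request", () => 0));
// The trout approve of this pull request!
```
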