
# Troutbot

Troutbot is the ultimate solution for protecting the trout population. It's
environmental protection incarnate!

Well, in reality, it is a GitHub bot that analyzes issues and pull requests
using real signals, such as CI check results, diff quality, and body structure,
and then posts trout-themed comments about its findings. Now you know whether
your changes hurt or help the trout population.

## Operation Modes

Troutbot supports two operation modes:

### Webhook Mode (Real-time)

GitHub sends webhook events to troutbot when issues/PRs are opened or updated.
Troutbot responds immediately. Best for:

- Single or few repositories
- You have admin access to configure webhooks
- You can expose a public endpoint

### Polling Mode (Periodic)

Troutbot periodically polls configured repositories for `@troutbot` mentions in
comments. Best for:

- Monitoring dozens of repositories without webhook setup
- Running behind a firewall or on dynamic IPs
- Simplified deployment without webhook secrets

Both modes use the same analysis engine and produce the same results.

## Quick Start

```bash
# Install dependencies
npm install

# Populate the environment config
cp .env.example .env

# Set up the application config
cp config.example.ts config.ts

# Edit .env and config.ts, then build and start.
# If GITHUB_TOKEN is not set in .env, Troutbot starts in dry-run mode.
npm run build && npm start
```

## How It Works

Troutbot evaluates issues and PRs with three analysis backends:

### `checks`

Queries the GitHub Checks API for the PR's head commit. Looks at check run
conclusions (ESLint, Clippy, Jest, cargo test, GitHub Actions, etc.) and scores
based on pass/fail ratio. Any CI failure is a negative signal. Requires a
`GITHUB_TOKEN`.
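A minimal sketch of pass/fail-ratio scoring follows. The conclusion strings are the ones the GitHub Checks API reports. Several details are assumptions rather than Troutbot's documented rules: which conclusions count as failures, and using the failing (or passing) share as the confidence. `scoreChecks` and the result types are hypothetical names.

```typescript
// Hypothetical shapes; Troutbot's real types may differ.
type Impact = 'positive' | 'negative' | 'neutral';

interface BackendResult {
  impact: Impact;
  confidence: number; // 0..1; 0 means "no signal"
}

// Conclusion values reported by the GitHub Checks API.
// `null` means the check run has not completed yet.
type Conclusion =
  | 'success' | 'failure' | 'neutral' | 'cancelled'
  | 'skipped' | 'timed_out' | 'action_required' | 'stale' | null;

function scoreChecks(conclusions: Conclusion[]): BackendResult {
  // Assumed classification: only these outcomes count as pass or fail.
  const passed = conclusions.filter((c) => c === 'success').length;
  const failed = conclusions.filter(
    (c) => c === 'failure' || c === 'timed_out' || c === 'action_required',
  ).length;
  const decided = passed + failed;

  // Nothing finished yet: zero confidence so the engine leaves us out.
  if (decided === 0) return { impact: 'neutral', confidence: 0 };

  // Any failure makes the result negative.
  if (failed > 0) return { impact: 'negative', confidence: failed / decided };
  return { impact: 'positive', confidence: passed / decided };
}
```

Returning zero confidence when no run has finished lets the engine drop this backend from the weighted average (see Combining Results).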

### `diff`

Fetches the PR's changed files via the GitHub API. Evaluates:

- **Size**: Small PRs (< 200 lines) are positive; large PRs (above `maxChanges`)
  are negative
- **Focus**: Changing only a few files is positive; changing 30+ files is negative
- **Tests**: Presence of test file changes is positive; absence when
  `requireTests` is set is negative
- **Net deletion**: Removing more code than you add is positive. Less code is
  more good.

Requires a `GITHUB_TOKEN`.
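The four diff signals can be sketched as a simple additive score. The file fields mirror GitHub's "list pull request files" response (`filename`, `additions`, `deletions`). Several pieces are assumptions for illustration: the test-file pattern, the five-file cutoff for "few files", and the ±1 scoring. `scoreDiff` is a hypothetical name.

```typescript
// Fields modeled on GitHub's "list pull request files" response.
interface ChangedFile {
  filename: string;
  additions: number;
  deletions: number;
}

interface DiffOptions {
  maxChanges: number;    // engine.backends.diff.maxChanges
  requireTests: boolean; // engine.backends.diff.requireTests
}

// Assumed test-file pattern; Troutbot's actual matcher may differ.
const TEST_FILE_PATTERN =
  /(^|\/)(tests?|__tests__|spec)\/|\.(test|spec)\.[jt]sx?$|_test\.(go|py)$/;

// Returns a signed score: > 0 leans positive, < 0 leans negative.
function scoreDiff(files: ChangedFile[], opts: DiffOptions): number {
  const added = files.reduce((n, f) => n + f.additions, 0);
  const removed = files.reduce((n, f) => n + f.deletions, 0);
  const total = added + removed;
  let score = 0;

  // Size: under 200 lines is good, beyond maxChanges is bad.
  if (total < 200) score += 1;
  else if (total > opts.maxChanges) score -= 1;

  // Focus: a handful of files is good (5 is an assumed cutoff), 30+ is bad.
  if (files.length <= 5) score += 1;
  else if (files.length >= 30) score -= 1;

  // Tests: reward test changes; penalize their absence only when required.
  const touchesTests = files.some((f) => TEST_FILE_PATTERN.test(f.filename));
  if (touchesTests) score += 1;
  else if (opts.requireTests) score -= 1;

  // Net deletion: less code is more good.
  if (removed > added) score += 1;

  return score;
}
```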

### `quality`

Pure text analysis of the issue/PR description. No API calls needed. Checks for:

- **Issues**: Adequate description length, code blocks, reproduction steps,
  expected/actual behavior sections, environment info
- **PRs**: Description length, linked issues (`Fixes #123`), test plan sections,
  code blocks
- **Both**: Markdown structure/headers, references to other issues, screenshots

Empty or minimal descriptions are flagged as negative.
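The quality checks come down to plain pattern matching over the body. The sketch below assumes one regular expression per signal. The patterns, the signal names, and `qualitySignals` are illustrative rather than Troutbot's actual rules, though the linked-issue pattern follows GitHub's closing keywords (`close`, `fix`, `resolve` and their variants).

```typescript
type Kind = 'issue' | 'pull_request';

// Hypothetical signal tables; the real backend's checks may differ.
const COMMON: Record<string, RegExp> = {
  headers: /^#{1,6}\s+\S/m,             // Markdown headers
  references: /(^|\s)#\d+\b/,           // references to other issues/PRs
  screenshots: /!\[[^\]]*\]\(|<img\s/i, // Markdown or HTML images
};

const ISSUE: Record<string, RegExp> = {
  codeBlock: /`{3}/,
  reproSteps: /steps to reproduce|reproduction|repro steps/i,
  expectedActual: /expected[\s\S]*actual|actual[\s\S]*expected/i,
  environment: /\b(environment|version|os|node|browser)\b/i,
};

const PR: Record<string, RegExp> = {
  codeBlock: /`{3}/,
  linkedIssue: /\b(close[sd]?|fix(e[sd])?|resolve[sd]?)\s+#\d+/i,
  testPlan: /test(ing)? plan|how (was this|to) test/i,
};

// Too-short bodies are a negative signal; otherwise list the signals found.
function qualitySignals(
  body: string,
  kind: Kind,
  minBodyLength = 50,
): { negative: boolean; found: string[] } {
  const text = body.trim();
  if (text.length < minBodyLength) return { negative: true, found: [] };
  const checks = { ...COMMON, ...(kind === 'issue' ? ISSUE : PR) };
  const found = Object.entries(checks)
    .filter(([, pattern]) => pattern.test(text))
    .map(([name]) => name);
  return { negative: false, found };
}
```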

### Combining Results

Each backend returns an impact (`positive` / `negative` / `neutral`) and a
confidence score. The engine combines them using configurable weights (default:
checks 0.4, diff 0.3, quality 0.3). Backends that return zero confidence (e.g.,
no CI checks found yet) are excluded from the average. If combined confidence
falls below `confidenceThreshold`, the result is forced to neutral.
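The README does not give the exact formula, so this is only a sketch of one way to implement the rules above. It maps impacts to +1/-1/0, lets each backend contribute `weight × confidence`, skips zero-confidence backends, and renormalizes over the weights that remain. `combine` and `BackendResult` are hypothetical names.

```typescript
type Impact = 'positive' | 'negative' | 'neutral';

interface BackendResult {
  impact: Impact;
  confidence: number; // 0..1
}

const SIGN: Record<Impact, number> = { positive: 1, negative: -1, neutral: 0 };

function combine(
  results: Record<string, BackendResult>,
  weights: Record<string, number>,
  confidenceThreshold: number,
): BackendResult {
  let weightSum = 0;
  let score = 0;
  let confidence = 0;

  for (const [name, r] of Object.entries(results)) {
    const w = weights[name] ?? 0;
    if (r.confidence === 0 || w === 0) continue; // excluded from the average
    weightSum += w;
    score += w * SIGN[r.impact] * r.confidence;
    confidence += w * r.confidence;
  }

  // No backend produced a signal at all.
  if (weightSum === 0) return { impact: 'neutral', confidence: 0 };

  // Renormalize over the backends that actually contributed.
  score /= weightSum;
  confidence /= weightSum;

  // Below the threshold (or perfectly balanced): force neutral.
  if (confidence < confidenceThreshold || score === 0) {
    return { impact: 'neutral', confidence };
  }
  return { impact: score > 0 ? 'positive' : 'negative', confidence };
}
```

The divisor is the sum of the weights that actually contributed. A PR with no CI results yet is therefore judged on `diff` and `quality` alone, instead of being pulled toward neutral by an empty `checks` result.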

## Webhook Mode

In webhook mode, troutbot receives real-time events from GitHub.

### GitHub Webhook Setup

1. Go to your repository's **Settings > Webhooks > Add webhook**
2. **Payload URL**: `https://your-host/webhook`
3. **Content type**: `application/json`
4. **Secret**: Generate with `openssl rand -hex 32` and set as `WEBHOOK_SECRET`
5. **Events**: Select **Issues**, **Pull requests**, and optionally **Check
   suites** (for re-analysis when CI finishes)

If you enable **Check suites** and set `response.allowUpdates: true` in your
config, troutbot will update its comment on a PR once CI results are available.

### Webhook Security

- **`WEBHOOK_SECRET` is strongly recommended.** Without it, anyone who can reach
  the `/webhook` endpoint can trigger analysis and post comments. Always set a
  secret and configure the same value in your GitHub webhook settings.
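GitHub signs each delivery with the webhook secret. The `X-Hub-Signature-256` header carries `sha256=` followed by the hex HMAC-SHA256 of the raw request body. Below is a minimal verification sketch using `node:crypto`; `verifySignature` is a hypothetical name, not Troutbot's handler.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Checks GitHub's X-Hub-Signature-256 header against the raw request body.
// `rawBody` must be the exact bytes received, before any JSON parsing.
function verifySignature(
  secret: string,
  rawBody: string | Buffer,
  header: string | undefined,
): boolean {
  if (!header || !header.startsWith('sha256=')) return false;
  const expected = 'sha256=' + createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(header);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The constant-time comparison avoids leaking how many leading characters of a forged signature matched.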

## Polling Mode

In polling mode, troutbot periodically checks configured repositories for
`@troutbot` mentions in comments.

### Configuration

Enable polling in your `config.ts`:

```typescript
polling: {
  enabled: true,
  intervalMinutes: 5,   // Check every 5 minutes
  lookbackMinutes: 10,  // Look back 10 minutes for new comments
}
```

### How Polling Works

1. On startup, troutbot fetches recent comments from all configured repositories
2. It scans each comment for `@troutbot` mentions
3. When found, it analyzes the associated issue/PR and posts a response
4. Processed comments are tracked to avoid duplicate responses
5. The cycle repeats every `intervalMinutes`
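With the defaults, the lookback (10 minutes) is longer than the interval (5 minutes), so consecutive windows overlap and the same comment can be fetched twice. Step 4's deduplication is what prevents a double response. The sketch below covers mention detection and deduplication, assuming an in-memory set of processed comment IDs. The HTTP call is left out; `sinceFor` produces the kind of ISO 8601 cutoff that the `since` parameter of GitHub's "list issue comments for a repository" endpoint accepts. All function names are hypothetical.

```typescript
// Minimal comment shape, modeled on GitHub's issue-comment objects.
// Comments on PRs use the same shape; `issue_url` ends in the PR number.
interface IssueComment {
  id: number;
  body: string;
  issue_url: string;
}

// "@troutbot" as a standalone handle: not "@troutbot-dev", not "me@troutbot.io".
const MENTION = /(^|[^\w@])@troutbot(?![\w-])/i;

// Cutoff for one cycle: now minus lookbackMinutes, as an ISO 8601 timestamp.
function sinceFor(now: Date, lookbackMinutes: number): string {
  return new Date(now.getTime() - lookbackMinutes * 60 * 1000).toISOString();
}

// Keep unseen comments that mention the bot and mark them as processed,
// so overlapping lookback windows never produce a second response.
function newMentions(comments: IssueComment[], processed: Set<number>): IssueComment[] {
  const fresh: IssueComment[] = [];
  for (const c of comments) {
    if (processed.has(c.id) || !MENTION.test(c.body)) continue;
    processed.add(c.id);
    fresh.push(c);
  }
  return fresh;
}

// Issue/PR number the comment belongs to, taken from the end of `issue_url`.
function issueNumber(c: IssueComment): number {
  return Number(c.issue_url.split('/').pop());
}
```

An in-memory set is forgotten on restart. A production version might persist processed IDs or mark a comment only after the reply is posted.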

### On-Demand Analysis

Users can trigger analysis by mentioning `@troutbot` in any comment:

```plaintext
Hey @troutbot, can you take a look at this?
```

The bot will analyze the issue/PR and respond with a trout-themed assessment.

### Rate Limiting

Polling uses the GitHub REST API and respects rate limits. The default settings
(5 min interval, 10 min lookback) are conservative and work well within GitHub's
5000 requests/hour limit for personal access tokens.
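A rough budget shows why the defaults are comfortable. The sketch assumes one comment-listing request per repository per cycle, ignoring pagination, plus a fixed number of extra calls per analyzed mention. Both figures are assumptions for illustration, not measured Troutbot behavior, and `hourlyRequests` is a hypothetical helper.

```typescript
// Back-of-the-envelope hourly request estimate for polling mode.
// Assumed: one list request per repo per cycle, plus `requestsPerAnalysis`
// calls (checks, diff, post comment, ...) for every mention handled.
function hourlyRequests(
  repos: number,
  intervalMinutes: number,
  mentionsPerHour: number,
  requestsPerAnalysis = 4,
): number {
  const cyclesPerHour = 60 / intervalMinutes;
  return repos * cyclesPerHour + mentionsPerHour * requestsPerAnalysis;
}
```

With 50 repositories and 20 mentions per hour, this gives `50 × 12 + 20 × 4 = 680` requests, well under 5000. Under the same assumptions, listing alone would approach the limit only at around 400 repositories (`5000 / 12`).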

### Requirements

- `GITHUB_TOKEN` with read and write access to all watched repositories (write
  access is needed to post comments)
- Repositories listed in `config.repositories`

## GitHub Account & Token Setup

Troutbot is designed to run as a dedicated bot account on GitHub. Create a
separate GitHub account for the bot (e.g., `troutbot`) so that comments are
clearly attributed to it rather than to a personal account.

### 1. Create the bot account

Sign up for a new GitHub account at <https://github.com/signup>. Use a dedicated
email address for the bot. Give it a recognizable username and avatar.

### 2. Grant repository access

The bot account needs access to every repository it will comment on:

- **For organization repos**: Invite the bot account as a collaborator with
  **Write** access, or add it to a team with write permissions.
- **For personal repos**: Add the bot account as a collaborator under
  `Settings > Collaborators`.

The bot needs write access to post comments. Read access alone is not enough.

### 3. Generate a Personal Access Token

Log in as the bot account and create a fine-grained PAT:

1. Go to
   `Settings > Developer settings > Personal access tokens > Fine-grained tokens`
2. Click **Generate new token**
3. Set a descriptive name (e.g., `troutbot-production`)
4. Set **Expiration** - pick a long-lived duration or no expiration, since this
   runs unattended
5. Under **Repository access**, select the specific repositories the bot will
   operate on (or **All repositories** if it should cover everything the account
   can see)
6. Under **Permissions > Repository permissions**, grant:
   - **Checks**: Read (for the `checks` backend to query CI results)
   - **Contents**: Read (for the `diff` backend to fetch changed files)
   - **Issues**: Read and Write (to read issue bodies and post comments)
   - **Pull requests**: Read and Write (to read PR bodies and post comments)
   - **Metadata**: Read (required by all fine-grained tokens)
7. Click **Generate token** and copy the value

Set this as the `GITHUB_TOKEN` environment variable.

> **Classic tokens**: If you prefer a classic PAT instead, create one with the
> `repo` scope. Fine-grained tokens are recommended because they follow the
> principle of least privilege.

## Configuring Troutbot

### Environment Variables

<!--markdownlint-disable MD013 -->

| Variable         | Description                                           | Required                     |
| ---------------- | ----------------------------------------------------- | ---------------------------- |
| `GITHUB_TOKEN`   | Fine-grained PAT from the bot account (see above)     | No (dry-run without it)      |
| `WEBHOOK_SECRET` | Secret for verifying webhook signatures               | No (only for webhook mode)   |
| `PORT`           | Server port (overrides `server.port` in config)       | No                           |
| `CONFIG_PATH`    | Path to config file                                   | No (defaults to `config.ts`) |
| `LOG_LEVEL`      | Log level override (`debug`, `info`, `warn`, `error`) | No                           |

<!--markdownlint-enable MD013 -->
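The table implies a few resolution rules: `PORT` takes precedence over `server.port`, a missing `GITHUB_TOKEN` means dry-run, and `CONFIG_PATH` falls back to `config.ts`. Here is a hypothetical resolver illustrating them. `resolveSettings` is an illustrative name, not Troutbot's code, and rejecting invalid `PORT` or `LOG_LEVEL` values is an assumption.

```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error';
const LEVELS: readonly LogLevel[] = ['debug', 'info', 'warn', 'error'];

interface RuntimeSettings {
  dryRun: boolean;
  port: number;
  configPath: string;
  logLevel: LogLevel | undefined;
  verifyWebhooks: boolean;
}

// Hypothetical resolver mirroring the environment variable table.
function resolveSettings(
  env: Record<string, string | undefined>,
  configPort: number,
): RuntimeSettings {
  // PORT overrides server.port from the config file.
  const port = env.PORT ? Number.parseInt(env.PORT, 10) : configPort;
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  const level = env.LOG_LEVEL?.toLowerCase();
  if (level !== undefined && !LEVELS.includes(level as LogLevel)) {
    throw new Error(`Invalid LOG_LEVEL: ${env.LOG_LEVEL}`);
  }

  return {
    dryRun: !env.GITHUB_TOKEN, // no token: log comments instead of posting
    port,
    configPath: env.CONFIG_PATH ?? 'config.ts',
    logLevel: level as LogLevel | undefined,
    verifyWebhooks: Boolean(env.WEBHOOK_SECRET),
  };
}
```

In a running process, you would pass `process.env` and the loaded config's `server.port`.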

### Config File

Copy `config.example.ts` to `config.ts`. The config is a TypeScript module that
default-exports a `Config` object, so you get full type checking and
autocompletion in your editor.

```typescript
import type { Config } from './src/types';

const config: Config = {
  server: { port: 3000 },
  repositories: [{ owner: 'myorg', repo: 'myrepo' }],
  engine: {
    backends: {
      checks: { enabled: true },
      diff: { enabled: true, maxChanges: 1000, requireTests: false },
      quality: { enabled: true, minBodyLength: 50 },
    },
    weights: { checks: 0.4, diff: 0.3, quality: 0.3 },
    confidenceThreshold: 0.1,
  },
  polling: {
    enabled: true,
    intervalMinutes: 5,
    lookbackMinutes: 10,
  },
  // ...
};

export default config;
```

The config is loaded at runtime via [jiti](https://github.com/unjs/jiti) - no
pre-compilation needed.

See `config.example.ts` for the full annotated reference.

## Production Configuration

When deploying troutbot to production, keep the following in mind:

- **Use a reverse proxy with TLS.** In webhook mode, GitHub should deliver
  payloads over HTTPS, so put nginx, Caddy, or a cloud load balancer in front of
  troutbot and terminate TLS there. Polling mode doesn't require a public
  endpoint.
- **Set `NODE_ENV=production`.** This is set automatically in the Docker image.
  For standalone deployments, export it in your environment. Express uses this
  to enable performance optimizations.
- **Rate limiting** is enabled by default at 120 requests/minute on the
  `/webhook` endpoint. Override via `server.rateLimit` in your config file.
- **Request body size** is capped at 1 MB. GitHub webhook payloads are well
  under this limit.
- **Graceful shutdown** is built in. The server handles `SIGTERM` and `SIGINT`,
  stops accepting new connections, and waits up to 10 seconds for in-flight
  requests to finish before exiting.
- **Dashboard access control.** The `/dashboard` and `/api/*` endpoints have no
  built-in authentication. Restrict access via reverse proxy rules, firewall, or
  binding to localhost. See [Securing the Dashboard](#securing-the-dashboard).
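The shutdown behavior described above maps onto Node's `http.Server.close()`, which stops accepting new connections and runs its callback once the existing ones have ended. Below is a minimal sketch using only `node:http`; Express's `app.listen()` returns such a server. `shutdown` and `installShutdownHandlers` are hypothetical helpers, and the exit codes are an assumption.

```typescript
import { createServer, type Server } from 'node:http';

// Stop accepting new connections, then wait up to `timeoutMs` for in-flight
// requests. Resolves 'closed' on a clean drain, 'timeout' otherwise.
function shutdown(server: Server, timeoutMs = 10_000): Promise<'closed' | 'timeout'> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve('timeout'), timeoutMs);
    // close() refuses new connections immediately; the callback fires once
    // every existing connection has ended.
    server.close(() => {
      clearTimeout(timer);
      resolve('closed');
    });
  });
}

// SIGTERM comes from `docker stop` or `systemctl stop`; SIGINT from Ctrl+C.
function installShutdownHandlers(server: Server): void {
  for (const signal of ['SIGTERM', 'SIGINT'] as const) {
    process.once(signal, async () => {
      const outcome = await shutdown(server);
      process.exit(outcome === 'closed' ? 0 : 1);
    });
  }
}

// Usage (port is illustrative):
// const server = createServer((_req, res) => res.end('ok')).listen(3000);
// installShutdownHandlers(server);
```

The 10-second drain fits inside the systemd unit's `TimeoutStopSec=15` shown below. Docker's default stop grace period is also 10 seconds, so a slightly longer `docker run --stop-timeout` (or `stop_grace_period` in Compose) leaves room for the drain to finish.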

## Deployment

### Standalone (Node.js)

```bash
npm ci
npm run build
export NODE_ENV=production
export GITHUB_TOKEN="ghp_..."
# Only needed for webhook mode:
# export WEBHOOK_SECRET="your-secret"
npm start
```

### Nix

**Flake** (NixOS or flake-enabled systems):

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
  inputs.troutbot.url = "github:notashelf/troutbot";

  outputs = { self, nixpkgs, troutbot }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        troutbot.nixosModules.troutbot
        {
          services.troutbot = {
            enable = true;
            environmentFile = "/path/to/.env";
            configPath = "/path/to/config.ts";
          };
        }
      ];
    };
  };
};
```

**Run directly**:

```bash
nix run github:notashelf/troutbot
```

### Docker

```bash
docker build -t troutbot .
docker run -d \
  --name troutbot \
  -p 127.0.0.1:3000:3000 \
  -e GITHUB_TOKEN="ghp_..." \
  -v "$(pwd)/config.ts:/app/config.ts:ro" \
  --restart unless-stopped \
  troutbot
```

The image uses a multi-stage build, runs as a non-root user, and includes a
built-in health check and `STOPSIGNAL SIGTERM`.

### Docker Compose

```yaml
services:
  troutbot:
    build: .
    ports:
      - '127.0.0.1:3000:3000'
    env_file: .env
    volumes:
      - ./config.ts:/app/config.ts:ro
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 256M
    logging:
      driver: json-file
      options:
        max-size: '10m'
        max-file: '3'
```

### Systemd

Create `/etc/systemd/system/troutbot.service`:

```ini
[Unit]
Description=Troutbot GitHub Bot
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=troutbot
WorkingDirectory=/opt/troutbot
ExecStart=/usr/bin/node dist/index.js
EnvironmentFile=/opt/troutbot/.env
Environment=NODE_ENV=production
Restart=on-failure
RestartSec=5
TimeoutStopSec=15
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/troutbot
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now troutbot
```

### Reverse Proxy (nginx)

Only needed for webhook mode:

```nginx
server {
    listen 443 ssl;
    server_name troutbot.example.com;

    ssl_certificate /etc/letsencrypt/live/troutbot.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/troutbot.example.com/privkey.pem;

    client_max_body_size 1m;
    proxy_read_timeout 60s;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

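    # Optional: nginx-level rate limiting for webhooks. limit_req_zone is only
    # valid in the http {} context, so declare it outside this server block:
    #   limit_req_zone $binary_remote_addr zone=webhook:10m rate=10r/s;
    # location /webhook {
    #     limit_req zone=webhook burst=20 nodelay;
    #     proxy_pass http://127.0.0.1:3000;
    #     # proxy_set_header lines are not inherited from "location /";
    #     # repeat them here.
    # }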
}
```

## API Endpoints

<!--markdownlint-disable MD013-->

| Method   | Path          | Description                                                                              |
| -------- | ------------- | ---------------------------------------------------------------------------------------- |
| `GET`    | `/health`     | Health check - returns `status`, `uptime` (seconds), `version`, `dryRun`, and `backends` |
| `POST`   | `/webhook`    | GitHub webhook receiver (rate limited, webhook mode only)                                |
| `GET`    | `/dashboard`  | Web UI dashboard with status, events, and config editor                                  |
| `GET`    | `/api/status` | JSON status: uptime, version, dry-run, backends, repo count                              |
| `GET`    | `/api/events` | Recent events from the in-memory ring buffer                                             |
| `DELETE` | `/api/events` | Clear the event ring buffer                                                              |
| `GET`    | `/api/config` | Current runtime configuration as JSON                                                    |
| `PUT`    | `/api/config` | Partial config update: deep-merges, validates, and applies in-place                      |

<!--markdownlint-enable MD013-->

## Dashboard & Runtime API

Troutbot ships with a built-in web dashboard and JSON API for monitoring and
runtime configuration. No separate frontend build is required.

### Web Dashboard

Navigate to `http://localhost:3000/dashboard` (or wherever your instance is
running). The dashboard provides:

- **Status card** - uptime, version, dry-run state, active backends, and repo
  count. Auto-refreshes every 30 seconds.
- **Event log** - table of recent events showing repo, PR/issue number, action,
  impact rating, and confidence score. Keeps the last 100 events in memory.
- **Config editor** - read-only JSON view of the current runtime config with an
  "Edit" toggle that lets you modify and save changes without restarting.

The dashboard is a single HTML page with inline CSS and vanilla JS - no
frameworks, no build step, no external assets.

### Runtime Config API

You can inspect and modify the running configuration via the REST API. Changes
are applied in-place without restarting the server. The update endpoint
deep-merges your partial config onto the current one and validates before
applying.

```bash
# Read current config
curl http://localhost:3000/api/config

# Update a single setting (partial merge)
curl -X PUT http://localhost:3000/api/config \
  -H 'Content-Type: application/json' \
  -d '{"response": {"allowUpdates": true}}'

# Change engine weights at runtime
curl -X PUT http://localhost:3000/api/config \
  -H 'Content-Type: application/json' \
  -d '{"engine": {"weights": {"checks": 0.5, "diff": 0.25, "quality": 0.25}}}'
```

Invalid configs are rejected with a 400 status and an error message. The
original config remains unchanged if validation fails.
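Merging first and validating second keeps a bad `PUT` from corrupting the running config. The patch is applied to a copy, and that copy replaces the live config only if validation passes. The sketch below works over plain JSON and assumes nested objects merge key by key while arrays and scalars are replaced wholesale; how Troutbot treats arrays such as `repositories` is not stated here. `deepMerge`, `applyUpdate`, and the `validate` callback are hypothetical.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(v: Json | undefined): v is JsonObject {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Recursively merge `patch` onto a copy of `base`; arrays and scalars replace.
function deepMerge(base: JsonObject, patch: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    const current = out[key];
    out[key] = isObject(current) && isObject(value) ? deepMerge(current, value) : value;
  }
  return out;
}

// Merge, validate, and hand back the candidate only if it is valid.
// `validate` returns an error message, or null when the config is acceptable.
function applyUpdate(
  current: JsonObject,
  patch: JsonObject,
  validate: (candidate: JsonObject) => string | null,
): { ok: true; config: JsonObject } | { ok: false; error: string } {
  const candidate = deepMerge(current, patch);
  const error = validate(candidate);
  return error ? { ok: false, error } : { ok: true, config: candidate };
}
```

Because `deepMerge` never mutates `current`, a failed validation leaves the live object untouched. A server would then respond with 400 and the returned `error`.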

### Event Buffer API

The event buffer stores the last 100 processed events in memory (from both
webhooks and polling). Events are lost on restart.

```bash
# List recent events
curl http://localhost:3000/api/events

# Clear the buffer
curl -X DELETE http://localhost:3000/api/events
```
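The endpoints table calls this an in-memory ring buffer. Below is a minimal generic version with the same keep-the-newest-100 behavior; `RingBuffer` and its methods are illustrative, not Troutbot's actual class.

```typescript
// Fixed-capacity buffer that keeps only the newest `capacity` items.
class RingBuffer<T> {
  private readonly capacity: number;
  private items: (T | undefined)[];
  private start = 0; // index of the oldest item
  private count = 0;

  constructor(capacity = 100) {
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    const end = (this.start + this.count) % this.capacity;
    this.items[end] = item;
    if (this.count < this.capacity) this.count++;
    else this.start = (this.start + 1) % this.capacity; // overwrote the oldest
  }

  // Oldest first, as a GET /api/events handler might return them.
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.items[(this.start + i) % this.capacity] as T);
    }
    return out;
  }

  // What a DELETE /api/events handler would call.
  clear(): void {
    this.items = new Array<T | undefined>(this.capacity);
    this.start = 0;
    this.count = 0;
  }
}
```

Pushes and clears are O(1) and memory stays bounded no matter how many events arrive, which suits a process that runs unattended.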

### Securing the Dashboard

The dashboard and API endpoints have no authentication by default. In
production, restrict access using one of:

- **Reverse proxy rules** - limit `/dashboard` and `/api/*` to internal IPs or
  require basic auth at the nginx/Caddy layer
- **Firewall rules** - only expose port 3000 to trusted networks
- **Bind to localhost** - expose port 3000 only on `127.0.0.1` (the Docker and
  Docker Compose examples already publish it this way), then access it via an
  SSH tunnel or VPN

Do not expose the dashboard to the public internet without authentication, as
the config API allows modifying runtime behavior.

## Dry-Run Mode

Without a `GITHUB_TOKEN`, the bot runs in dry-run mode. The quality backend
still works (text analysis), but checks and diff backends return neutral (they
need API access). Comments are logged instead of posted.

## Customizing Messages

Edit `response.messages` in your config. Each impact category takes an array of
strings. One is picked randomly per event.

```typescript
messages: {
  positive: [
    "The trout approve of this {type}!",
    "Upstream looks clear for this {type}.",
  ],
  negative: [
    "The trout are worried about this {type}.",
  ],
  neutral: [
    "The trout have no opinion on this {type}.",
  ],
},
```

Placeholders:

- `{type}` - `issue` or `pull request`
- `{impact}` - `positive`, `negative`, or `neutral`
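Picking a message and filling its placeholders takes only a few lines. The sketch below assumes plain string replacement of every `{type}` and `{impact}` occurrence. `renderMessage` is a hypothetical name, and the injectable `random` parameter exists only to make the choice testable.

```typescript
type Impact = 'positive' | 'negative' | 'neutral';
type ItemType = 'issue' | 'pull request';

// Pick one template for the impact at random and fill in the placeholders.
function renderMessage(
  messages: Record<Impact, string[]>,
  impact: Impact,
  type: ItemType,
  random: () => number = Math.random, // injectable for deterministic tests
): string {
  const pool = messages[impact];
  if (pool.length === 0) throw new Error(`No messages configured for "${impact}"`);
  const template = pool[Math.floor(random() * pool.length)];
  return template.replace(/\{type\}/g, type).replace(/\{impact\}/g, impact);
}
```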