Signed-off-by: NotAShelf <raf@notashelf.dev> Change-Id: I0df00ecfddf98db1ebc85c2fc7758e326a6a6964
# Watchdog

Watchdog is a lightweight, privacy-first analytics system with fully declarative
configuration that aggregates web traffic data and exports it as Prometheus
metrics. Unlike traditional analytics platforms, Watchdog stores no raw events,
uses no persistent user identifiers, and enforces bounded cardinality by design.

## Features

Watchdog is **privacy-first** in design. There are no cookies, no
`localStorage`, and no browser fingerprinting. Daily salt rotation prevents
cross-day visitor correlation, and there is no raw event storage. We only
aggregate metrics. Other noteworthy features:

- Multi-site analytics with optional domain tracking
- Bounded cardinality prevents metric explosion
- Rate limiting and DoS protection
- Graceful shutdown with state persistence
- IPv6 support with proper proxy header handling

See the [privacy section](#privacy) for more details on what "guarantees"
Watchdog has to offer.

## Quick Start

### NixOS

The recommended way of using Watchdog is consuming the NixOS module. It handles
everything from dependencies to environment setup and Systemd configuration.
Simply import the NixOS module provided by this flake, and enable
`services.watchdog`:

```nix
{inputs, ...}: {
  imports = [inputs.watchdog.nixosModules.default];

  services.watchdog = {
    enable = true;
    settings = {
      site.domains = ["example.com"];
      server.listen_addr = "127.0.0.1:8080";
    };
  };
}
```

The `settings` option is freeform, meaning it'll be serialized to the YAML
configuration automatically.

### Systemd

On non-NixOS distributions, you may build Watchdog with `go build` and copy it
somewhere that's in your `PATH`. Usually this is `/usr/local/bin` for system
installations:

```bash
# Build
$ go build -o watchdog .

# Install binary (writing to /usr/local/bin usually requires root)
$ sudo install -Dm755 watchdog /usr/local/bin/watchdog

# Install service (unit files should not be executable)
$ sudo install -Dm644 contrib/systemd/watchdog.service /etc/systemd/system/watchdog.service
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now watchdog
```

### Binary

You may also build the binary and run it directly from any path, but this is
generally not advisable. You may consider something like `screen` to background
the Watchdog process. Note, however, that Systemd is the only supported
installation mechanism.

```bash
# Build
$ go build -o watchdog .

# Run
$ ./watchdog --config config.yaml
```

## Configuration

[configuration reference]: docs/configuration.md

Watchdog currently supports configuration via a YAML file, environment
variables, or command-line flags. You may find a more complete reference in the
[configuration reference] document.

### Quick Start

Create `config.yaml`:

```yaml
site:
  # Single-site analytics
  domains:
    - "example.com"

  # Or multi-site analytics
  # domains:
  #   - "example.com"
  #   - "blog.example.com"
  #   - "shop.example.com"

  salt_rotation: "daily"

  collect:
    pageviews: true
    device: true
    referrer: "domain"
    domain: false # Set to true for multi-site analytics

limits:
  max_paths: 10000
  max_sources: 500
  max_custom_events: 100
  max_events_per_minute: 10000

server:
  listen_addr: "127.0.0.1:8080"
```

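To illustrate what a setting like `referrer: "domain"` could mean in practice, here is a minimal Go sketch that reduces a full referrer URL to its hostname before it is used as a label. This is an assumption about the behaviour, not Watchdog's actual implementation, and `referrerDomain` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// referrerDomain is a hypothetical helper: it reduces a referrer URL to its
// lowercase hostname, dropping scheme, port, path, and query string.
// It returns "" for empty input or input without a parseable host.
func referrerDomain(raw string) string {
	if raw == "" {
		return ""
	}
	u, err := url.Parse(raw)
	if err != nil || u.Hostname() == "" {
		return ""
	}
	return strings.ToLower(u.Hostname())
}

func main() {
	// Prints "news.example.org"
	fmt.Println(referrerDomain("https://News.Example.org/item?id=42"))
}
```

Keeping only the hostname discards query strings and paths that may carry identifying data, and it keeps the number of distinct referrer label values small.
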
## Usage

### JavaScript Beacon

Similar to Plausible, Watchdog uses a JavaScript beacon to track events. In the
most basic case, add it to your site with a `<script>` tag to begin collecting
metrics:

```html
<script src="https://analytics.example.com/web/beacon.js" defer></script>
```

The beacon script also supports a _variety_ of configuration options via data
attributes, which you can adjust to your own needs. Some of them are described
below:

```html
<!-- Custom API endpoint -->
<script src="/web/beacon.js" data-api="/custom/endpoint" defer></script>

<!-- Track specific domain (multi-site) -->
<script src="/web/beacon.js" data-domain="example.com" defer></script>

<!-- Hash-based routing (for SPAs) -->
<script src="/web/beacon.js" data-hash-mode defer></script>

<!-- Track outbound links -->
<script src="/web/beacon.js" data-outbound-links defer></script>

<!-- Track file downloads (.pdf, .zip, .doc, etc.) -->
<script src="/web/beacon.js" data-file-downloads defer></script>

<!-- Exclude paths (comma-separated) -->
<script src="/web/beacon.js" data-exclude="/admin,/dashboard" defer></script>

<!-- Manual pageview tracking -->
<script src="/web/beacon.js" data-manual defer></script>

<!-- Combine multiple options -->
<script
  src="/web/beacon.js"
  data-hash-mode
  data-outbound-links
  data-file-downloads
  defer
></script>
```

You can also track custom events as follows:

```javascript
// Simple event
window.watchdog.track("signup");

// Event with custom referrer
window.watchdog.track("purchase", { referrer: "email-campaign" });

// Manual pageview (when data-manual is set)
window.watchdog.trackPageview();

// Force pageview (bypass duplicate detection)
window.watchdog.trackPageview({ force: true });
```

### Metrics

[observability documentation]: docs/observability.md

Unlike most common solutions, Watchdog does not do any data visualization. It
collects and aggregates metrics in a Prometheus-compatible manner at the
`/metrics` endpoint. To scrape and store the time-series data, you will need
**Prometheus**. Grafana is a common solution for _visualizing_ said data.

See the [observability documentation] for a full setup guide. It is, however,
recommended to be somewhat knowledgeable in those areas before attempting to
deploy. A Grafana dashboard with support for multi-host and multi-site
deployments is provided in the [contrib directory](contrib/grafana/).

While not final, some of the metrics collected are as follows:

**Traffic metrics:**

- `web_pageviews_total{path,device,referrer,domain}` - Total pageviews
  - `domain` label only present if `site.collect.domain: true`
- `web_custom_events_total{event}` - Custom event counts
- `web_daily_unique_visitors` - Estimated unique visitors (HyperLogLog)

**Cardinality metrics:**

- `web_path_overflow_total` - Paths rejected due to cardinality limit
- `web_referrer_overflow_total` - Referrers rejected due to limit
- `web_event_overflow_total` - Custom events rejected due to limit
- `web_blocked_requests_total{reason}` - File server requests blocked by security filters

**Process metrics:**

- `watchdog_build_info{version,commit,build_date}` - Build metadata
- `watchdog_start_time_seconds` - Unix timestamp of process start
- `go_*` - Go runtime metrics (goroutines, GC, memory)
- `process_*` - OS process metrics (CPU, RSS, file descriptors)

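The overflow counters listed above suggest a simple admission scheme: a label value is accepted until a limit on distinct values is reached, after which new values are rejected and counted. The Go sketch below shows one way this could work. It is illustrative only; `boundedSet` and its methods are hypothetical names, not Watchdog's actual types, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// boundedSet is a hypothetical type that admits at most `limit` distinct keys
// and counts every rejected attempt, similar in spirit to an overflow metric.
type boundedSet struct {
	limit    int
	seen     map[string]struct{}
	overflow int
}

func newBoundedSet(limit int) *boundedSet {
	return &boundedSet{limit: limit, seen: make(map[string]struct{})}
}

// Admit reports whether key may be used as a label value. Keys seen before
// are always admitted; new keys are admitted only while under the limit.
func (b *boundedSet) Admit(key string) bool {
	if _, ok := b.seen[key]; ok {
		return true
	}
	if len(b.seen) >= b.limit {
		b.overflow++
		return false
	}
	b.seen[key] = struct{}{}
	return true
}

func main() {
	paths := newBoundedSet(2)
	for _, p := range []string{"/", "/about", "/", "/pricing"} {
		fmt.Println(p, paths.Admit(p))
	}
	fmt.Println("overflow:", paths.overflow)
}
```

With a limit of 2, the first two distinct paths and the repeated `/` are admitted, while `/pricing` is rejected and the overflow count becomes 1.
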
## Privacy

Privacy is a fundamental design constraint for Watchdog, and not a feature. All
personally identifiable information is discarded at ingestion. We only keep
aggregate counts. Some properties worth noting:

**No Persistent Identifiers:**

- No cookies, `localStorage`, or fingerprinting
- IP + User-Agent hashed with daily rotating salt
- Hash discarded after HLL insertion

**Data Minimization:**

- Only collects: domain, path, referrer, screen width
- No raw event storage
- All data aggregated at ingestion

**Bounded Cardinality:**

- Configurable limits on paths, referrers, events
- Prevents unbounded metric growth
- Overflow tracked separately

**GDPR/CCPA Compliance:**

- No personal data stored
- Daily salt rotation prevents cross-day correlation
- Aggregate-only analytics

## Contributing

Watchdog was built in a very short time to address a very specific need:
replace Plausible, and do so as soon as possible. While I've given the codebase
the appropriate amount of care, there may be unintended bugs or missing
features that you'd like to see supported. In this case, you are very welcome
to create issues or PRs.

### Building

The recommended way of developing Watchdog is using the Nix flake, for a
reproducible toolchain. The default shell already provides everything you need,
so you can simply run `nix develop`, or `direnv allow` if you use Direnv.

Once you have the dependencies, the workflow is relatively simple. Build and
test with Go, and then with Nix to ensure packaging is correct.

```bash
# Build binary
$ go build -o watchdog .

# Run tests
$ go test ./...

# Run integration tests
$ go test -tags=integration ./test/...

# Build with Nix
$ nix build
```

### Testing

There's a non-negligible test suite for Watchdog that I'm quite proud of. It
covers everything from configuration validation to path normalization and other
security features that we'd _rather not regress_. If you are working on
Watchdog, it is advisable to add test cases and run the tests before submitting
your changes.

```bash
# Unit tests
$ go test ./...

# Integration tests (requires build tag)
$ go test -tags=integration ./test/...

# Benchmarks
$ go test -bench=. ./test/...

# Coverage
$ go test -cover ./...
```