mirror of
https://github.com/NotAShelf/watchdog.git
synced 2026-04-15 14:54:00 +00:00
Signed-off-by: NotAShelf <raf@notashelf.dev> Change-Id: I10f9958209a6cc8a71cbee481bb846c36a6a6964
396 lines
8.6 KiB
Markdown
# Observability Setup

Watchdog exposes Prometheus-formatted metrics at `/metrics`. You need a
time-series database to scrape and store these metrics, then visualize them in
Grafana.

> [!IMPORTANT]
>
> **Why you need a time-series database:**
>
> - Watchdog exposes _current state_ (counters, gauges)
> - A TSDB _scrapes periodically_ and _stores time-series data_
> - Grafana _visualizes_ the historical data
> - Grafana cannot directly scrape Prometheus `/metrics` endpoints
>
> **Compatible databases:**
>
> - [Prometheus](#prometheus-setup)
> - [VictoriaMetrics](#victoriametrics)
> - Any other Prometheus-compatible scraper

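To make "current state" concrete, here is a minimal Python sketch of what a scraper sees on each visit to `/metrics`. The sample body, its `HELP` text, and its label values are made up for illustration; only the metric name `web_pageviews_total` and the `path` and `device` labels come from the queries later in this document. The parser is deliberately simplified and ignores escaped quotes and optional timestamps.

```python
import re

# Hypothetical scrape body; the HELP text and label values are invented.
SAMPLE = '''\
# HELP web_pageviews_total Total pageviews
# TYPE web_pageviews_total counter
web_pageviews_total{path="/",device="desktop"} 1027
web_pageviews_total{path="/blog",device="mobile"} 311
'''

# name, optional {labels}, then a single value (no timestamp support).
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_exposition(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None:
            continue  # anything this simplified pattern cannot handle
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_exposition(SAMPLE):
        print(name, labels, value)
```

Each scrape yields only the latest counter values. The history that Grafana graphs exists because the TSDB stores one such snapshot per scrape interval.
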
## Prometheus Setup

### Configuring Prometheus

Create `/etc/prometheus/prometheus.yml`:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "watchdog"
    static_configs:
      - targets: ["localhost:8080"]

    # Optional: scrape multiple Watchdog instances
    # static_configs:
    #   - targets:
    #       - 'watchdog-1.example.com:8080'
    #       - 'watchdog-2.example.com:8080'
    #     labels:
    #       # Use a label other than `instance`, otherwise both targets
    #       # end up with the same instance value
    #       env: 'production'

  # Scrape Prometheus itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
```

### Verify Prometheus' health state

```bash
# Check Prometheus is running
curl http://localhost:9090/-/healthy

# Check it's scraping Watchdog
curl http://localhost:9090/api/v1/targets
```

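The `/api/v1/targets` response is a fairly large JSON document, so it can help to filter it down to the Watchdog job. The following is a minimal sketch, assuming the job is named `watchdog` and Prometheus listens on `localhost:9090` as configured above. The function names are hypothetical, not part of Watchdog or Prometheus.

```python
import json
import urllib.request


def watchdog_target_health(targets_response, job="watchdog"):
    """Map each scrape URL of `job` to its reported health ("up", "down", "unknown")."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return {
        target["scrapeUrl"]: target["health"]
        for target in active
        if target.get("labels", {}).get("job") == job
    }


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the targets document from a running Prometheus (network access required)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for url, health in watchdog_target_health(fetch_targets()).items():
        print(f"{health:8} {url}")
```

An empty result usually means the job name does not match or Prometheus has not loaded the scrape configuration yet.
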
### NixOS

Add to your NixOS configuration:

```nix
{
  services.prometheus = {
    enable = true;
    port = 9090;

    # Retention period
    retentionTime = "30d";

    scrapeConfigs = [
      {
        job_name = "watchdog";
        static_configs = [{
          targets = [ "localhost:8080" ];
        }];
      }
    ];
  };

  # Open firewall if needed
  # networking.firewall.allowedTCPPorts = [ 9090 ];
}
```

For multiple Watchdog instances:

```nix
{
  services.prometheus.scrapeConfigs = [
    {
      job_name = "watchdog";
      static_configs = [
        {
          labels.env = "production";
          targets = [
            "watchdog-1:8080"
            "watchdog-2:8080"
            "watchdog-3:8080"
          ];
        }
      ];
    }
  ];
}
```

## Grafana Setup

### NixOS

```nix
{
  services.grafana = {
    enable = true;
    settings = {
      server = {
        http_addr = "127.0.0.1";
        http_port = 3000;
      };
    };

    provision = {
      enable = true;

      datasources.settings.datasources = [{
        name = "Prometheus";
        type = "prometheus";
        url = "http://localhost:9090"; # Or "http://localhost:8428" for VictoriaMetrics
        isDefault = true;
      }];
    };
  };
}
```

### Configure Data Source (Manual)

If you are not provisioning Grafana through NixOS, you need to add the data
source _imperatively_ from the Grafana admin panel. Navigate to
`Configuration`, open `Data Sources`, and choose "Add data source". Select
Prometheus, point it at your Prometheus instance, and save it.

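If you would rather script this step than click through the UI, Grafana's HTTP API accepts the same settings via `POST /api/datasources`. The sketch below only builds the request. It assumes Grafana on `http://localhost:3000`, a service account token you have created yourself, and `access: proxy` (Grafana's server-side access mode). The function name and the token placeholder are hypothetical.

```python
import json
import urllib.request


def datasource_request(grafana_url, api_token, prometheus_url="http://localhost:9090"):
    """Build (but do not send) a request that registers Prometheus as a data source."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,  # or http://localhost:8428 for VictoriaMetrics
        "access": "proxy",      # assumption: Grafana's backend performs the queries
        "isDefault": True,
    }
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = datasource_request("http://localhost:3000", "REPLACE_WITH_YOUR_TOKEN")
    # Sending requires a reachable Grafana and a valid token:
    # urllib.request.urlopen(req, timeout=5)
    print(req.get_method(), req.full_url)
```

The payload mirrors the fields used in the NixOS provisioning example, so both paths should end up with an equivalent data source.
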
### Import Pre-built Dashboard

A sample Grafana dashboard is provided with support for multi-host and
multi-site configurations. Import it, configure the data source and it should
work out of the box.

If you're not using NixOS for provisioning, the dashboard _also_ needs to be
provisioned manually. Under `Dashboards`, select `Import` and provide the JSON
contents or upload the sample dashboard from `contrib/grafana/watchdog.json`.
Select your Prometheus data source and import it.

See [contrib/grafana/README.md](../contrib/grafana/README.md) for full
documentation.

## Example Queries

Once Prometheus is scraping Watchdog and Grafana is connected, you can build
your own panels and queries. The queries below are written in PromQL, the
Prometheus query language. They are starting points and may not cover
everything you need, so adapt them to your setup.

If you build widgets that you think others would find useful, contributions are
welcome!

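Most of the queries in this section wrap the `web_pageviews_total` counter in `rate()`, which turns an ever-growing counter into a per-second average over the given window. The Python sketch below shows the core idea only. It is a simplification: Prometheus's real `rate()` also extrapolates to the window boundaries. The sample values are made up.

```python
def simple_rate(samples):
    """Per-second increase over a window of counter samples.

    `samples` is a list of (unix_timestamp, counter_value), oldest first.
    Returns None when there are fewer than two samples.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # The counter went backwards, e.g. the process restarted:
            # count everything accumulated since the reset.
            increase += cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


if __name__ == "__main__":
    # Hypothetical scrapes, 15 seconds apart.
    window = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
    print(simple_rate(window))  # pageviews per second over the window
```

This is also why the window (`[5m]`, `[1h]`, `[1d]`) matters: a longer window smooths out bursts, while a shorter one reacts faster but is noisier.
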
### Top 10 Pages by Traffic

```promql
topk(10, sum by (path) (rate(web_pageviews_total[5m])))
```

### Mobile vs Desktop Split

```promql
sum by (device) (rate(web_pageviews_total[1h]))
```

### Unique Visitors

```promql
web_daily_unique_visitors
```

### Top Referrers

```promql
topk(10, sum by (referrer) (rate(web_pageviews_total{referrer!="direct"}[1d])))
```

### Multi-Site: Traffic per Domain

```promql
sum by (domain) (rate(web_pageviews_total[1h]))
```

### Cardinality Health

```promql
# Should be near zero
rate(web_path_overflow_total[5m])
rate(web_referrer_overflow_total[5m])
rate(web_event_overflow_total[5m])
```

## Horizontal Scaling Considerations

When running multiple Watchdog instances:

1. **Each instance exposes its own metrics** - Prometheus scrapes all instances
2. **Aggregation happens at query time** - use `sum()` in queries to aggregate
   across instances
3. **No shared state needed** - each Watchdog instance is independent

Watchdog is almost entirely stateless, so horizontal scaling should be trivial
as long as you have the necessary infrastructure and, well, the patience.

Example with 3 instances:

```promql
# Total pageviews across all instances
sum(rate(web_pageviews_total[5m]))

# Per-instance breakdown
sum by (instance) (rate(web_pageviews_total[5m]))
```

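To illustrate what `sum()` and `sum by (instance)` do with series from several instances, here is a small Python sketch. The per-series rates and instance names are hypothetical, and `sum_by` is an illustrative helper rather than anything Prometheus exposes.

```python
from collections import defaultdict

# Hypothetical per-series rates (pageviews/second) after scraping three instances.
SERIES = [
    ({"instance": "watchdog-1:8080", "path": "/"}, 1.5),
    ({"instance": "watchdog-1:8080", "path": "/blog"}, 0.5),
    ({"instance": "watchdog-2:8080", "path": "/"}, 2.0),
    ({"instance": "watchdog-3:8080", "path": "/"}, 1.0),
]


def sum_by(series, *labels):
    """Group series by the given labels and add up their values.

    With no labels, everything collapses into a single total, like `sum(...)`.
    """
    totals = defaultdict(float)
    for metric, value in series:
        key = tuple(metric.get(label, "") for label in labels)
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    print(sum_by(SERIES))              # like sum(rate(...))
    print(sum_by(SERIES, "instance"))  # like sum by (instance) (rate(...))
```

Because every series carries its own `instance` label, no coordination between Watchdog processes is needed. The grouping is decided entirely by the query.
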
## Alternatives to Prometheus

### VictoriaMetrics

VictoriaMetrics is a fast, cost-effective monitoring solution and time-series
database that is 100% compatible with the Prometheus exposition format.
Watchdog's `/metrics` endpoint can be scraped directly by VictoriaMetrics
without requiring Prometheus.

#### Direct Scraping (Recommended)

VictoriaMetrics single-node mode can scrape Watchdog directly using standard
Prometheus scrape configuration:

**Configuration file (`/etc/victoriametrics/scrape.yml`):**

```yaml
scrape_configs:
  - job_name: "watchdog"
    static_configs:
      - targets: ["localhost:8080"]
    scrape_interval: 15s
    metrics_path: /metrics
```

**Run VictoriaMetrics:**

```bash
victoria-metrics -promscrape.config=/etc/victoriametrics/scrape.yml
```

**NixOS configuration:**

```nix
{
  services.victoriametrics = {
    enable = true;
    listenAddress = ":8428";
    retentionPeriod = "1y";

    # Define scrape configs directly. 'prometheusConfig' is the configuration for
    # Prometheus-style metrics endpoints, which Watchdog exports.
    prometheusConfig = {
      scrape_configs = [
        {
          job_name = "watchdog";
          scrape_interval = "15s";
          static_configs = [{
            targets = [ "localhost:8080" ]; # replace the port if Watchdog listens elsewhere
          }];
        }
      ];
    };
  };
}
```

#### Using `vmagent`

Alternatively, for distributed setups or when you need more advanced features
like relabeling, you may use `vmagent`:

```nix
{
  services.vmagent = {
    enable = true;
    remoteWriteUrl = "http://localhost:8428/api/v1/write";

    prometheusConfig = {
      scrape_configs = [
        {
          job_name = "watchdog";
          static_configs = [{
            targets = [ "localhost:8080" ];
          }];
        }
      ];
    };
  };

  services.victoriametrics = {
    enable = true;
    listenAddress = ":8428";
  };
}
```

#### Prometheus Remote Write

If you are migrating from Prometheus, need PromQL compatibility, or just really
like using Prometheus for some inexplicable reason, you may keep Prometheus as
the scraper and have it remote-write to VictoriaMetrics.

```nix
{
  services.prometheus = {
    enable = true;
    port = 9090;

    scrapeConfigs = [
      {
        job_name = "watchdog";
        static_configs = [{
          targets = [ "localhost:8080" ];
        }];
      }
    ];

    remoteWrite = [{
      url = "http://localhost:8428/api/v1/write";
    }];
  };

  services.victoriametrics = {
    enable = true;
    listenAddress = ":8428";
  };
}
```

### Grafana Agent

A lightweight alternative that scrapes Watchdog and forwards the metrics to
Grafana Cloud or a local Prometheus:

```bash
# Systemd setup for Grafana Agent
sudo systemctl enable --now grafana-agent
```

```yaml
# /etc/grafana-agent.yaml
metrics:
  wal_directory: /var/lib/grafana-agent
  configs:
    - name: watchdog
      scrape_configs:
        - job_name: watchdog
          static_configs:
            - targets: ["localhost:8080"]
      remote_write:
        # A local Prometheus only accepts remote writes when started with
        # --web.enable-remote-write-receiver
        - url: http://localhost:9090/api/v1/write
```

## Monitoring the Monitoring

Monitor your scraper:

```promql
# Target health: 1 if the last scrape succeeded, 0 if it failed
up{job="watchdog"}

# Scrape duration
scrape_duration_seconds{job="watchdog"}

# Time since last scrape
time() - timestamp(up{job="watchdog"})
```

For VictoriaMetrics, you can also monitor ingestion stats:

```bash
# VM internal metrics
curl http://localhost:8428/metrics | grep vm_rows_inserted_total
```

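The `up` query can also be checked from a script, for example in a cron job or a CI smoke test. This is a minimal sketch against the Prometheus-compatible `/api/v1/query` endpoint. The base URL and job name follow the examples above, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def instant_query(expr, base_url="http://localhost:9090"):
    """Run an instant query and return the decoded JSON response (network access required)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def down_instances(query_response):
    """Return the sorted instance labels whose `up` value is 0 in an instant-vector result."""
    return sorted(
        sample["metric"].get("instance", "?")
        for sample in query_response["data"]["result"]
        # Each sample's value is [timestamp, "string value"].
        if float(sample["value"][1]) == 0.0
    )


if __name__ == "__main__":
    response = instant_query('up{job="watchdog"}')
    failing = down_instances(response)
    print("all targets up" if not failing else f"down: {', '.join(failing)}")
```

For VictoriaMetrics, pointing `base_url` at `http://localhost:8428` should work the same way, since it serves the same query endpoint.
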