Signed-off-by: NotAShelf <raf@notashelf.dev> Change-Id: I10f9958209a6cc8a71cbee481bb846c36a6a6964
Observability Setup
Watchdog exposes Prometheus-formatted metrics at /metrics. You need a
time-series database to scrape and store these metrics, then visualize them in
Grafana.
Important
Why you need a time-series database:
- Watchdog exposes current state (counters, gauges)
- A TSDB scrapes periodically and stores time-series data
- Grafana visualizes the historical data
- Grafana cannot directly scrape Prometheus /metrics endpoints
Compatible databases:
- Prometheus
- VictoriaMetrics
- Any other Prometheus-compatible scraper
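To make the first two points concrete, here is a minimal Python sketch of what a scraper does with a counter. A single scrape only gives the running total; a rate needs at least two stored samples. The sample values and the 15-second gap are made up for illustration, and the parser is deliberately simplified (it assumes no trailing timestamps and no spaces inside label values).

```python
def parse_samples(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def per_second_rate(earlier: dict[str, float], later: dict[str, float],
                    seconds: float) -> dict[str, float]:
    """Compute a per-second increase for every series present in both scrapes."""
    rates = {}
    for series, value in later.items():
        if series not in earlier:
            continue
        delta = value - earlier[series]
        # A counter that went down was reset (e.g. a restart); count from zero.
        if delta < 0:
            delta = value
        rates[series] = delta / seconds
    return rates


# Two hypothetical scrapes of /metrics, taken 15 seconds apart.
scrape_1 = '''
# TYPE web_pageviews_total counter
web_pageviews_total{path="/"} 120
web_pageviews_total{path="/blog"} 30
'''
scrape_2 = '''
# TYPE web_pageviews_total counter
web_pageviews_total{path="/"} 150
web_pageviews_total{path="/blog"} 33
'''

rates = per_second_rate(parse_samples(scrape_1), parse_samples(scrape_2), seconds=15)
for series, rate in sorted(rates.items()):
    print(f"{series}: {rate:.2f} views/s")
```

A real TSDB does this for every series on every scrape interval and keeps the history, which is what Grafana then queries.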
Prometheus Setup
Configuring Prometheus
Create /etc/prometheus/prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: "watchdog"
    static_configs:
      - targets: ["localhost:8080"]
    # Optional: scrape multiple Watchdog instances
    # static_configs:
    #   - targets:
    #       - 'watchdog-1.example.com:8080'
    #       - 'watchdog-2.example.com:8080'
    #     labels:
    #       instance: 'production'
  # Scrape Prometheus itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
Verify Prometheus' health state
# Check Prometheus is running
curl http://localhost:9090/-/healthy
# Check it's scraping Watchdog
curl http://localhost:9090/api/v1/targets
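The targets endpoint returns a fairly large JSON document, so it can help to filter it. The following Python sketch pulls out only the Watchdog targets and their health. It assumes the standard shape of the Prometheus `/api/v1/targets` response (`data.activeTargets[]` with `labels`, `scrapeUrl`, `health`, and `lastError`); the sample payload is trimmed down and hypothetical, and `fetch_targets` is defined but not called so the example runs without a live Prometheus.

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the targets document from a running Prometheus (not called here)."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as response:
        return json.load(response)


def summarize_targets(payload: dict, job: str = "watchdog") -> list[tuple[str, str, str]]:
    """Return (scrape URL, health, last error) for every active target of a job."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("health", "unknown"), t.get("lastError", ""))
        for t in targets
        if t.get("labels", {}).get("job") == job
    ]


# Trimmed-down, hypothetical response for the two jobs configured above.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "watchdog", "instance": "localhost:8080"},
                "scrapeUrl": "http://localhost:8080/metrics",
                "health": "up",
                "lastError": "",
            },
            {
                "labels": {"job": "prometheus", "instance": "localhost:9090"},
                "scrapeUrl": "http://localhost:9090/metrics",
                "health": "up",
                "lastError": "",
            },
        ]
    },
}

for url, health, error in summarize_targets(sample):
    print(url, health, error or "-")
```

Against a live instance, `summarize_targets(fetch_targets())` should list one entry per Watchdog target; a `health` other than `up` together with a non-empty `lastError` usually points at the cause.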
NixOS
Add to your NixOS configuration:
{
  services.prometheus = {
    enable = true;
    port = 9090;
    # Retention period
    retentionTime = "30d";
    scrapeConfigs = [
      {
        job_name = "watchdog";
        static_configs = [{
          targets = [ "localhost:8080" ];
        }];
      }
    ];
  };
  # Open firewall if needed
  # networking.firewall.allowedTCPPorts = [ 9090 ];
}
For multiple Watchdog instances:
{
  services.prometheus.scrapeConfigs = [
    {
      job_name = "watchdog";
      static_configs = [
        {
          labels.env = "production";
          targets = [
            "watchdog-1:8080"
            "watchdog-2:8080"
            "watchdog-3:8080"
          ];
        }
      ];
    }
  ];
}
Grafana Setup
NixOS
{
  services.grafana = {
    enable = true;
    settings = {
      server = {
        http_addr = "127.0.0.1";
        http_port = 3000;
      };
    };
    provision = {
      enable = true;
      datasources.settings.datasources = [{
        name = "Prometheus";
        type = "prometheus";
        url = "http://localhost:9090"; # Or "http://localhost:8428" for VictoriaMetrics
        isDefault = true;
      }];
    };
  };
}
Configure Data Source (Manual)
If you're not using NixOS for provisioning, you'll need to add the data source
imperatively through the Grafana UI. In the admin panel, navigate to
Configuration and choose "Add data source" under Data Sources. Select
Prometheus as the type, point it at your Prometheus instance, and save it.
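If you would rather script this step than click through the UI, Grafana's HTTP API accepts a `POST` to `/api/datasources`. The Python sketch below only builds the request and does not send it. The token value is a placeholder, the field values mirror the NixOS example above, and `"access": "proxy"` is an assumption about how you want Grafana to reach Prometheus; adjust all of these to your setup.

```python
import json
from urllib.request import Request


def build_datasource_request(grafana_url: str, api_token: str,
                             prometheus_url: str = "http://localhost:9090") -> Request:
    """Build (but do not send) a request that registers a Prometheus data source."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,   # or "http://localhost:8428" for VictoriaMetrics
        "access": "proxy",       # assumption: Grafana's backend performs the queries
        "isDefault": True,
    }
    return Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


# "example-token" is a placeholder; use a real service account token.
request = build_datasource_request("http://127.0.0.1:3000", "example-token")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(request)` would actually create the data source, which is why the sketch stops short of doing so.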
Import Pre-built Dashboard
A sample Grafana dashboard is provided with support for multi-host and multi-site configurations. Import it, configure the data source and it should work out of the box.
If you're not using NixOS for provisioning, the dashboard also needs to be
provisioned manually. Under Dashboards, select Import and provide the JSON
contents or upload the sample dashboard from contrib/grafana/watchdog.json.
Select your Prometheus data source and import it.
See contrib/grafana/README.md for full documentation.
Example Queries
Once Prometheus is scraping Watchdog and Grafana is connected, you can build your own panels and queries. Below are some example queries written in PromQL, the Prometheus query language. They are starting points and may not cover everything you need, so adapt them to your own setup.
If you believe you have some valuable widgets that you'd like to contribute back, feel free!
Top 10 Pages by Traffic
topk(10, sum by (path) (rate(web_pageviews_total[5m])))
Mobile vs Desktop Split
sum by (device) (rate(web_pageviews_total[1h]))
Unique Visitors
web_daily_unique_visitors
Top Referrers
topk(10, sum by (referrer) (rate(web_pageviews_total{referrer!="direct"}[1d])))
Multi-Site: Traffic per Domain
sum by (domain) (rate(web_pageviews_total[1h]))
Cardinality Health
# Should be near zero
rate(web_path_overflow_total[5m])
rate(web_referrer_overflow_total[5m])
rate(web_event_overflow_total[5m])
Horizontal Scaling Considerations
When running multiple Watchdog instances:
- Each instance exposes its own metrics: Prometheus scrapes all instances
- Prometheus aggregates at query time: use sum() in queries to aggregate across instances
- No shared state needed: each Watchdog instance is independent
Watchdog is almost entirely stateless, so horizontal scaling should be trivial as long as you have the necessary infrastructure and, well, the patience. Example with 3 instances:
# Total pageviews across all instances
sum(rate(web_pageviews_total[5m]))
# Per-instance breakdown
sum by (instance) (rate(web_pageviews_total[5m]))
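For readers less familiar with PromQL, the two queries above differ only in which labels survive the aggregation. The Python sketch below mimics `sum(...)` and `sum by (instance) (...)` over a handful of made-up per-second rates; the instance names and values are hypothetical, and real PromQL operates on full time series rather than single numbers.

```python
from collections import defaultdict


def sum_by(samples: list[tuple[dict[str, str], float]],
           label_names: list[str]) -> dict[tuple[str, ...], float]:
    """Group samples by the given labels and add up their values.

    An empty label list collapses everything into a single total, like sum().
    """
    totals: dict[tuple[str, ...], float] = defaultdict(float)
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in label_names)
        totals[key] += value
    return dict(totals)


# Hypothetical per-series rates as three Watchdog instances might report them.
samples = [
    ({"instance": "watchdog-1:8080", "path": "/"}, 1.5),
    ({"instance": "watchdog-2:8080", "path": "/"}, 2.0),
    ({"instance": "watchdog-3:8080", "path": "/"}, 0.5),
    ({"instance": "watchdog-1:8080", "path": "/blog"}, 0.25),
]

total = sum_by(samples, [])                  # like sum(...)
per_instance = sum_by(samples, ["instance"])  # like sum by (instance) (...)

print("total:", total[()])
for (instance,), value in sorted(per_instance.items()):
    print(instance, value)
```

Because each instance only counts the requests it served itself, summing across instances gives the site-wide figure without any coordination between them.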
Alternatives to Prometheus
VictoriaMetrics
VictoriaMetrics is a fast, cost-effective monitoring solution and time-series
database that is fully compatible with the Prometheus exposition format.
Watchdog's /metrics endpoint can be scraped directly by VictoriaMetrics without
requiring Prometheus.
Direct Scraping (Recommended)
VictoriaMetrics single-node mode can scrape Watchdog directly using standard Prometheus scrape configuration:
Configuration file (/etc/victoriametrics/scrape.yml):
scrape_configs:
  - job_name: "watchdog"
    static_configs:
      - targets: ["localhost:8080"]
    scrape_interval: 15s
    metrics_path: /metrics
Run VictoriaMetrics:
victoria-metrics -promscrape.config=/etc/victoriametrics/scrape.yml
NixOS configuration:
{
  services.victoriametrics = {
    enable = true;
    listenAddress = ":8428";
    retentionPeriod = "12month";
    # Define scrape configs directly. 'prometheusConfig' is the configuration for
    # Prometheus-style metrics endpoints, which Watchdog exports.
    prometheusConfig = {
      scrape_configs = [
        {
          job_name = "watchdog";
          scrape_interval = "15s";
          static_configs = [{
            targets = [ "localhost:8080" ]; # replace the port
          }];
        }
      ];
    };
  };
}
Using vmagent
Alternatively, for distributed setups or when you need more advanced features
like relabeling, you may use vmagent:
{
  services.vmagent = {
    enable = true;
    remoteWriteUrl = "http://localhost:8428/api/v1/write";
    prometheusConfig = {
      scrape_configs = [
        {
          job_name = "watchdog";
          static_configs = [{
            targets = [ "localhost:8080" ];
          }];
        }
      ];
    };
  };
  services.victoriametrics = {
    enable = true;
    listenAddress = ":8428";
  };
}
Prometheus Remote Write
If you are migrating from Prometheus, need strict PromQL compatibility, or just really like using Prometheus for some inexplicable reason, you may keep Prometheus as the scraper and have it remote-write to VictoriaMetrics for storage.
{
  services.prometheus = {
    enable = true;
    port = 9090;
    scrapeConfigs = [
      {
        job_name = "watchdog";
        static_configs = [{
          targets = [ "localhost:8080" ];
        }];
      }
    ];
    remoteWrite = [{
      url = "http://localhost:8428/api/v1/write";
    }];
  };
  services.victoriametrics = {
    enable = true;
    listenAddress = ":8428";
  };
}
Grafana Agent
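A lightweight alternative that scrapes Watchdog and forwards the samples to Grafana Cloud or a local Prometheus. When the destination is a local Prometheus, Prometheus must be started with --web.enable-remote-write-receiver so that it accepts remote writes: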
# Systemd setup for Grafana Agent
sudo systemctl enable --now grafana-agent
# /etc/grafana-agent.yaml
metrics:
  wal_directory: /var/lib/grafana-agent
  configs:
    - name: watchdog
      scrape_configs:
        - job_name: watchdog
          static_configs:
            - targets: ["localhost:8080"]
      remote_write:
        - url: http://localhost:9090/api/v1/write
Monitoring the Monitoring
Monitor your scraper:
# Scrape success (1 = last scrape succeeded, 0 = it failed)
up{job="watchdog"}
# Scrape duration
scrape_duration_seconds{job="watchdog"}
# Time since last scrape
time() - timestamp(up{job="watchdog"})
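The same checks can be run outside Grafana, for example from a cron job. The Python sketch below interprets the result of an instant query for `up{job="watchdog"}` and lists the instances whose last scrape failed. It assumes the standard Prometheus `/api/v1/query` response shape (a vector of `{"metric": ..., "value": [timestamp, "string"]}` entries); the sample response is hypothetical, and `query_prometheus` is defined but not called so the example runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_prometheus(expr: str, base_url: str = "http://localhost:9090") -> dict:
    """Run an instant query against a live Prometheus (not called here)."""
    url = f"{base_url}/api/v1/query?{urlencode({'query': expr})}"
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def down_instances(payload: dict) -> list[str]:
    """Return the instances whose `up` sample is 0, sorted by name."""
    results = payload.get("data", {}).get("result", [])
    return sorted(
        r["metric"].get("instance", "unknown")
        for r in results
        # Sample values arrive as strings: "1" means the scrape succeeded.
        if float(r["value"][1]) == 0.0
    )


# Hypothetical response to: up{job="watchdog"}
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "watchdog", "instance": "watchdog-1:8080"},
             "value": [1700000000.0, "1"]},
            {"metric": {"job": "watchdog", "instance": "watchdog-2:8080"},
             "value": [1700000000.0, "0"]},
        ],
    },
}

failed = down_instances(sample)
print("down:", ", ".join(failed) if failed else "none")
```

With a live server, `down_instances(query_prometheus('up{job="watchdog"}'))` would give the same list for your real targets.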
For VictoriaMetrics, you can also monitor ingestion stats:
# VM internal metrics
curl http://localhost:8428/metrics | grep vm_rows_inserted_total