📊 Metrics & Monitoring Guide¶
🎯 Overview¶
Tobogganing provides comprehensive metrics collection and monitoring capabilities through Prometheus-compatible endpoints. Both clients and headends report metrics to the manager service, enabling real-time monitoring and alerting.
📈 Metrics Collection Architecture¶
graph LR
C1[Client 1] -->|Metrics| M[Manager Service]
C2[Client 2] -->|Metrics| M
C3[Client N] -->|Metrics| M
H1[Headend 1] -->|Metrics| M
H2[Headend 2] -->|Metrics| M
M -->|/metrics| P[Prometheus]
P --> G[Grafana]
P --> A[AlertManager]
🔌 API Endpoints¶
Client Metrics Submission¶
POST /api/v1/clients/{client_id}/metrics
Authorization: Bearer {api_key}
Content-Type: application/json
Request Body:
{
  "headless": false,
  "metrics": {
    "bytes_sent": 1048576,
    "bytes_received": 2097152,
    "packets_sent": 1000,
    "packets_received": 1500,
    "connection_uptime": 3600
  }
}
Headend Metrics Submission¶
POST /api/v1/headends/{headend_id}/metrics
Authorization: Bearer {jwt_token}
Content-Type: application/json
Request Body:
{
  "headend_name": "headend-us-east-1",
  "metrics": {
    "active_connections": 150,
    "bandwidth_in": 10485760,
    "bandwidth_out": 5242880,
    "cpu_usage": 45.2,
    "memory_usage": 2147483648
  }
}
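For a quick test, the same payload can be submitted with a plain HTTP client. The sketch below uses curl, assumes a JWT has already been issued for the headend, and uses placeholder hostname and token values:
# Sketch only: hostname, headend ID, and token are placeholders
curl -X POST "https://manager.example.com/api/v1/headends/headend-us-east-1/metrics" \
  -H "Authorization: Bearer ${JWT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "headend_name": "headend-us-east-1",
    "metrics": {
      "active_connections": 150,
      "bandwidth_in": 10485760,
      "bandwidth_out": 5242880,
      "cpu_usage": 45.2,
      "memory_usage": 2147483648
    }
  }'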
📊 Available Metrics¶
Client Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| `tobogganing_client_bytes_sent` | Gauge | Bytes sent by client | client_id, name, type, headless |
| `tobogganing_client_bytes_received` | Gauge | Bytes received by client | client_id, name, type, headless |
| `tobogganing_client_packets_sent` | Gauge | Packets sent by client | client_id, name, type, headless |
| `tobogganing_client_packets_received` | Gauge | Packets received by client | client_id, name, type, headless |
| `tobogganing_client_connection_uptime_seconds` | Gauge | Connection uptime | client_id, name, type, headless |
| `tobogganing_client_last_check_in_timestamp` | Gauge | Last check-in time | client_id, name, type, headless |
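These gauges can be queried directly in Prometheus. The queries below are illustrative examples built only from the metric and label names listed above:
# Clients that have checked in within the last 5 minutes
count(tobogganing_client_last_check_in_timestamp > (time() - 300))
# Per-client send rate (bytes per second) over the last 5 minutes
sum by (client_id) (rate(tobogganing_client_bytes_sent[5m]))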
Headend Metrics¶
| Metric | Type | Description | Labels |
|---|---|---|---|
| `tobogganing_headend_active_connections` | Gauge | Active connections | headend_id, name, region, datacenter |
| `tobogganing_headend_bandwidth_in_bytes` | Gauge | Incoming bandwidth | headend_id, name, region, datacenter |
| `tobogganing_headend_bandwidth_out_bytes` | Gauge | Outgoing bandwidth | headend_id, name, region, datacenter |
| `tobogganing_headend_cpu_usage_percent` | Gauge | CPU usage percentage | headend_id, name, region, datacenter |
| `tobogganing_headend_memory_usage_bytes` | Gauge | Memory usage | headend_id, name, region, datacenter |
| `tobogganing_headend_last_check_in_timestamp` | Gauge | Last check-in time | headend_id, name, region, datacenter |
Manager Service Metrics¶
| Metric | Type | Description |
|---|---|---|
| `tobogganing_manager_clusters_total` | Gauge | Total registered clusters |
| `tobogganing_manager_clients_total` | Gauge | Total registered clients |
| `tobogganing_manager_http_requests_total` | Counter | HTTP requests processed |
| `tobogganing_manager_auth_attempts_total` | Counter | Authentication attempts |
| `tobogganing_manager_certificates_issued_total` | Counter | Certificates issued |
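The manager counters are most useful as rates or windowed increases. For example (illustrative queries only):
# HTTP request rate handled by the manager over the last 5 minutes
rate(tobogganing_manager_http_requests_total[5m])
# Certificates issued over the last 24 hours
increase(tobogganing_manager_certificates_issued_total[24h])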
🔐 Authentication¶
Prometheus Scraping¶
The /metrics endpoint requires authentication via Bearer token:
# prometheus.yml
scrape_configs:
  - job_name: 'tobogganing-manager'
    bearer_token: 'YOUR_METRICS_TOKEN'
    static_configs:
      - targets: ['manager.example.com:8000']
Set the metrics token via environment variable:
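The exact variable name depends on your deployment; METRICS_TOKEN below is only an illustrative placeholder:
# Illustrative only: the variable name may differ in your deployment
export METRICS_TOKEN="YOUR_METRICS_TOKEN"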
📱 Client Integration¶
Go Native Client Example¶
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

type ClientMetrics struct {
	Headless bool                   `json:"headless"`
	Metrics  map[string]interface{} `json:"metrics"`
}

func submitMetrics(clientID, apiKey string) error {
	metrics := ClientMetrics{
		Headless: false,
		Metrics: map[string]interface{}{
			"bytes_sent":        getTotalBytesSent(),
			"bytes_received":    getTotalBytesReceived(),
			"packets_sent":      getTotalPacketsSent(),
			"packets_received":  getTotalPacketsReceived(),
			"connection_uptime": getConnectionUptime(),
		},
	}

	body, err := json.Marshal(metrics)
	if err != nil {
		return err
	}

	req, err := http.NewRequest("POST",
		fmt.Sprintf("https://manager.example.com/api/v1/clients/%s/metrics", clientID),
		bytes.NewBuffer(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("metrics submission failed with status %s", resp.Status)
	}
	return nil
}
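In practice, submission runs on a timer rather than once. A minimal sketch using the submitMetrics function above; the one-minute interval is an illustrative choice, and the standard log package is also required:
// Periodically submit metrics until stop is closed; interval and logging are illustrative.
func runMetricsLoop(clientID, apiKey string, stop <-chan struct{}) {
	ticker := time.NewTicker(1 * time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if err := submitMetrics(clientID, apiKey); err != nil {
				log.Printf("metrics submission failed: %v", err)
			}
		case <-stop:
			return
		}
	}
}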
Docker Client Example¶
# In your Docker client
FROM alpine:latest
# Install monitoring tools
RUN apk add --no-cache curl jq
# Metrics submission script
COPY submit_metrics.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/submit_metrics.sh
# Run metrics submission every minute
RUN echo "* * * * * /usr/local/bin/submit_metrics.sh" >> /etc/crontabs/root
# Start crond in the foreground so the scheduled job actually runs
CMD ["crond", "-f"]
#!/bin/sh
# submit_metrics.sh
CLIENT_ID="${CLIENT_ID}"
API_KEY="${API_KEY}"
MANAGER_URL="${MANAGER_URL}"
# Collect metrics
BYTES_SENT=$(cat /sys/class/net/wg0/statistics/tx_bytes)
BYTES_RECEIVED=$(cat /sys/class/net/wg0/statistics/rx_bytes)
PACKETS_SENT=$(cat /sys/class/net/wg0/statistics/tx_packets)
PACKETS_RECEIVED=$(cat /sys/class/net/wg0/statistics/rx_packets)
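# Note: /proc/uptime reports system uptime, used here only as a rough proxy for connection uptime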
UPTIME=$(cat /proc/uptime | cut -d' ' -f1)
# Submit to manager
curl -X POST "${MANAGER_URL}/api/v1/clients/${CLIENT_ID}/metrics" \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d "{
    \"headless\": true,
    \"metrics\": {
      \"bytes_sent\": ${BYTES_SENT},
      \"bytes_received\": ${BYTES_RECEIVED},
      \"packets_sent\": ${PACKETS_SENT},
      \"packets_received\": ${PACKETS_RECEIVED},
      \"connection_uptime\": ${UPTIME}
    }
  }"
📊 Grafana Dashboard¶
Example Dashboard JSON¶
{
  "dashboard": {
    "title": "Tobogganing Monitoring",
    "panels": [
      {
        "title": "Active Clients by Type",
        "targets": [
          {
            "expr": "count by (type) (tobogganing_client_last_check_in_timestamp > (time() - 300))"
          }
        ]
      },
      {
        "title": "Total Bandwidth Usage",
        "targets": [
          {
            "expr": "sum(rate(tobogganing_client_bytes_sent[5m])) + sum(rate(tobogganing_client_bytes_received[5m]))"
          }
        ]
      },
      {
        "title": "Headend CPU Usage",
        "targets": [
          {
            "expr": "tobogganing_headend_cpu_usage_percent"
          }
        ]
      }
    ]
  }
}
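If dashboards are managed as code, JSON in this shape can be pushed through Grafana's HTTP dashboard API. A sketch; the host, token, and file name below are placeholders:
# Sketch: push the dashboard JSON to Grafana (host, token, and file name are placeholders)
curl -X POST "https://grafana.example.com/api/dashboards/db" \
  -H "Authorization: Bearer ${GRAFANA_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @tobogganing-dashboard.json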
🚨 Alerting Rules¶
Prometheus Alert Examples¶
groups:
  - name: tobogganing_alerts
    rules:
      - alert: ClientOffline
        expr: time() - tobogganing_client_last_check_in_timestamp > 900
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Client {{ $labels.name }} is offline"
          description: "Client has not checked in for more than 15 minutes"

      - alert: HeadendHighCPU
        expr: tobogganing_headend_cpu_usage_percent > 80
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Headend {{ $labels.name }} high CPU usage"
          description: "CPU usage is {{ $value }}%"

      - alert: HeadendDown
        expr: time() - tobogganing_headend_last_check_in_timestamp > 300
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Headend {{ $labels.name }} is down"
          description: "Headend has not reported metrics for 5 minutes"
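For these alerts to reach anyone, Alertmanager needs a matching route and receiver. A minimal sketch; the receiver names and webhook URLs are placeholders:
# alertmanager.yml (sketch; receiver names and URLs are placeholders)
route:
  receiver: default
  routes:
    - match:
        severity: critical
      receiver: oncall
receivers:
  - name: default
    webhook_configs:
      - url: 'https://example.com/hooks/alerts'
  - name: oncall
    webhook_configs:
      - url: 'https://example.com/hooks/oncall'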
🔍 Debugging¶
Check Metrics Endpoint¶
# Get raw metrics (requires authentication)
curl -H "Authorization: Bearer YOUR_METRICS_TOKEN" \
https://manager.example.com/metrics
# Check specific metric
curl -H "Authorization: Bearer YOUR_METRICS_TOKEN" \
https://manager.example.com/metrics | grep tobogganing_client_bytes_sent
Verify Client Submission¶
# Test client metrics submission
curl -X POST https://manager.example.com/api/v1/clients/test-client/metrics \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"headless": false,
"metrics": {
"bytes_sent": 1000,
"bytes_received": 2000,
"connection_uptime": 60
}
}' -v
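After a successful test submission, the values should appear on the manager's /metrics endpoint under the client's labels:
# Confirm the test client's metrics are exported (label names as in the tables above)
curl -s -H "Authorization: Bearer YOUR_METRICS_TOKEN" \
  https://manager.example.com/metrics | grep 'client_id="test-client"'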
📋 Best Practices¶
- Submission Frequency
  - Clients: every 1-5 minutes
  - Headends: every 30-60 seconds
  - Adjust based on network conditions
- Metric Retention
  - Keep high-resolution data for 24 hours
  - Downsample to 5-minute averages for 7 days
  - Keep monthly aggregates for long-term storage
- Security
  - Use unique API keys per client
  - Rotate metrics tokens regularly
  - Monitor for anomalous submission patterns
- Performance
  - Batch metrics when possible
  - Use compression for large payloads
  - Implement exponential backoff on failures (see the sketch below)
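A minimal sketch of the exponential backoff recommendation, reusing the submitMetrics function from the Go example above; the attempt limit and delays are illustrative, and the standard log and time packages are required:
// submitWithBackoff retries submitMetrics with exponentially growing delays.
// The attempt limit and base delay are illustrative values.
func submitWithBackoff(clientID, apiKey string) error {
	const maxAttempts = 5
	delay := time.Second
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = submitMetrics(clientID, apiKey); err == nil {
			return nil
		}
		log.Printf("metrics submission attempt %d failed: %v", attempt, err)
		time.Sleep(delay)
		delay *= 2 // wait twice as long before the next attempt
	}
	return err
}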