Roger Oriol
d8012dfb6c
monitoring: add dashboard ideas doc
Survey of dashboards that could be built from existing and not-yet-enabled
metrics across the cluster's services (traefik, coredns, metallb, cert-manager,
phoenix, litellm, gitea, postgres, etc.), with per-service enable steps and
a recommended priority order.
2026-06-26 20:22:54 +02:00
Roger Oriol
bf1387dc3e
monitoring: add Grafana dashboards + kube-state-metrics & node-exporter
Dashboards (provisioned via ConfigMaps into Grafana pod, 'K3s Cluster' folder):
- Cluster Overview: per-namespace CPU/mem/net/fs, pod counts, pod health (KSM)
- Pods & Services: per-pod CPU/mem/net/fs, throttling, pod status, restarts, PVCs
- Nodes: per-node CPU%/mem%, load average, disk usage, network (node-exporter)
- Control Plane & API Server: request rate, latency p95, 5xx, kubelet/PLEG
- Prometheus Self-Monitoring: ingestion, series, scrape duration, memory
Exporters (auto-scraped via existing kubernetes-service-endpoints job):
- kube-state-metrics: pod/deployment/PVC/replica state (kube_pod_status_phase,
  kube_pod_container_status_restarts_total, kube_persistentvolumeclaim_*)
- node-exporter (DaemonSet, hostNetwork): node_cpu_seconds_total,
  node_memory_*, node_filesystem_*, node_load*, node_network_*
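
For illustration, a minimal sketch of how an exporter Service is typically picked up by a kubernetes-service-endpoints job. It assumes the job uses the common relabeling convention keyed on the prometheus.io/scrape and prometheus.io/port annotations, which this commit does not show. The Service name, namespace, selector label, and port below are hypothetical and may not match the manifests in this repository.

```yaml
# Hypothetical Service for kube-state-metrics (name, namespace, labels assumed).
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: monitoring
  annotations:
    # Assumed convention: the scrape job keeps only endpoints whose Service
    # carries this annotation, and reads the target port from the second one.
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
spec:
  selector:
    app.kubernetes.io/name: kube-state-metrics
  ports:
    - name: http-metrics
      port: 8080        # kube-state-metrics serves metrics on 8080 by default
      targetPort: 8080
```

If the job in this cluster is configured differently (for example, with ServiceMonitor resources or a different annotation prefix), the annotations above would not apply.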
2026-06-26 19:48:17 +02:00