
Full Observability Stack on k3s: Prometheus, Loki, Jaeger, Grafana, and Cloudflare Logpush

A complete guide to building a full observability stack on a 4-node ARM64 k3s homelab cluster. No Helm — everything is raw Kustomize manifests. The stack covers metrics (Prometheus + Alertmanager), logging (Loki + Alloy), tracing (Jaeger with spanmetrics), and visualization (Grafana with 16 dashboards). On top of the standard LGTM stack, Cloudflare Logpush feeds HTTP request logs, firewall events, and Workers traces through a custom Traefik decompression plugin into Loki for security analytics and performance monitoring. Traefik access logs are enriched with structured metadata (bot scores, client IPs, TLS versions) via Alloy for a dedicated access log dashboard.

The guide is structured as a linear build-up: Prometheus Operator and core metrics first, then Loki and log collection, then Jaeger tracing, then Grafana with dashboards and SSO, then the Cloudflare Logpush pipeline with its custom Traefik plugin. Each section includes the actual manifests used in production.


The cluster runs on 4x ARM64 Rock boards (rock1-rock4) on a private LAN behind a VyOS router with a PPPoE WAN link. All HTTP traffic enters via Cloudflare Tunnel through Traefik. The monitoring stack runs entirely in the monitoring namespace.

| Component | Version | Image |
|---|---|---|
| Prometheus Operator | v0.89.0 | quay.io/prometheus-operator/prometheus-operator:v0.89.0 |
| Prometheus | v3.9.1 | quay.io/prometheus/prometheus:v3.9.1 |
| Alertmanager | v0.31.1 | quay.io/prometheus/alertmanager:v0.31.1 |
| kube-state-metrics | v2.18.0 | registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.18.0 |
| Node Exporter | v1.10.2 | quay.io/prometheus/node-exporter:v1.10.2 |
| Blackbox Exporter | v0.28.0 | quay.io/prometheus/blackbox-exporter:v0.28.0 |
| Grafana | 12.3.3 | docker.io/grafana/grafana:12.3.3 |
| k8s-sidecar | 2.5.0 | quay.io/kiwigrid/k8s-sidecar:2.5.0 |
| Loki | 3.6.5 | docker.io/grafana/loki:3.6.5 |
| Grafana Alloy | v1.13.1 | docker.io/grafana/alloy:v1.13.1 |
| Jaeger | 2.15.1 | docker.io/jaegertracing/jaeger:2.15.1 |

All monitoring UIs are exposed via Cloudflare Tunnel through Traefik IngressRoutes:

| Service | URL | IngressRoute |
|---|---|---|
| Grafana | https://grafana-k3s.example.io | ingressroutes/grafana-ingress.yaml |
| Prometheus | https://prom-k3s.example.io | ingressroutes/prometheus-ingress.yaml |
| Alertmanager | https://alertmanager-k3s.example.io | ingressroutes/alertmanager-ingress.yaml |
| Jaeger | https://jaeger-k3s.example.io | ingressroutes/jaeger-ingress.yaml |

DNS CNAME records and Cloudflare tunnel ingress rules are managed by OpenTofu in cloudflare-tunnel-tf/.


The entire stack is deployed as raw Kustomize manifests. No Helm. This gives full visibility into every resource, avoids Helm’s template abstraction layer, and makes it straightforward to patch individual fields. The trade-off is manual version bumps, which is acceptable for a homelab.

The Prometheus Operator provides 10 CRDs totaling ~3.7 MB:

monitoring/operator/kustomization.yaml
```yaml
resources:
  # CRDs (must be applied before operator)
  - crd-alertmanagerconfigs.yaml
  - crd-alertmanagers.yaml
  - crd-podmonitors.yaml
  - crd-probes.yaml
  - crd-prometheusagents.yaml
  - crd-prometheuses.yaml
  - crd-prometheusrules.yaml
  - crd-scrapeconfigs.yaml
  - crd-servicemonitors.yaml
  - crd-thanosrulers.yaml
  # Operator RBAC and workload
  - serviceaccount.yaml
  - clusterrole.yaml
  - clusterrolebinding.yaml
  - deployment.yaml
  - service.yaml
  - servicemonitor.yaml
  - webhook.yaml
```

The webhook cert-gen Jobs must complete before the operator Deployment starts. Kustomize handles ordering if everything is in the same kustomization.

The operator manages Prometheus via a Prometheus custom resource. It creates a StatefulSet (prometheus-prometheus), a config-reloader sidecar, and handles all ServiceMonitor/PrometheusRule reconciliation:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  version: v3.9.1
  image: quay.io/prometheus/prometheus:v3.9.1
  replicas: 1
  serviceAccountName: prometheus
  retention: 7d
  retentionSize: 8GB
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: nfs-client
        accessModes: [ReadWriteMany]
        resources:
          requests:
            storage: 10Gi
  # Config reloader sidecar resources -- uses strategic merge patch
  containers:
    - name: config-reloader
      resources:
        requests:
          cpu: 10m
          memory: 25Mi
        limits:
          cpu: 50m
          memory: 50Mi
  walCompression: true
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      cpu: "2"
      memory: 2Gi
  # Selectors -- all match `release: prometheus` label
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
  serviceMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: prometheus
  podMonitorNamespaceSelector: {}
  probeSelector:
    matchLabels:
      release: prometheus
  probeNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      release: prometheus
  ruleNamespaceSelector: {}
  scrapeConfigSelector:
    matchLabels:
      release: prometheus
  scrapeConfigNamespaceSelector: {}
  alerting:
    alertmanagers:
      - namespace: monitoring
        name: alertmanager
        port: http-web
        apiVersion: v2
  securityContext:
    fsGroup: 65534
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534
    seccompProfile:
      type: RuntimeDefault
  externalUrl: https://prom-k3s.example.io
```

18 ServiceMonitors scrape targets across the cluster. The release: prometheus label is the common selector:

| ServiceMonitor | Namespace | Target |
|---|---|---|
| prometheus-operator | monitoring | Operator metrics |
| prometheus | monitoring | Prometheus self-metrics |
| alertmanager | monitoring | Alertmanager metrics |
| grafana | monitoring | Grafana metrics |
| kube-state-metrics | monitoring | kube-state-metrics |
| node-exporter | monitoring | Node Exporter (all nodes) |
| blackbox-exporter | monitoring | Blackbox Exporter |
| loki | monitoring | Loki metrics |
| alloy | monitoring | Grafana Alloy (DaemonSet) |
| alloy-logpush | monitoring | Alloy Logpush receiver |
| jaeger | monitoring | Jaeger metrics |
| traefik | traefik | Traefik ingress controller |
| cloudflared | cloudflared | Cloudflare tunnel daemon |
| authentik-metrics | authentik | Authentik server |
| revista | revista | Revista app |
| kubelet | kube-system | Kubelet + cAdvisor |
| coredns | kube-system | CoreDNS |
| apiserver | default | Kubernetes API server |

Cross-namespace ServiceMonitors (traefik, cloudflared, authentik, revista, kubelet, coredns, apiserver) live in monitoring/servicemonitors/ and use namespaceSelector.matchNames to reach across namespaces.

The kubelet ServiceMonitor scrapes three endpoints from the same port: /metrics (kubelet), /metrics/cadvisor (container metrics), and /metrics/probes (probe metrics). All use bearer token auth against the k8s API server CA.
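A sketch of what that three-endpoint ServiceMonitor roughly looks like — the selector, port name, and label values here are illustrative assumptions; only the three paths and the bearer-token/CA auth are taken from the description above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet          # illustrative name
  namespace: monitoring
  labels:
    release: prometheus  # common selector label used by this stack
spec:
  namespaceSelector:
    matchNames: [kube-system]
  selector:
    matchLabels:
      k8s-app: kubelet   # assumed Service label
  endpoints:
    - port: https-metrics      # assumed port name
      scheme: https
      path: /metrics           # kubelet metrics
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    - port: https-metrics
      scheme: https
      path: /metrics/cadvisor  # container metrics
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    - port: https-metrics
      scheme: https
      path: /metrics/probes    # probe metrics
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
```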

Six PrometheusRule CRs provide alerting and recording rules:

| Rule file | Coverage |
|---|---|
| general-rules.yaml | Watchdog, InfoInhibitor, TargetDown |
| kubernetes-apps.yaml | Pod CrashLoopBackOff, container restarts, Deployment/StatefulSet failures |
| kubernetes-resources.yaml | CPU/memory quota overcommit, namespace resource limits |
| node-rules.yaml | Node filesystem, memory, CPU, network, clock skew |
| k8s-recording-rules.yaml | Pre-computed recording rules for dashboards |
| traefik-rules.yaml | Traefik-specific alerting rules |

Prometheus and Grafana use KEDA ScaledObjects for autoscaling:

```yaml
# Prometheus: targets the operator-created StatefulSet
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-keda
  namespace: monitoring
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: prometheus-prometheus
  minReplicaCount: 1
  maxReplicaCount: 8
  triggers:
    - type: cpu
      metadata:
        type: Utilization
        value: "50"
    - type: memory
      metadata:
        type: Utilization
        value: "50"
```

Managed by the Prometheus Operator via the Alertmanager CR:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  version: v0.31.1
  image: quay.io/prometheus/alertmanager:v0.31.1
  replicas: 1
  serviceAccountName: prometheus
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: nfs-client
        accessModes: [ReadWriteMany]
        resources:
          requests:
            storage: 1Gi
  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 256Mi
  securityContext:
    fsGroup: 65534
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534
    seccompProfile:
      type: RuntimeDefault
  externalUrl: https://alertmanager-k3s.example.io
```

The Alertmanager config (routing rules, SMTP credentials) lives in alertmanager/secret.yaml and must be SOPS-encrypted before committing:

```sh
sops --encrypt --age <YOUR_AGE_PUBLIC_KEY> \
  --encrypted-regex '^(data|stringData)$' \
  --in-place monitoring/alertmanager/secret.yaml
```

Loki runs in monolithic mode (-target=all) as a single-replica StatefulSet with filesystem storage on NFS.

monitoring/loki/configmap.yaml
```yaml
data:
  loki.yaml: |
    target: all
    auth_enabled: false
    server:
      http_listen_port: 3100
      grpc_listen_port: 9095
      log_level: info
    common:
      path_prefix: /loki
      ring:
        instance_addr: 0.0.0.0
        kvstore:
          store: inmemory
      replication_factor: 1
    schema_config:
      configs:
        - from: "2024-01-01"
          store: tsdb
          object_store: filesystem
          schema: v13
          index:
            prefix: index_
            period: 24h
    storage_config:
      filesystem:
        directory: /loki/chunks
      tsdb_shipper:
        active_index_directory: /local/tsdb-index  # emptyDir, NOT NFS
        cache_location: /local/tsdb-cache          # emptyDir, NOT NFS
    compactor:
      working_directory: /loki/compactor
      compaction_interval: 5m
      retention_enabled: true
      delete_request_store: filesystem
      retention_delete_delay: 2h
      retention_delete_worker_count: 150
    frontend:
      encoding: protobuf  # required for approx_topk
      compress_responses: true
      log_queries_longer_than: 5s  # log slow queries for investigation
    query_range:
      align_queries_with_step: true
      parallelise_shardable_queries: true
      cache_results: true
      results_cache:
        cache:
          embedded_cache:
            enabled: true
            max_size_mb: 100  # ~100MB RAM for query result cache
            ttl: 24h
      shard_aggregations: approx_topk  # string, NOT a YAML list
    chunk_store_config:
      chunk_cache_config:
        embedded_cache:
          enabled: true
          max_size_mb: 256  # ~256MB RAM for chunk cache
          ttl: 24h
    querier:
      max_concurrent: 16  # 16 parallel workers per instance
    query_scheduler:
      max_outstanding_requests_per_tenant: 32768  # TSDB dispatches many small requests
    limits_config:
      retention_period: 2160h  # 90 days
      reject_old_samples: true
      reject_old_samples_max_age: 2160h  # 90 days
      ingestion_rate_mb: 10
      ingestion_burst_size_mb: 20
      split_queries_by_interval: 15m  # split 24h query into 96 sub-queries
      max_query_parallelism: 32  # up from 2, allows 32 sub-queries in flight
      tsdb_max_query_parallelism: 64  # TSDB-specific, allows more shards
      query_timeout: 5m  # up from 1m default
      max_cache_freshness_per_query: 10m
      max_query_series: 5000
      allow_structured_metadata: true
      volume_enabled: true
```

Key settings:

| Setting | Value | Why |
|---|---|---|
| schema: v13 | TSDB | Latest Loki schema, required for structured metadata |
| retention_period: 2160h | 90 days | Long retention for trend analysis and incident postmortems |
| reject_old_samples_max_age: 2160h | 90 days | Matches retention period; rejects samples older than this |
| max_query_series: 5000 | High | Required for topk queries on high-cardinality Logpush data (see Part 8) |
| ingestion_rate_mb: 10 | 10 MB/s | Logpush batches can be large; default was too low |
| allow_structured_metadata: true | Required | Enables structured metadata for Alloy's Traefik access log enrichment |
| delete_request_store: filesystem | Required | Must be set when retention_enabled: true, otherwise Loki fails to start |
| frontend.encoding: protobuf | Required | Needed for approx_topk to function correctly |
| query_range.shard_aggregations: approx_topk | String | Enables approx_topk aggregation sharding; must be a plain string, not a YAML list |

Performance tuning (see Performance Tuning for the full rationale):

| Setting | Value | Why |
|---|---|---|
| split_queries_by_interval: 15m | 96 sub-queries per 24h range | Default 1h creates only 24 sub-queries; 15m creates 96 (4x) |
| max_query_parallelism: 32 | Up from 2 | Allows 32 sub-queries in the work queue simultaneously |
| tsdb_max_query_parallelism: 64 | TSDB-specific | TSDB dynamic sharding generates many individually smaller requests |
| querier.max_concurrent: 16 | 16 workers | Grafana recommends ~16 for TSDB; default is 4 |
| query_timeout: 5m | Up from 1m default | Large range scans on access logs need more time |
| results_cache | Embedded, 100MB, 24h | Repeat queries return instantly from in-memory cache |
| chunk_cache_config | Embedded, 256MB, 24h | Avoids re-fetching chunks from NFS for recent data |
| tsdb_shipper.active_index_directory | /local/tsdb-index (emptyDir) | TSDB index reads on NFS add 1-10ms per operation; local disk is 0.01-0.1ms |
| query_scheduler.max_outstanding_requests_per_tenant: 32768 | High queue | TSDB dispatches many more, individually smaller requests than BoltDB |
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loki
  namespace: monitoring
spec:
  replicas: 1
  serviceName: loki-headless
  template:
    spec:
      securityContext:
        runAsUser: 10001
        runAsGroup: 10001
        fsGroup: 10001
        runAsNonRoot: true
      containers:
        - name: loki
          image: docker.io/grafana/loki:3.6.5
          args:
            - -config.file=/etc/loki/loki.yaml
          env:
            - name: GOMEMLIMIT
              value: "1600MiB"  # 80% of 2Gi limit, prevents OOM
            - name: GOGC
              value: "75"  # more aggressive GC for ARM64
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 2000m
              memory: 2Gi
          volumeMounts:
            - name: config
              mountPath: /etc/loki
            - name: data
              mountPath: /loki
            - name: tsdb-local
              mountPath: /local
      volumes:
        - name: tsdb-local
          emptyDir:
            sizeLimit: 2Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        storageClassName: nfs-client
        accessModes: [ReadWriteMany]
        resources:
          requests:
            storage: 20Gi
```

GOMEMLIMIT (Go 1.19+) tells the runtime to start aggressive garbage collection when approaching the limit, preventing OOM kills. Set it to ~80% of the container memory limit. GOGC=75 (default 100) triggers GC slightly more frequently, which helps on memory-constrained ARM64 nodes.
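The 80% rule is easy to compute mechanically; a tiny helper (the function name is illustrative, and note the manifest above rounds the 2Gi result down to 1600MiB):

```python
def gomemlimit_mib(container_limit_mib: int, headroom: float = 0.20) -> str:
    """Return a GOMEMLIMIT value that leaves `headroom` of the container
    limit free for non-heap memory (goroutine stacks, cgroup overhead)."""
    soft_limit = int(container_limit_mib * (1 - headroom))
    return f"{soft_limit}MiB"

# A 2Gi (2048MiB) container limit yields a 1638MiB soft limit.
print(gomemlimit_mib(2048))
```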

The tsdb-local emptyDir volume stores TSDB index and cache on the node’s local disk instead of NFS. Index lookups are small random reads that compound dramatically over NFS latency. On pod restart, the index is automatically rebuilt from chunks (takes a few minutes), so ephemeral storage is safe here.

Loki’s Write-Ahead Log (WAL) on NFS can become corrupted after power outages or unclean shutdowns. Symptoms: Loki enters a crash loop with "segments are not sequential" or stale .tmp checkpoint errors. The init container detects and clears corrupt WAL state before Loki starts:

```yaml
initContainers:
  - name: wal-cleanup
    image: busybox:1.37.0
    command:
      - sh
      - -c
      - |
        WAL_DIR=/loki/wal
        COMPACTOR_DIR=/loki/compactor
        if [ ! -d "$WAL_DIR" ] || [ -z "$(ls -A $WAL_DIR 2>/dev/null)" ]; then
          echo "wal-cleanup: no WAL directory or empty — clean start"
          exit 0
        fi
        # Check for non-sequential WAL segments (gaps cause "segments are not sequential")
        CORRUPT=false
        SEGMENTS=$(find "$WAL_DIR" -maxdepth 1 -type f -name '[0-9]*' | sort)
        if [ -n "$SEGMENTS" ]; then
          PREV=-1
          for SEG in $SEGMENTS; do
            NUM=$(basename "$SEG" | sed 's/^0*//' | sed 's/^$/0/')
            if [ "$PREV" -ge 0 ] && [ "$NUM" -ne $((PREV + 1)) ]; then
              echo "wal-cleanup: gap detected between segment $PREV and $NUM"
              CORRUPT=true
              break
            fi
            PREV=$NUM
          done
        fi
        # Check for stale .tmp checkpoint directories
        if ls -d "$WAL_DIR"/checkpoint.*.tmp 2>/dev/null | grep -q .; then
          echo "wal-cleanup: stale .tmp checkpoint directories found"
          CORRUPT=true
        fi
        if [ "$CORRUPT" = "true" ]; then
          echo "wal-cleanup: corrupt WAL detected — cleaning up"
          rm -rf "$WAL_DIR"/* "$COMPACTOR_DIR"/*
          echo "wal-cleanup: cleanup complete — Loki will start with empty WAL"
        else
          echo "wal-cleanup: WAL segments are sequential — no cleanup needed"
        fi
    volumeMounts:
      - name: data
        mountPath: /loki
    securityContext:
      runAsUser: 10001
      runAsGroup: 10001
```

Two corruption patterns are detected:

  1. Non-sequential WAL segments: After a power outage, NFS writes may be partially flushed, leaving gaps in the numbered segment files (e.g., segments 0, 1, 3 — gap at 2). Loki’s WAL reader requires strictly sequential segments.
  2. Stale .tmp checkpoint directories: A checkpoint operation interrupted mid-write leaves a checkpoint.*.tmp directory. Loki treats this as a fatal error on startup.

When either is detected, the init container wipes both /loki/wal/ and /loki/compactor/. In-flight log lines in the WAL are lost (typically seconds of data), but Loki starts cleanly. Already-flushed chunks on disk are unaffected.
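The detection logic boils down to two checks: any stale `checkpoint.*.tmp` entry, or a non-unit step in the sorted numeric segment names. A Python rendering of the same logic (illustrative only — the cluster runs the shell version above):

```python
import fnmatch

def wal_needs_cleanup(entries: list[str]) -> bool:
    """Mirror the init container's two checks on a WAL directory listing."""
    # Stale temporary checkpoint directories are fatal to Loki on startup.
    if any(fnmatch.fnmatch(e, "checkpoint.*.tmp") for e in entries):
        return True
    # A gap in the numbered segments (e.g. 0, 1, 3) means partially
    # flushed NFS writes; Loki requires strictly sequential segments.
    segments = sorted(int(e) for e in entries if e.isdigit())
    return any(b - a != 1 for a, b in zip(segments, segments[1:]))

print(wal_needs_cleanup(["00000000", "00000001", "00000003"]))  # gap at 2
print(wal_needs_cleanup(["00000000", "00000001", "00000002"]))  # sequential
```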

The default Loki monolithic configuration is heavily throttled. Out of the box, max_query_parallelism: 2 means a 24h range query is split into 24 one-hour chunks, but only 2 can execute at a time. Combined with NFS-backed TSDB index reads and no caching, dashboard panels with complex LogQL queries (particularly those using | json full-line parsing) were taking 17-24 seconds each.

The fix has four layers:

1. Query parallelism and splitting. split_queries_by_interval: 15m breaks a 24h query into 96 sub-queries instead of 24. max_query_parallelism: 32 allows 32 of those to be in the work queue simultaneously. querier.max_concurrent: 16 runs 16 parallel workers per Loki instance. TSDB’s dynamic sharding further subdivides each time split based on chunk size statistics, targeting 300-600MB per shard. The net effect is that a 24h query that previously ran as 24 serial 1h chunks now fans out across 96 parallel 15m chunks.

2. Embedded caching. Loki supports in-memory caching with zero external dependencies (no memcached/Redis needed). The results cache (100MB) stores completed query responses — repeat queries and dashboard refreshes return instantly. The chunks cache (256MB) stores decompressed chunk data, speeding up first-time queries for recent data by ~30-50%. Total cost: ~356MB of RAM.

3. Local TSDB index. TSDB index directories (active_index_directory, cache_location) are moved from the NFS PVC to an emptyDir volume backed by the node’s local disk. Every index lookup (series resolution, shard planning, chunk reference) was going over NFS at 1-10ms per operation. On local disk, the same operations take 0.01-0.1ms. The index is lightweight and automatically rebuilt from chunks on pod restart, so ephemeral storage is safe.

4. Structured metadata instead of | json parsing. The security dashboard originally used | json | FieldName = "value" on every panel — decompressing and JSON-parsing every log line. After adding downstream_status and user_agent to Alloy’s structured metadata extraction, the dashboard queries were rewritten to filter on SM fields directly (e.g., | downstream_status = "403" instead of | json | DownstreamStatus = "403"). SM filtering happens before line decompression, skipping the expensive JSON parse entirely.
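The rewrite looks roughly like this in LogQL (field names are taken from the Alloy config; the exact dashboard panel queries may differ). Before, every line is decompressed and JSON-parsed:

```logql
{job="traefik-access-log"} | json | DownstreamStatus = "403"
```

After, the filter runs on structured metadata, before line decompression:

```logql
{job="traefik-access-log"} | downstream_status = "403"
```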

Expected impact (combined):

| Scenario | Before | After |
|---|---|---|
| 24h range query, first run | 17-24s | 2-5s |
| Same query, second run (cached) | 17-24s | under 1s |
| Dashboard with 17 simultaneous panels | Timeouts | 3-8s total |
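The fan-out arithmetic behind the parallelism layer, spelled out (illustrative, not part of the stack):

```python
def subqueries(range_hours: int, split_minutes: int) -> int:
    """Number of time splits the query frontend produces for a range query."""
    return range_hours * 60 // split_minutes

splits = subqueries(24, 15)   # 96 with split_queries_by_interval: 15m
in_flight = min(splits, 32)   # capped by max_query_parallelism: 32
workers = 16                  # querier.max_concurrent per Loki instance
print(splits, in_flight, workers)
```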

Grafana Alloy serves two roles in this stack:

  1. DaemonSet (alloy/) — runs on every node, collects pod logs and forwards OTLP traces
  2. Deployment (alloy-logpush/) — single instance, receives Cloudflare Logpush data (covered in Part 7)

The DaemonSet Alloy discovers pods on its node, tails their log files, and forwards to Loki. It also receives OTLP traces and batches them to Jaeger. Traefik access logs get special treatment: they are parsed as JSON and enriched with structured metadata for the access log dashboard.

```alloy
logging {
  level  = "info"
  format = "logfmt"
}

// Pod discovery and log collection
discovery.kubernetes "pods" {
  role = "pod"
  selectors {
    role  = "pod"
    field = "spec.nodeName=" + coalesce(env("HOSTNAME"), "")
  }
}

discovery.relabel "pod_logs" {
  targets = discovery.kubernetes.pods.targets
  rule {
    source_labels = ["__meta_kubernetes_pod_phase"]
    regex         = "Pending|Succeeded|Failed|Unknown"
    action        = "drop"
  }
  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label  = "pod"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_container_name"]
    target_label  = "container"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_uid",
                     "__meta_kubernetes_pod_container_name"]
    separator    = "/"
    target_label = "__path__"
    replacement  = "/var/log/pods/*$1/*.log"
  }
}

local.file_match "pod_logs" {
  path_targets = discovery.relabel.pod_logs.output
}

loki.source.file "pod_logs" {
  targets    = local.file_match.pod_logs.targets
  forward_to = [loki.process.pod_logs.receiver]
}

loki.process "pod_logs" {
  stage.cri {}

  // Traefik access logs: parse JSON, extract structured labels + metadata
  stage.match {
    selector = "{namespace=\"traefik\", container=\"traefik\"}"

    stage.json {
      expressions = {
        status            = "DownstreamStatus",
        downstream_status = "DownstreamStatus",
        method            = "RequestMethod",
        router            = "RouterName",
        service           = "ServiceName",
        entrypoint        = "entryPointName",
        client_ip         = "ClientHost",
        real_client_ip    = "request_X-Real-Client-Ip",
        bot_score         = "request_X-Bot-Score",
        blocked_by        = "request_X-Blocked-By",
        country           = "request_X-Geo-Country",
        cf_connecting_ip  = "request_Cf-Connecting-Ip",
        request_host      = "RequestHost",
        request_path      = "RequestPath",
        request_protocol  = "RequestProtocol",
        duration          = "Duration",
        origin_duration   = "OriginDuration",
        overhead          = "Overhead",
        downstream_size   = "DownstreamContentSize",
        tls_version       = "TLSVersion",
        user_agent        = "request_User-Agent",
      }
    }

    // Low-cardinality fields → labels (fast filtering)
    stage.labels {
      values = {
        entrypoint = "",
        method     = "",
      }
    }

    // High-cardinality fields → structured metadata (19 fields)
    // (queryable but don't create new label streams — requires Loki 3.x + TSDB v13)
    stage.structured_metadata {
      values = {
        status            = "",
        downstream_status = "",
        router            = "",
        service           = "",
        client_ip         = "",
        real_client_ip    = "",
        bot_score         = "",
        blocked_by        = "",
        country           = "",
        cf_connecting_ip  = "",
        request_host      = "",
        request_path      = "",
        request_protocol  = "",
        duration          = "",
        origin_duration   = "",
        overhead          = "",
        downstream_size   = "",
        tls_version       = "",
        user_agent        = "",
      }
    }

    // Prometheus counters generated from access logs (scraped by Prometheus as loki_process_custom_*)
    // 7 counters: 1 total + 5 per-block-type + 1 for all 403s
    stage.metrics {
      // Total access log requests (all lines in this match block)
      metric.counter {
        name        = "traefik_access_requests_total"
        description = "Total Traefik access log requests"
        match_all   = true
        action      = "inc"
      }
      // Blocked by sentinel bot scoring (X-Blocked-By: sentinel)
      metric.counter {
        name        = "traefik_access_sentinel_blocks_total"
        description = "Requests blocked by Sentinel bot scoring"
        source      = "blocked_by"
        value       = "sentinel"
        action      = "inc"
      }
      // Blocked by sentinel blocklist (X-Blocked-By: sentinel-blocklist)
      metric.counter {
        name        = "traefik_access_blocklist_blocks_total"
        description = "Requests blocked by Sentinel IP blocklist"
        source      = "blocked_by"
        value       = "sentinel-blocklist"
        action      = "inc"
      }
      // Blocked by rate limiting (X-Blocked-By: rate-limit)
      metric.counter {
        name        = "traefik_access_ratelimit_blocks_total"
        description = "Requests blocked by Sentinel rate limiting"
        source      = "blocked_by"
        value       = "rate-limit"
        action      = "inc"
      }
      // Blocked by sentinel firewall rules (X-Blocked-By: sentinel-rule)
      metric.counter {
        name        = "traefik_access_sentinel_rule_blocks_total"
        description = "Requests blocked by Sentinel firewall rules"
        source      = "blocked_by"
        value       = "sentinel-rule"
        action      = "inc"
      }
      // Tarpitted by sentinel (X-Blocked-By: sentinel-tarpit)
      metric.counter {
        name        = "traefik_access_tarpit_blocks_total"
        description = "Requests tarpitted by Sentinel"
        source      = "blocked_by"
        value       = "sentinel-tarpit"
        action      = "inc"
      }
      // 403 responses (any source)
      metric.counter {
        name        = "traefik_access_403_total"
        description = "Total 403 responses"
        source      = "downstream_status"
        value       = "403"
        action      = "inc"
      }
    }

    stage.static_labels {
      values = {
        job = "traefik-access-log",
      }
    }
  }

  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
  }
}

// OTLP trace receiver -> Jaeger
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output { traces = [otelcol.processor.batch.default.input] }
}

otelcol.processor.batch "default" {
  output { traces = [otelcol.exporter.otlp.jaeger.input] }
}

otelcol.exporter.otlp "jaeger" {
  client {
    endpoint = "jaeger-collector.monitoring.svc.cluster.local:4317"
    tls { insecure = true }
  }
}
```

The pipeline:

  1. discovery.kubernetes discovers pods on the current node (filtered by HOSTNAME env var)
  2. discovery.relabel extracts namespace/pod/container labels and constructs the log file path
  3. loki.source.file tails the CRI log files under /var/log/pods/
  4. loki.process applies the stage.cri {} pipeline to parse CRI-format log lines
  5. stage.match selectively processes Traefik container logs (see below)
  6. loki.write pushes to Loki
  7. otelcol.receiver.otlp receives traces from applications on gRPC 4317 / HTTP 4318
  8. otelcol.processor.batch batches traces for efficiency
  9. otelcol.exporter.otlp forwards to Jaeger’s collector

The stage.match block targets only logs from the traefik namespace/container. Traefik writes two types of log lines: JSON access logs and logfmt debug/error logs. The stage.json parser silently skips non-JSON lines (no-op, no drop), so debug logs pass through unmodified.

Fields are split into two tiers based on cardinality:

| Tier | Fields | Mechanism | Purpose |
|---|---|---|---|
| Labels (low-cardinality) | entrypoint, method | stage.labels | Fast stream selection in LogQL |
| Structured metadata (high-cardinality, 19 fields) | status, downstream_status, router, service, client_ip, real_client_ip, bot_score, blocked_by, country, cf_connecting_ip, request_host, request_path, request_protocol, duration, origin_duration, overhead, downstream_size, tls_version, user_agent | stage.structured_metadata | Queryable without creating new label streams |

Structured metadata is a Loki 3.x feature (requires TSDB v13 schema and allow_structured_metadata: true). Unlike labels, structured metadata does not affect stream identity — adding a new metadata field does not create new streams or increase index size. This is critical for high-cardinality fields like IP addresses and request paths.

Prometheus counters from access logs:

The stage.metrics block generates 7 Prometheus counters from the extracted JSON fields. These appear at Alloy’s /metrics endpoint (scraped by Prometheus) with the loki_process_custom_ prefix:

| Counter (in Prometheus) | Source field | Match condition |
|---|---|---|
| loki_process_custom_traefik_access_requests_total | (all lines) | match_all = true |
| loki_process_custom_traefik_access_sentinel_blocks_total | blocked_by | = "sentinel" (bot scoring) |
| loki_process_custom_traefik_access_blocklist_blocks_total | blocked_by | = "sentinel-blocklist" (IPsum) |
| loki_process_custom_traefik_access_ratelimit_blocks_total | blocked_by | = "rate-limit" |
| loki_process_custom_traefik_access_sentinel_rule_blocks_total | blocked_by | = "sentinel-rule" (firewall rules) |
| loki_process_custom_traefik_access_tarpit_blocks_total | blocked_by | = "sentinel-tarpit" |
| loki_process_custom_traefik_access_403_total | downstream_status | = "403" (all sources) |

The source field in stage.metrics reads from the extracted data map populated by stage.json, NOT from structured metadata. The JSON key for the blocked-by header is blocked_by (mapped from the Traefik access log’s request_X-Blocked-By field via stage.json). These counters power the Security Dashboard’s instant-loading aggregate statistics and the Grafana Traefik Access Logs dashboard’s Sentinel Security section.

The stage.static_labels block adds job = "traefik-access-log", letting you query access logs specifically: {job="traefik-access-log"}.
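Dashboard panels can then use ordinary PromQL rate expressions over these counters — the queries below are illustrative sketches, not the actual panel definitions:

```promql
sum(rate(loki_process_custom_traefik_access_403_total[5m]))

sum(rate(loki_process_custom_traefik_access_sentinel_blocks_total[5m]))
```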


Jaeger v2 uses the OpenTelemetry Collector config format. It runs as an all-in-one Deployment with Badger embedded storage on an NFS PVC (10Gi). The config includes a spanmetrics connector that generates R.E.D. (Rate, Error, Duration) metrics from traces and exports them to Prometheus, plus a metric_backends config that lets Jaeger UI query those metrics for the Monitor tab.

monitoring/jaeger/configmap.yaml
```yaml
data:
  ui-config.json: |
    {
      "monitor": { "menuEnabled": true },
      "dependencies": { "menuEnabled": true }
    }
  config.yaml: |
    service:
      extensions:
        - jaeger_storage
        - jaeger_query
        - healthcheckv2
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [jaeger_storage_exporter, spanmetrics]
        metrics/spanmetrics:
          receivers: [spanmetrics]
          exporters: [prometheus]
      telemetry:
        resource:
          service.name: jaeger
        metrics:
          level: detailed
          readers:
            - pull:
                exporter:
                  prometheus:
                    host: 0.0.0.0
                    port: 8888
        logs:
          level: info
    extensions:
      healthcheckv2:
        use_v2: true
        http:
          endpoint: 0.0.0.0:13133
      jaeger_query:
        storage:
          traces: badger_main
          metrics: prometheus_store
        ui:
          config_file: /etc/jaeger/ui-config.json
      jaeger_storage:
        backends:
          badger_main:
            badger:
              directories:
                keys: /badger/data/keys
                values: /badger/data/values
              ephemeral: false
              ttl:
                spans: 168h
        metric_backends:
          prometheus_store:
            prometheus:
              endpoint: http://prometheus.monitoring.svc:9090
              normalize_calls: true
              normalize_duration: true
    receivers:
      otlp:
        protocols:
          grpc: { endpoint: 0.0.0.0:4317 }
          http: { endpoint: 0.0.0.0:4318 }
    processors:
      batch:
        send_batch_size: 10000
        timeout: 5s
    connectors:
      spanmetrics:
        dimensions:
          - name: http.method
          - name: http.status_code
          - name: http.route
        aggregation_cardinality_limit: 1500
        aggregation_temporality: AGGREGATION_TEMPORALITY_CUMULATIVE
        metrics_flush_interval: 15s
        metrics_expiration: 5m
    exporters:
      jaeger_storage_exporter:
        trace_storage: badger_main
      prometheus:
        endpoint: 0.0.0.0:8889
        resource_to_telemetry_conversion:
          enabled: true
```

The pipeline architecture has three key features:

  1. Dual-export traces pipeline: The traces pipeline fans out to both jaeger_storage_exporter (Badger storage) and the spanmetrics connector. The connector generates R.E.D. metrics from every span.
  2. Spanmetrics → Prometheus pipeline: The metrics/spanmetrics pipeline receives metrics from the connector and exports them via Prometheus exporter on port 8889. These metrics (call counts, duration histograms, error rates by service/operation) are scraped by Prometheus and queryable in Grafana.
  3. Metric backends: The metric_backends.prometheus_store config tells Jaeger’s query extension to read R.E.D. metrics from Prometheus. This powers the Monitor tab in Jaeger UI, showing service-level latency and error rate graphs. normalize_calls and normalize_duration ensure metric names match the OpenTelemetry semantic conventions.
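In Grafana, the same spanmetrics can be queried directly. Exact metric names depend on the connector version and its namespace setting; with the default `traces.span.metrics` namespace they would look roughly like this (treat these as an assumption to verify against your Prometheus targets):

```promql
sum by (service_name) (rate(traces_span_metrics_calls_total[5m]))

histogram_quantile(0.95,
  sum by (le, service_name) (rate(traces_span_metrics_duration_milliseconds_bucket[5m])))
```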

The Deployment uses strategy: Recreate since Badger uses file locking and cannot run multiple instances:

```yaml
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    spec:
      containers:
        - name: jaeger
          image: docker.io/jaegertracing/jaeger:2.15.1
          args: [--config, /etc/jaeger/config.yaml]
          ports:
            - name: otlp-grpc
              containerPort: 4317
            - name: otlp-http
              containerPort: 4318
            - name: query-http
              containerPort: 16686
            - name: metrics
              containerPort: 8888
            - name: health
              containerPort: 13133
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 2Gi
```

Loki’s datasource config includes derivedFields that extract trace IDs from log lines and link them to Jaeger:

```yaml
# In grafana/datasources.yaml
- name: Loki
  type: loki
  uid: loki
  url: http://loki.monitoring.svc:3100
  jsonData:
    derivedFields:
      - datasourceUid: jaeger
        matcherRegex: '"traceID":"(\w+)"'
        name: traceID
        url: "$${__value.raw}"
```

When a log line contains a traceID field, Grafana renders it as a clickable link that opens the trace in Jaeger.
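The matcherRegex can be sanity-checked against a sample line (the log line below is made up):

```python
import re

MATCHER = r'"traceID":"(\w+)"'  # same pattern as matcherRegex above
line = '{"level":"info","msg":"request done","traceID":"7c3e9f2ab4d8e1f0"}'

m = re.search(MATCHER, line)
# Capture group 1 is what Grafana substitutes as ${__value.raw} in the link.
print(m.group(1))
```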


Grafana uses Authentik as an OAuth2/OIDC provider:

grafana.ini
```ini
[auth]
oauth_allow_insecure_email_lookup = true

[auth.generic_oauth]
enabled = true
name = Authentik
allow_sign_up = true
auto_login = false
scopes = openid email profile
auth_url = https://authentik.example.io/application/o/authorize/
token_url = https://authentik.example.io/application/o/token/
api_url = https://authentik.example.io/application/o/userinfo/
signout_redirect_url = https://authentik.example.io/application/o/grafana/end-session/
role_attribute_path = contains(groups, 'Grafana Admins') && 'Admin' || contains(groups, 'Grafana Editors') && 'Editor' || 'Viewer'
groups_attribute_path = groups
login_attribute_path = preferred_username
name_attribute_path = name
email_attribute_path = email
use_pkce = true
use_refresh_token = true
```

Role mapping via Authentik groups:

| Authentik Group | Grafana Role |
|---|---|
| Grafana Admins | Admin |
| Grafana Editors | Editor |
| (everyone else) | Viewer |
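Grafana evaluates role_attribute_path as a JMESPath expression against the userinfo payload. The short-circuit `&&`/`||` chain is equivalent to this sketch (the function name is illustrative):

```python
def grafana_role(groups: list[str]) -> str:
    """contains(groups, 'Grafana Admins') && 'Admin'
       || contains(groups, 'Grafana Editors') && 'Editor' || 'Viewer'"""
    if "Grafana Admins" in groups:
        return "Admin"
    if "Grafana Editors" in groups:
        return "Editor"
    return "Viewer"

print(grafana_role(["Grafana Admins", "authentik Users"]))
print(grafana_role([]))
```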

Credentials (oauth-client-id, oauth-client-secret) are stored in grafana-secret and injected as env vars. The secret must be SOPS-encrypted.

Four datasources are provisioned via a directly-mounted ConfigMap (not the sidecar):

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    url: http://prometheus.monitoring.svc:9090
    isDefault: true
    jsonData:
      httpMethod: POST
      timeInterval: 30s
  - name: Alertmanager
    type: alertmanager
    uid: alertmanager
    url: http://alertmanager.monitoring.svc:9093
    jsonData:
      implementation: prometheus
  - name: Loki
    type: loki
    uid: loki
    url: http://loki.monitoring.svc:3100
    jsonData:
      derivedFields:
        - datasourceUid: jaeger
          matcherRegex: '"traceID":"(\w+)"'
          name: traceID
          url: "$${__value.raw}"
  - name: Jaeger
    type: jaeger
    uid: jaeger
    url: http://jaeger-query.monitoring.svc:16686

The Grafana Deployment has two containers: the k8s-sidecar for dashboard provisioning and Grafana itself:

containers:
  - name: grafana-sc-dashboard
    image: quay.io/kiwigrid/k8s-sidecar:2.5.0
    env:
      - name: LABEL
        value: grafana_dashboard
      - name: LABEL_VALUE
        value: "1"
      - name: METHOD
        value: WATCH
      - name: FOLDER
        value: /tmp/dashboards
      - name: NAMESPACE
        value: ALL
      - name: RESOURCE
        value: configmap
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
  - name: grafana
    image: docker.io/grafana/grafana:12.3.3
    env:
      - name: GF_SECURITY_ADMIN_USER
        valueFrom:
          secretKeyRef:
            name: grafana-secret
            key: admin-user
      - name: GF_SECURITY_ADMIN_PASSWORD
        valueFrom:
          secretKeyRef:
            name: grafana-secret
            key: admin-password
      - name: GF_AUTH_GENERIC_OAUTH_CLIENT_ID
        valueFrom:
          secretKeyRef:
            name: grafana-secret
            key: oauth-client-id
      - name: GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET
        valueFrom:
          secretKeyRef:
            name: grafana-secret
            key: oauth-client-secret
    resources:
      requests:
        cpu: 100m
        memory: 128Mi

The sidecar in WATCH mode detects ConfigMaps with grafana_dashboard: "1" across all namespaces and writes them to /tmp/dashboards. Grafana’s dashboard provider reads from that directory.

All 16 dashboards are standalone .json files managed by kustomize configMapGenerator:

monitoring/grafana/dashboards/kustomization.yaml
generatorOptions:
  disableNameSuffixHash: true
  labels:
    grafana_dashboard: "1"
configMapGenerator:
  - name: alertmanager-dashboard
    files:
      - alertmanager.json
  - name: cloudflare-logpush-dashboard
    files:
      - cloudflare-logpush.json
  # ... 14 more entries (16 total)

This replaced the previous approach of inlining dashboard JSON inside YAML ConfigMaps. The benefits:

  • JSON files get proper syntax highlighting in editors
  • No YAML escaping issues with special characters in JSON
  • Files can be imported/exported directly from Grafana’s UI
  • Easy to diff and review in git
| Dashboard | Source | Panels |
|---|---|---|
| Alertmanager | grafana.com | ~6 |
| Alloy | grafana.com | ~30 |
| Authentik | grafana.com | ~20 |
| Blackbox Exporter | grafana.com | ~12 |
| Cloudflare Logpush | custom gen script | 135 |
| Cloudflare Tunnel | custom gen script | 67 |
| CoreDNS | grafana.com | ~15 |
| Grafana Stats | grafana.com | ~8 |
| Jaeger | grafana.com | ~20 |
| K8s Cluster | grafana.com | ~15 |
| Loki | grafana.com | ~40 |
| Node Exporter | grafana.com | ~40 |
| Prometheus | grafana.com | ~35 |
| Security | custom | ~20 |
| Traefik | grafana.com | ~25 |
| Traefik Access Logs | custom | ~15 |

Adding upstream dashboards from grafana.com:

Terminal window
cd monitoring/grafana/dashboards/
./add-dashboard.sh <gnet-id> <name> [revision]
# Example:
./add-dashboard.sh 1860 node-exporter 37

The script downloads the JSON, replaces all datasource template variables with hardcoded UIDs (prometheus, loki), strips __inputs/__requires, fixes deprecated panel types (grafana-piechart-panel -> piechart), writes a standalone .json file, and adds a configMapGenerator entry.
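The JSON rewriting the script does can be sketched in Python (an illustrative model of the transformations, not the actual script; the field names mirror Grafana's export format):

```python
import json

def localize(dashboard: dict) -> dict:
    """Mimic add-dashboard.sh: strip import metadata, pin datasource UIDs,
    and migrate the deprecated community pie chart plugin."""
    # Drop the import-time metadata that grafana.com exports carry.
    dashboard.pop("__inputs", None)
    dashboard.pop("__requires", None)
    for panel in dashboard.get("panels", []):
        # Replace datasource template variables with the provisioned UIDs.
        ds = panel.get("datasource")
        if isinstance(ds, dict) and str(ds.get("uid", "")).startswith("${DS_"):
            ds["uid"] = "loki" if ds.get("type") == "loki" else "prometheus"
        # Map the deprecated plugin type to the core piechart panel.
        if panel.get("type") == "grafana-piechart-panel":
            panel["type"] = "piechart"
    return dashboard

raw = {
    "__inputs": [{"name": "DS_PROMETHEUS"}],
    "__requires": [],
    "panels": [
        {"type": "grafana-piechart-panel",
         "datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"}},
    ],
}
fixed = localize(raw)
print(json.dumps(fixed))
```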

Regenerating custom dashboards:

Terminal window
python3 gen-cloudflare-logpush.py # 135 panels (123 content + 12 section rows)
python3 gen-cloudflare-logpush.py --export # Portable export for grafana.com sharing
python3 gen-cloudflared.py # 67 panels (58 content + 9 section rows)
python3 gen-cloudflared.py --export # Portable export for grafana.com sharing

Both generators support --export which replaces hardcoded datasource UIDs with template variables (${DS_LOKI}, ${DS_PROMETHEUS}) and adds __inputs/__requires arrays for Grafana.com compatibility. Export files are written with an -export suffix.

Custom dashboards are generated by Python scripts rather than hand-edited JSON. A 135-panel dashboard is thousands of lines of JSON but only ~1200 lines of Python with reusable helper functions:

Both generators share the same architecture: helper functions produce Grafana panel dictionaries, which are assembled into a dashboard JSON and written to disk. All helpers accept a desc="" parameter for panel descriptions (shown as tooltips in Grafana).

gen-cloudflare-logpush.py (Loki datasource):

#!/usr/bin/env python3
"""Generate the Cloudflare Logpush Grafana dashboard JSON."""
import json, sys
from country_codes import COUNTRY_NAMES # 249 ISO 3166-1 Alpha-2 entries
EXPORT = "--export" in sys.argv
DS = {"type": "loki", "uid": "${DS_LOKI}"} if EXPORT else {"type": "loki", "uid": "loki"}
# Helper functions - all accept desc="" for panel descriptions
def stat_panel(id, title, expr, legend, x, y, w=6, unit="short",
               thresholds=None, instant=True, desc=""): ...
def ts_panel(id, title, targets, x, y, w=12, h=8, unit="short",
             stack=True, overrides=None, fill=20, legend_calcs=None, desc=""): ...
def table_panel(id, title, expr, legend, x, y, w=8, h=8,
                extra_overrides=None, desc=""): ...
def pie_panel(id, title, expr, legend, x, y, w=6, h=8,
              overrides=None, desc=""): ...  # legend.placement: "right" for 10+ slices
def bar_panel(id, title, targets, x, y, w=12, h=8, unit="short",
              stack=True, overrides=None, desc=""): ...
def geomap_panel(id, title, expr, lookup_field, x, y, w=16, h=10, desc=""): ...

# Selective JSON parsing - only extract fields each query needs
def http(*fields):
    """Build LogQL selector for http_requests with template variable filters."""
    # Always includes _HTTP_FILTER_FIELDS (ClientRequestHost, ClientCountry,
    # ClientRequestPath, ClientIP, JA4, ClientASN, EdgeColoCode) for filtering
def fw(*fields):
    """Build LogQL selector for firewall_events."""
def wk(*fields):
    """Build LogQL selector for workers_trace_events."""

# Override helpers for human-readable labels
def country_name_overrides(): ...  # ISO Alpha-2 → country name
def country_value_mappings_override(column_name): ...

gen-cloudflared.py (Prometheus datasource):

#!/usr/bin/env python3
"""Generate the Cloudflare Tunnel (cloudflared) Grafana dashboard JSON."""
import json, os
DS = {"type": "prometheus", "uid": "prometheus"}
# Same helper pattern, plus cloudflared-specific panel types
def stat_panel(id, title, expr, legend, x, y, w=6, unit="short",
               thresholds=None, decimals=None, desc="", mappings=None): ...
def ts_panel(id, title, targets, x, y, w=12, h=8, unit="short",
             stack=False, overrides=None, fill=20, desc="", legend_calcs=None): ...
def gauge_panel(id, title, expr, legend, x, y, w=6, h=6,
                unit="percent", thresholds=None, desc="", min_val=0, max_val=100): ...
def table_panel(id, title, expr, legend, x, y, w=12, h=8, desc=""): ...
def text_panel(id, content, x, y, w=24, h=4, title="", desc=""): ...

Key design difference: the Logpush generator uses selective JSON parsing (| json field1, field2 instead of full | json) because Logpush events have ~72 fields. Each query extracts only the fields it needs, plus filter fields for template variable support. The cloudflared generator uses standard PromQL since Prometheus metrics are already structured.

The ts_panel legend_calcs parameter controls which calculations appear in the legend footer. Default is ["sum", "mean"] for Logpush (count-based) and ["mean", "max"] for cloudflared (gauge-based). Ratio panels and timing panels override this to ["mean", "lastNotNull"].
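To make the helper pattern concrete, here is a minimal sketch of what a stat_panel-style helper might return — an illustration of the approach, not the production code (the exact fieldConfig defaults are assumptions):

```python
import json

def stat_panel(id, title, expr, legend, x, y, w=6, unit="short",
               thresholds=None, instant=True, desc=""):
    """Return a plain dict in Grafana's panel JSON schema, placed on
    the 24-column grid at (x, y)."""
    return {
        "id": id, "type": "stat", "title": title, "description": desc,
        "gridPos": {"x": x, "y": y, "w": w, "h": 4},
        "datasource": {"type": "loki", "uid": "loki"},
        "fieldConfig": {"defaults": {
            "unit": unit,
            "thresholds": {"mode": "absolute",
                           "steps": thresholds or [{"color": "green", "value": None}]},
        }, "overrides": []},
        "targets": [{"expr": expr, "legendFormat": legend, "refId": "A",
                     "queryType": "instant" if instant else "range",
                     "instant": instant}],
    }

# Assemble a one-panel dashboard the same way the generators do.
dashboard = {"title": "Example", "schemaVersion": 39,
             "panels": [stat_panel(1, "Total Requests",
                 'sum(count_over_time({job="cloudflare-logpush"} [5m]))',
                 "requests", x=0, y=0)]}
print(json.dumps(dashboard)[:60])
```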


This is the most complex part of the stack. Cloudflare Logpush pushes HTTP request logs, firewall events, and Workers trace events as gzip-compressed NDJSON to an HTTPS endpoint on the cluster. The challenge: Alloy’s /loki/api/v1/raw endpoint does not handle gzip, and Traefik has no built-in request body decompression.

When Cloudflare Logpush sends data to an HTTP destination:

  1. Logpush always gzip-compresses HTTP payloads — no way to disable this
  2. Alloy’s loki.source.api /loki/api/v1/raw does not handle Content-Encoding: gzip — confirmed by reading Alloy source. Only /loki/api/v1/push (protobuf/JSON) handles gzip
  3. Traefik’s compress middleware only handles response compression, not request body decompression

This means a decompression layer is needed between Cloudflare and Alloy.

I wrote a Traefik Yaegi (Go interpreter) local plugin that intercepts Content-Encoding: gzip requests, decompresses the body, and passes through to the next handler:

package decompress

import (
	"bytes"
	"compress/gzip"
	"context"
	"fmt"
	"io"
	"net/http"
	"strconv"
	"strings"
)

type Config struct{}

func CreateConfig() *Config { return &Config{} }

type Decompress struct {
	next http.Handler
	name string
}

func New(ctx context.Context, next http.Handler, config *Config,
	name string) (http.Handler, error) {
	return &Decompress{next: next, name: name}, nil
}

func (d *Decompress) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
	encoding := strings.ToLower(req.Header.Get("Content-Encoding"))
	if encoding != "gzip" {
		d.next.ServeHTTP(rw, req)
		return
	}

	gzReader, err := gzip.NewReader(req.Body)
	if err != nil {
		http.Error(rw, fmt.Sprintf("failed to create gzip reader: %v", err),
			http.StatusBadRequest)
		return
	}
	defer gzReader.Close()

	decompressed, err := io.ReadAll(gzReader)
	if err != nil {
		http.Error(rw, fmt.Sprintf("failed to decompress body: %v", err),
			http.StatusBadRequest)
		return
	}

	req.Body = io.NopCloser(bytes.NewReader(decompressed))
	req.ContentLength = int64(len(decompressed))
	req.Header.Set("Content-Length", strconv.Itoa(len(decompressed)))
	req.Header.Del("Content-Encoding")
	d.next.ServeHTTP(rw, req)
}

Published at github.com/erfianugrah/decompress, tagged v0.1.0.
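The plugin's behavior can be sanity-checked outside Traefik with a quick Python model — this mirrors the Go logic with the stdlib, it is not the deployed code path:

```python
import gzip

def decompress_request(headers: dict, body: bytes) -> tuple[dict, bytes]:
    """Model the middleware: pass through unless Content-Encoding is gzip,
    otherwise inflate the body and rewrite the framing headers."""
    if headers.get("Content-Encoding", "").lower() != "gzip":
        return headers, body
    plain = gzip.decompress(body)
    out = {k: v for k, v in headers.items() if k != "Content-Encoding"}
    out["Content-Length"] = str(len(plain))
    return out, plain

# Simulate a Logpush delivery: gzip-compressed NDJSON with the gzip header.
payload = b'{"_dataset":"http_requests","ClientIP":"1.2.3.4"}\n'
hdrs, body = decompress_request(
    {"Content-Encoding": "gzip", "Content-Length": "0"},
    gzip.compress(payload),
)
print(body.decode().strip(), hdrs["Content-Length"])
```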

Traefik loads local plugins from /plugins-local/src/<moduleName>/. Since Traefik runs with readOnlyRootFilesystem: true, the plugin files are packaged as a ConfigMap and mounted:

Step 1: ConfigMap in traefik namespace containing decompress.go, go.mod, .traefik.yml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-plugin-decompress
  namespace: traefik
data:
  decompress.go: |
    package decompress
    // ... (full Go source)
  go.mod: |
    module github.com/erfianugrah/decompress
    go 1.22
  .traefik.yml: |
    displayName: Decompress Request Body
    type: middleware
    import: github.com/erfianugrah/decompress
    summary: Decompresses gzip-encoded request bodies for upstream services.
    testData: {}

Step 2: Volume mount in Traefik Deployment:

volumeMounts:
  - name: plugin-decompress
    mountPath: /plugins-local/src/github.com/erfianugrah/decompress
    readOnly: true
volumes:
  - name: plugin-decompress
    configMap:
      name: traefik-plugin-decompress

Step 3: Traefik arg to enable the plugin:

args:
  - "--experimental.localPlugins.decompress.moduleName=github.com/erfianugrah/decompress"

Step 4: Middleware CRD (must be in same namespace as IngressRoute):

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: decompress
  namespace: monitoring
spec:
  plugin:
    decompress: {}

The Alloy Logpush receiver runs as a separate Deployment. The key design: it knows nothing about individual Logpush datasets. Each job injects a _dataset field via output_options.record_prefix, and Alloy extracts only that as a label:

loki.source.api "cloudflare" {
  http {
    listen_address = "0.0.0.0"
    listen_port    = 3500
  }
  labels = {
    job = "cloudflare-logpush",
  }
  forward_to = [loki.process.cloudflare.receiver]
}

loki.process "cloudflare" {
  stage.json {
    expressions = { dataset = "_dataset" }
  }
  stage.labels {
    values = { dataset = "dataset" }
  }
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
  }
}

Adding a new Logpush dataset requires zero Alloy changes — just create the job with the right record_prefix and data flows automatically.
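For instance, streaming Cloudflare's dns_logs dataset would only require one more OpenTofu job with a matching record_prefix (an illustrative sketch — local.dns_logs_fields would still need to be defined alongside the other field lists):

```hcl
resource "cloudflare_logpush_job" "dns_loki" {
  for_each         = local.zone_ids
  dataset          = "dns_logs"
  destination_conf = local.logpush_loki_dest
  enabled          = true
  output_options {
    output_type   = "ndjson"
    record_prefix = "{\"_dataset\":\"dns_logs\","
    field_names   = local.dns_logs_fields
  }
  zone_id = each.value
}
```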

The Logpush endpoint needs a public HTTPS URL. This is provided by the Cloudflare Tunnel:

cloudflare-tunnel-tf/records.tf
resource "cloudflare_record" "logpush-k3s" {
  zone_id = var.cloudflare_secondary_zone_id
  name    = "logpush-k3s"
  type    = "CNAME"
  content = cloudflare_zero_trust_tunnel_cloudflared.k3s.cname
  proxied = true
  tags    = ["k3s", "monitoring"]
}

# cloudflare-tunnel-tf/tunnel_config.tf
ingress_rule {
  hostname = "logpush-k3s.${var.secondary_domain_name}"
  service  = "https://traefik.traefik.svc.cluster.local"
  origin_request {
    origin_server_name = "logpush-k3s.${var.secondary_domain_name}"
    http2_origin       = true
    no_tls_verify      = true
  }
}

The IngressRoute ties hostname, middleware, and backend together:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: alloy-logpush
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`logpush-k3s.example.io`)
      middlewares:
        - name: decompress
          namespace: monitoring
      services:
        - kind: Service
          name: alloy-logpush
          port: 3500

Seven Logpush jobs are managed in OpenTofu. Shared config uses locals:

cloudflare-tf/main_zone/locals.tf
logpush_loki_dest = "https://logpush-k3s.example.io/loki/api/v1/raw?header_Content-Type=application%2Fjson&header_X-Logpush-Secret=${var.logpush_secret}"
zone_ids = {
  example_com = var.cloudflare_zone_id
  example_dev = var.secondary_cloudflare_zone_id
  example_io  = var.thirdary_cloudflare_zone_id
}

The destination URL uses Logpush’s header_ query parameter syntax to inject Content-Type and a shared secret header.
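How those header_* parameters decode can be checked quickly (the URL mirrors the locals.tf destination above, with a placeholder in place of the secret):

```python
from urllib.parse import urlsplit, parse_qs

dest = ("https://logpush-k3s.example.io/loki/api/v1/raw"
        "?header_Content-Type=application%2Fjson"
        "&header_X-Logpush-Secret=REDACTED")

params = parse_qs(urlsplit(dest).query)
# Logpush turns each header_<Name> query parameter into an HTTP request
# header on the delivery; this models that mapping.
headers = {k[len("header_"):]: v[0] for k, v in params.items()
           if k.startswith("header_")}
print(headers)
```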

HTTP requests (one per zone, using for_each):

resource "cloudflare_logpush_job" "http_loki" {
  for_each                    = local.zone_ids
  dataset                     = "http_requests"
  destination_conf            = local.logpush_loki_dest
  enabled                     = true
  max_upload_interval_seconds = 30
  output_options {
    output_type      = "ndjson"
    record_prefix    = "{\"_dataset\":\"http_requests\","
    field_names      = local.http_requests_fields
    timestamp_format = "rfc3339"
    cve20214428      = false
  }
  zone_id = each.value
}

Firewall events (same pattern, for_each over zones):

resource "cloudflare_logpush_job" "firewall_loki" {
  for_each         = local.zone_ids
  dataset          = "firewall_events"
  destination_conf = local.logpush_loki_dest
  enabled          = true
  output_options {
    output_type   = "ndjson"
    record_prefix = "{\"_dataset\":\"firewall_events\","
    field_names   = local.firewall_events_fields
  }
  zone_id = each.value
}

Workers trace events (account-scoped, single job):

resource "cloudflare_logpush_job" "workers_loki" {
  dataset          = "workers_trace_events"
  destination_conf = local.logpush_loki_dest
  enabled          = true
  output_options {
    output_type   = "ndjson"
    record_prefix = "{\"_dataset\":\"workers_trace_events\","
    field_names   = local.workers_trace_events_fields
  }
  account_id = var.cloudflare_account_id
}

The record_prefix trick prepends {"_dataset":"http_requests", to every JSON line, producing:

{"_dataset":"http_requests","ClientIP":"1.2.3.4","RayID":"abc123",...}

Alloy extracts _dataset as a label; everything else stays in the log line for LogQL | json.

| Dataset | Scope | Jobs | Zones |
|---|---|---|---|
| http_requests | Zone | 3 | example.com, example.dev, example.io |
| firewall_events | Zone | 3 | example.com, example.dev, example.io |
| workers_trace_events | Account | 1 | (all Workers) |
| Total | | 7 | |

The custom dashboard has 135 panels (123 content + 12 section rows) across 12 sections, generated by gen-cloudflare-logpush.py. Every panel has a description tooltip explaining what it shows and how to interpret it. Published on Grafana.com as dashboard 24873.

| Section | Panels | Key visualizations |
|---|---|---|
| Overview | 8 stats | Request count, 5xx error rate, cache hit ratio, WAF attacks, bot traffic %, leaked credentials, JS detection pass rate, content scan rate |
| HTTP Requests | 22 | By host/status/method/protocol, top paths, suspicious user agents (BotScore < 30), top IPs, top ASNs, top countries, JA4 fingerprints, edge colos, device types, geomap, request lifecycle breakdown (client-edge-origin latency buckets) |
| Performance | 13 | Edge TTFB (avg/p95/p99 by host), origin timing breakdown (DNS/TCP/TLS/request/response as stacked area), client-edge RTT, request lifecycle (edge processing vs origin vs client), timing heatmaps |
| Cache Performance | 11 | Cache status distribution, hit ratio trend, tiered cache fill, cache status by host, cacheable vs uncacheable, compression ratio, content types by cache status |
| Security & Firewall | 13 | Firewall events by action/source/host/rule, top rules, firewall event timeline, top blocked IPs/paths/countries |
| API & Rate Limiting | 9 | API classification breakdown, API-matched vs unmatched, rate limit actions, API requests by host/method |
| WAF Attack Analysis | 6 | Attack score buckets (0-20 is attack), SQLi/XSS/RCE score breakdown, unmitigated attacks (high score + no action), attack source countries |
| Threat Intelligence | 9 | Leaked credential pairs, IP classification (Tor/VPN/botnet), geo anomaly on sensitive paths (login/admin/api), client IP reputation, threat score distribution |
| Bot Analysis | 8 | Bot score distribution, bot detection IDs (33 mapped IDs), JA4/JA3 fingerprints, verified bot categories, bot score vs WAF action correlation, JS detection results |
| Request Rate Analysis | 7 | Request rate by path (topk for timeseries), top paths by count (count_over_time for tables), rate by status, rate by host |
| Request & Response Size | 6 | Per-host bandwidth panels (CF→Eyeball charged, Origin→CF informational), request/response body size distributions |
| Workers | 9 | CPU/wall time by script (p50/p95/p99), outcomes (ok/exception/exceeded), subrequest count, execution duration heatmap, wall time breakdown |

Selective JSON parsing: Each LogQL query uses | json field1, field2 to extract only the fields it needs instead of parsing all ~72 Logpush fields. A set of filter fields (ClientRequestHost, ClientCountry, ClientRequestPath, ClientIP, JA4, ClientASN, EdgeColoCode) is always included by the http() helper to support template variable filtering across all panels.

High-cardinality aggregation: Tables and “top N” panels use approx_topk (Loki 3.3+) instead of topk for probabilistic aggregation via count-min sketch. This requires query_range.shard_aggregations: approx_topk and frontend.encoding: protobuf in Loki config.

ASN and country name resolution: Raw ASN numbers and ISO Alpha-2 country codes are mapped to human-readable names using Grafana value mappings. country_codes.py has 249 entries (all ISO 3166-1 countries). ASN names are resolved live from the ClientASNDescription field in Cloudflare’s firewall_events dataset — no static ASN lookup table needed. The firewall events dataset includes the ISP/organization name for every ASN, which the dashboard queries directly via LogQL.
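A hypothetical sketch of the role country_value_mappings_override plays — build a Grafana fieldConfig override that maps raw codes to display names (COUNTRY_NAMES here stands in for the 249-entry country_codes.py table):

```python
# Stand-in for the full ISO 3166-1 table in country_codes.py.
COUNTRY_NAMES = {"NL": "Netherlands", "SG": "Singapore", "US": "United States"}

def country_value_mappings(column_name: str) -> dict:
    """Return a Grafana override for one table column: a value mapping
    from each Alpha-2 code to its human-readable country name."""
    return {
        "matcher": {"id": "byName", "options": column_name},
        "properties": [{
            "id": "mappings",
            "value": [{"type": "value",
                       "options": {code: {"text": name}
                                   for code, name in COUNTRY_NAMES.items()}}],
        }],
    }

override = country_value_mappings("ClientCountry")
print(override["properties"][0]["value"][0]["options"]["NL"]["text"])
```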

Template variables are textbox type with .* default (matches everything). Grafana’s label_values() only works for indexed Loki labels, not JSON-extracted fields — since all fields are in the JSON body, textbox is the only practical option. Available filters: Host, Country, Path, Client IP, JA4 fingerprint, ASN, Edge Colo.

The cloudflared dashboard has 67 panels (58 content + 9 section rows) across 9 sections, generated by gen-cloudflared.py. It covers tunnel health, capacity planning, QUIC transport internals, latency analysis, and process resource monitoring — all from cloudflared’s native Prometheus metrics endpoint. Published on Grafana.com as dashboard 24874.

| Section | Panels | Key visualizations |
|---|---|---|
| Tunnel Overview | 12 stat panels | Requests/sec, error rate %, HA connections, concurrent requests, stream errors/sec, version, config version, registrations, TCP/UDP sessions, heartbeat retries, total requests |
| Tunnel Capacity & Scaling | 7 | Two-tier model: HTTP (concurrent requests gauge, req/s gauge, throughput timeseries) + WARP/private network (TCP/UDP port capacity gauges, port capacity % over time). Scaling guidelines text panel with limitations warning |
| Traffic | 4 | Requests/sec with errors overlay, response status codes (color-coded 2xx/3xx/4xx/5xx), error rate % trend, stacked response codes |
| Connections & Sessions | 8 | HA connections per pod, concurrent requests per tunnel, TCP sessions (active gauge + new/sec rate), UDP sessions, proxy stream errors, heartbeat retries, ICMP traffic, tunnel registrations |
| Edge Locations | 2 | Active edge server locations table (conn_id → edge PoP mapping), config version over time |
| QUIC Transport | 9 | RTT to edge (smoothed/min/latest per connection), congestion window bytes, bytes sent/received (aggregate + per-connection), packet loss by reason, congestion state with value mappings (0=SlowStart, 1=CongestionAvoidance, 2=Recovery, 3=ApplicationLimited), MTU/max payload, QUIC frames sent/received by type |
| Latency | 4 | Proxy connect latency (p50/p95/p99 histogram quantiles), RPC client latency, RPC server latency, proxy connect latency heatmap |
| RPC Operations | 2 | RPC client operations by handler/method, RPC server operations by handler/method |
| Process Resources | 8 | CPU usage, memory (RSS/Go heap/idle spans), network I/O (TX/RX bytes/sec), goroutines, open file descriptors vs limit, GC duration, heap objects, memory allocation rate |

Two-tier capacity model: The Capacity & Scaling section separates HTTP and WARP/private network traffic because they have fundamentally different scaling characteristics:

  • HTTP-only tunnels: Requests are multiplexed over QUIC streams on 4 HA connections. No host ephemeral ports are consumed. Primary metrics: cloudflared_tunnel_concurrent_requests_per_tunnel and rate(cloudflared_tunnel_total_requests).
  • WARP/private network tunnels: TCP/UDP sessions consume host ephemeral ports. Cloudflare’s sizing calculator applies: TCP capacity = sessions/sec ÷ available_ports, UDP capacity = sessions/sec × dns_timeout ÷ available_ports.

TCP/UDP session metrics read 0 for HTTP-only tunnels — this is correct, not a bug.

Scaling limitations (documented in the dashboard’s guidelines text panel):

  • cloudflared has no auto-scaling capability — replicas are HA only, not load-balanced
  • Scaling down breaks active eyeball connections (no graceful drain)
  • For true horizontal scaling, use multiple discrete tunnels behind a load balancer

QUIC transport: cloudflared connects to Cloudflare edge via QUIC with 4 HA connections per replica. The QUIC section surfaces connection-level metrics that are otherwise invisible: RTT per connection (smoothed EWMA used by congestion control, minimum floor, latest sample), congestion state transitions, packet loss reasons, and frame-level counters. State 3 (ApplicationLimited) is normal for low-traffic tunnels.

Metric discovery note: cloudflared_tunnel_active_streams appears in Cloudflare’s documentation but is not emitted by cloudflared 2026.2.0. The dashboard uses cloudflared_proxy_connect_streams_errors for stream error tracking instead.

Template variables:

| Variable | Type | Default | Purpose |
|---|---|---|---|
| job | query | cloudflared-metrics | Auto-discovered from cloudflared_tunnel_ha_connections |
| available_ports | custom | 50000 | Ephemeral ports per host (50000/30000/16384) for WARP capacity gauges |
| dns_timeout | custom | 5 | DNS UDP session timeout (5/10/30 sec) for UDP capacity calculation |

Working with high-cardinality Cloudflare Logpush data in Loki exposed twelve specific traps. These cost real debugging time — the error messages are often unhelpful.

1. count_over_time without sum() explodes series

# BAD: one series per unique log line
count_over_time({job="cloudflare-logpush"} | json [5m])
# GOOD: single aggregated count
sum(count_over_time({job="cloudflare-logpush"} | json [5m]))

After | json, every extracted field becomes a potential label. Without sum(), count_over_time returns one series per unique label combination — easily hitting max_query_series.

2. unwrap aggregations don’t support by ()

# BAD: parse error
avg_over_time(... | unwrap EdgeTimeToFirstByteMs [5m]) by (Host)
# GOOD: outer aggregation for grouping
sum by (Host) (avg_over_time(... | unwrap EdgeTimeToFirstByteMs [5m]))

3. Stat panels need instant queries

Without instant: true, Loki returns a range result. The stat panel picks lastNotNull, which may not reflect the full window. Set "queryType": "instant", "instant": true on stat panel targets.

4. Range selector choice depends on panel type

  • Time series panels: [$__auto] — adapts to the visible time range
  • Table panels: [5m] fixed — $__auto creates too many evaluation windows
  • Stat panels: [5m] with instant: true
5. LogQL cannot compare two extracted fields

# IMPOSSIBLE: compare two extracted fields
{...} | json | OriginResponseStatus != EdgeResponseStatus

LogQL can only compare extracted fields to literal values. Use two queries or dashboard transformations.
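A sketch of the two-query workaround — chart each status dimension as its own series and compare them in the panel (or subtract them with a Grafana transformation):

```logql
# Query A: requests broken down by origin status
sum by (OriginResponseStatus) (count_over_time({job="cloudflare-logpush"}
  | json OriginResponseStatus [$__auto]))

# Query B: requests broken down by edge status
sum by (EdgeResponseStatus) (count_over_time({job="cloudflare-logpush"}
  | json EdgeResponseStatus [$__auto]))
```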

6. Unwrap results explode without an outer aggregation

Always wrap unwrap aggregations in an outer sum() or avg by ():

# BAD: one series per label combination
avg_over_time(... | unwrap EdgeTimeToFirstByteMs [$__auto])
# GOOD: collapsed
sum(avg_over_time(... | unwrap EdgeTimeToFirstByteMs [$__auto]))

7. max_query_series applies to inner cardinality

topk(10, sum by (Path) (count_over_time(... | json [5m])))

Loki evaluates sum by (Path) first. If there are thousands of unique paths (bots/scanners), it exceeds max_query_series before topk ever runs. Reducing the time window does not help — the cardinality is inherent in the data.

8. High-cardinality topk requires high max_query_series


Even a 1-second scan window can have 1500+ unique paths due to bots. Raised max_query_series to 5000:

limits_config:
  max_query_series: 5000

The memory impact on single-instance homelab Loki is negligible for instant queries.

9. Table panels with [$__auto] hit series limits


Combines pitfalls 4, 7, and 8. Over a 24h range, $__auto might resolve to 15-second intervals, creating many evaluation windows. Use [5m] fixed for all table instant queries.

10. approx_topk solves the topk cardinality problem


Loki 3.3 added approx_topk — a probabilistic alternative to topk that uses a count-min sketch instead of materializing all inner series:

# Instead of:
topk(10, sum by (ClientRequestPath) (count_over_time(... | json ClientRequestPath [$__auto])))
# Use:
approx_topk(10, sum by (ClientRequestPath) (count_over_time(... | json ClientRequestPath [$__auto])))

This avoids hitting max_query_series on high-cardinality fields. Requires two config settings in Loki:

query_range:
  shard_aggregations: approx_topk # string, NOT a YAML list
frontend:
  encoding: protobuf # required for approx_topk

Drop-in replacement for topk on instant queries (table panels). Results are approximate but accurate enough for dashboard “top N” panels.

11. Selective | json reduces query cost dramatically


Full | json extracts all ~72 Logpush fields as labels for every log line. Most queries only need 2-3 fields:

# BAD: extracts all 72 fields
{job="cloudflare-logpush"} | json | ClientRequestHost =~ "$host"
# GOOD: extracts only needed fields
{job="cloudflare-logpush"} | json ClientRequestHost, EdgeResponseStatus
| ClientRequestHost =~ "$host"

The Logpush generator’s http(), fw(), and wk() helpers automatically include template variable filter fields plus whatever fields the specific query needs. This reduced query latency significantly on the homelab Loki instance.

12. Derived metric subtraction requires per-line computation and data quality filtering


Computing “Edge Processing = TTFB - Origin Duration” has two layered pitfalls:

Problem 1 - Aggregation ordering: Subtracting two independently aggregated unwrap queries is wrong because each operates on a potentially different sample population (cache hits vs origin-fetched requests). For percentiles it’s also mathematically invalid: p99(A) - p99(B) ≠ p99(A - B).

# BAD: subtracts two independent aggregations
sum(avg_over_time(... | unwrap EdgeTimeToFirstByteMs [$__auto]))
- sum(avg_over_time(... | unwrap OriginResponseDurationMs [$__auto]))

Problem 2 - EdgeTimeToFirstByteMs is capped at 65535 (uint16): Cloudflare’s logging truncates this field at 2^16-1. When an origin takes longer than ~65 seconds, TTFB saturates at 65535 while OriginResponseDurationMs keeps counting (observed up to 661,156ms / ~11 minutes). The per-line subtraction then produces massively negative values (e.g., -595,621ms). This affects ~0.2% of traffic, typically DNS servers or backends with long timeouts.

Fix: Use label_format with Loki’s subf template function for per-line subtraction, AND filter out requests where TTFB hit the uint16 cap:

# GOOD: per-line subtraction with uint16 overflow filter
sum(avg_over_time(
... | json EdgeTimeToFirstByteMs, OriginResponseDurationMs
| EdgeTimeToFirstByteMs < 65535
| label_format EdgeProcessingMs="{{ subf .EdgeTimeToFirstByteMs .OriginResponseDurationMs }}"
| unwrap EdgeProcessingMs [$__auto]
))
# Percentiles now work correctly - true p99 of per-request edge processing time
sum(quantile_over_time(0.99,
... | EdgeTimeToFirstByteMs < 65535
| label_format EdgeProcessingMs="{{ subf .EdgeTimeToFirstByteMs .OriginResponseDurationMs }}"
| unwrap EdgeProcessingMs [$__auto]
))

The VyOS router runs node_exporter on port 9100 (HTTPS, self-signed cert). I initially used the Prometheus Probe CRD, but it routes through the blackbox exporter and produces only probe_* metrics — not the actual node_* metrics. VyOS never appeared in the Node Exporter dashboard.

The fix: ScrapeConfig CRD for direct scraping:

apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: vyos-nl
  namespace: monitoring
  labels:
    release: prometheus
spec:
  metricsPath: /metrics
  scheme: HTTPS
  tlsConfig:
    insecureSkipVerify: true
  staticConfigs:
    - targets:
        - prom-vyos.example.com
      labels:
        job: node-exporter
        instance: prom-vyos.example.com
  scrapeInterval: 30s

| Aspect | Probe CRD | ScrapeConfig CRD |
|---|---|---|
| Path | Prometheus -> blackbox -> target | Prometheus -> target (direct) |
| Metrics | probe_* only | All target metrics |
| Use case | Endpoint availability | Actual metric scraping |

With job: node-exporter, VyOS appears in the Node Exporter dashboard alongside cluster nodes.


Two Secret files require SOPS encryption before committing:

| File | Contents |
|---|---|
| monitoring/grafana/secret.yaml | admin-user, admin-password, oauth-client-id, oauth-client-secret |
| monitoring/alertmanager/secret.yaml | Alertmanager config with SMTP credentials |
Terminal window
sops --encrypt --age <YOUR_AGE_PUBLIC_KEY> \
--encrypted-regex '^(data|stringData)$' \
--in-place monitoring/grafana/secret.yaml
sops --encrypt --age <YOUR_AGE_PUBLIC_KEY> \
--encrypted-regex '^(data|stringData)$' \
--in-place monitoring/alertmanager/secret.yaml

OpenTofu secrets (logpush_secret, zone IDs, API tokens) live in SOPS-encrypted secrets.tfvars:

Terminal window
sops -d secrets.tfvars > /tmp/secrets.tfvars
tofu plan -var-file=/tmp/secrets.tfvars
tofu apply -var-file=/tmp/secrets.tfvars
rm /tmp/secrets.tfvars

Terminal window
# 1. Deploy the entire monitoring stack (includes all components)
kubectl apply -k monitoring/ --server-side --force-conflicts
# 2. Deploy decompress plugin + middleware (separate from monitoring kustomization)
kubectl apply -f middleware/decompress-configmap.yaml
kubectl apply -f middleware/decompress-middleware.yaml
# 3. Deploy updated Traefik with plugin enabled
kubectl apply -f services/traefik.yaml
# 4. Deploy ingress routes
kubectl apply -f ingressroutes/grafana-ingress.yaml
kubectl apply -f ingressroutes/prometheus-ingress.yaml
kubectl apply -f ingressroutes/alertmanager-ingress.yaml
kubectl apply -f ingressroutes/jaeger-ingress.yaml
kubectl apply -f ingressroutes/alloy-logpush-ingress.yaml
# 5. Deploy KEDA autoscaling
kubectl apply -f hpa/grafana-keda-autoscaling.yaml
kubectl apply -f hpa/prom-keda-autoscaling.yaml
# 6. Apply OpenTofu for DNS + tunnel config
cd cloudflare-tunnel-tf/ && tofu apply
# 7. Apply OpenTofu for Logpush jobs
cd ../cloudflare-tf/main_zone/
tofu apply -var-file=secrets.tfvars

--server-side is required because the Prometheus Operator CRDs and Node Exporter dashboard exceed the 262144-byte annotation limit. IngressRoutes and KEDA ScaledObjects are outside the monitoring/ kustomization directory because Kustomize cannot reference files outside its root.

```sh
# All pods running
kubectl get pods -n monitoring

# Prometheus targets
kubectl port-forward svc/prometheus 9090 -n monitoring
# Visit http://localhost:9090/targets -- all should be UP

# Loki receiving data
kubectl logs deploy/alloy-logpush -n monitoring --tail=20

# Logpush data flowing -- in Grafana Explore with Loki:
#   {job="cloudflare-logpush"} | json

# Dashboard ConfigMaps -- should show 16 ConfigMaps
kubectl get cm -n monitoring -l grafana_dashboard=1
```

| Component | Instances | CPU Req | Mem Req | Storage |
| --- | --- | --- | --- | --- |
| Prometheus Operator | 1 | 100m | 128Mi | |
| Prometheus | 1 | 200m | 512Mi | 10Gi NFS |
| Alertmanager | 1 | 50m | 64Mi | 1Gi NFS |
| Grafana | 1 | 100m | 128Mi | 1Gi NFS |
| kube-state-metrics | 1 | 50m | 64Mi | |
| Node Exporter | 4 (DaemonSet) | 50m x4 | 32Mi x4 | |
| Blackbox Exporter | 1 | 25m | 32Mi | |
| Loki | 1 | 250m | 512Mi | 20Gi NFS |
| Grafana Alloy | 4 (DaemonSet) | 100m x4 | 128Mi x4 | |
| Alloy Logpush | 1 | 50m | 64Mi | |
| Jaeger | 1 | 250m | 512Mi | 10Gi NFS |
| **Total** | | ~1.68 cores | ~2.59Gi | ~42Gi |
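The totals can be sanity-checked with shell arithmetic over the per-component requests, counting each DaemonSet once per node on the 4-node cluster:

```sh
# CPU requests in millicores, memory requests in Mi, summed per the table:
cpu=$((100 + 200 + 50 + 100 + 50 + 50*4 + 25 + 250 + 100*4 + 50 + 250))
mem=$((128 + 512 + 64 + 128 + 64 + 32*4 + 32 + 512 + 128*4 + 64 + 512))
echo "CPU: ${cpu}m"      # 1675m, ~1.68 cores
echo "Memory: ${mem}Mi"  # 2656Mi, ~2.59Gi
```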

```
monitoring/
  kustomization.yaml              # Top-level: composes all components
  namespace.yaml
  operator/                       # Prometheus Operator v0.89.0
    kustomization.yaml
    crd-*.yaml                    # 10 CRDs (~3.7 MB)
    serviceaccount.yaml
    clusterrole.yaml / clusterrolebinding.yaml
    deployment.yaml / service.yaml / servicemonitor.yaml
    webhook.yaml                  # Cert-gen Jobs + webhook configs
  prometheus/                     # Prometheus v3.9.1 (operator-managed)
    kustomization.yaml
    prometheus.yaml               # Prometheus CR
    serviceaccount.yaml / clusterrole.yaml / clusterrolebinding.yaml
    service.yaml / servicemonitor.yaml
    rules/
      general-rules.yaml          # Watchdog, TargetDown
      kubernetes-apps.yaml        # CrashLoopBackOff, restarts
      kubernetes-resources.yaml   # CPU/memory quota
      node-rules.yaml             # Filesystem, memory, CPU
      k8s-recording-rules.yaml    # Pre-computed recording rules
      traefik-rules.yaml          # Traefik alerts
  alertmanager/                   # Alertmanager v0.31.1
    alertmanager.yaml             # Alertmanager CR
    secret.yaml                   # SOPS-encrypted SMTP config
  grafana/                        # Grafana 12.3.3
    configmap.yaml                # grafana.ini + dashboard provider
    datasources.yaml              # Prometheus, Loki, Jaeger, Alertmanager
    deployment.yaml               # Grafana + k8s-sidecar
    secret.yaml                   # SOPS-encrypted credentials
    dashboards/
      kustomization.yaml          # configMapGenerator (16 dashboards)
      add-dashboard.sh            # Download from grafana.com
      gen-cloudflare-logpush.py   # 135-panel dashboard generator (--export for grafana.com)
      gen-cloudflared.py          # 67-panel dashboard generator (--export for grafana.com)
      country_codes.py            # 249 ISO 3166-1 Alpha-2 → country name
      *.json                      # 16 dashboard files
  loki/                           # Loki 3.6.5 (monolithic)
    configmap.yaml / statefulset.yaml / service.yaml / servicemonitor.yaml
  alloy/                          # Grafana Alloy v1.13.1 (DaemonSet)
    configmap.yaml / daemonset.yaml / service.yaml / servicemonitor.yaml
  alloy-logpush/                  # Alloy Logpush receiver (Deployment)
    configmap.yaml / deployment.yaml / service.yaml / servicemonitor.yaml
  jaeger/                         # Jaeger 2.15.1 (all-in-one)
    configmap.yaml / deployment.yaml / service.yaml / servicemonitor.yaml
  kube-state-metrics/             # v2.18.0
  node-exporter/                  # v1.10.2 (DaemonSet)
  blackbox-exporter/              # v0.28.0
  servicemonitors/                # Cross-namespace ServiceMonitors
    apiserver.yaml / authentik.yaml / cloudflared.yaml
    coredns.yaml / kubelet.yaml / revista.yaml / traefik.yaml
  probes/
    vyos-scrape.yaml              # ScrapeConfig for VyOS node_exporter
middleware/                       # Traefik decompress plugin
  decompress-plugin/
    decompress.go / go.mod / .traefik.yml
  decompress-configmap.yaml       # ConfigMap for k8s
  decompress-middleware.yaml      # Middleware CRD
ingressroutes/
  grafana-ingress.yaml / prometheus-ingress.yaml
  alertmanager-ingress.yaml / jaeger-ingress.yaml
  alloy-logpush-ingress.yaml
hpa/
  grafana-keda-autoscaling.yaml   # maxReplicas: 1 (SQLite limitation)
  prom-keda-autoscaling.yaml      # maxReplicas: 8
cloudflare-tunnel-tf/             # OpenTofu: DNS + tunnel ingress rules
  records.tf / tunnel_config.tf
cloudflare-tf/main_zone/          # OpenTofu: Logpush jobs
  zone_logpush_job.tf / locals.tf / variables.tf
```