Traefik on k3s: Custom Deployment, Plugins, Middlewares, and Cloudflare Tunnel
A complete guide to replacing k3s’s built-in Traefik with a fully custom deployment on a 4-node ARM64 homelab cluster. The built-in Traefik is fine for simple setups, but it doesn’t support local plugins, has limited middleware configuration, and doesn’t expose the level of control needed for things like bot detection, request body decompression, or per-route rate limiting.
This guide covers the full setup: disabling the built-in Traefik, deploying a custom one as a raw Deployment manifest, writing and packaging Traefik Go plugins as ConfigMaps, configuring the middleware chain, managing TLS certificates via Cloudflare DNS challenge, routing traffic through Cloudflare Tunnel, autoscaling with KEDA, and piping access logs + traces into the monitoring stack.
Architecture Overview
All HTTP traffic enters through Cloudflare’s edge network, passes through a Cloudflare Tunnel (cloudflared running in the cluster), and hits the custom Traefik deployment in the traefik namespace. Traefik terminates TLS (ACME certs via Cloudflare DNS challenge), runs the global middleware chain (sentinel → crowdsec-bouncer → security-headers), then routes to per-route middlewares and backend services.
Component versions
| Component | Version | Image |
|---|---|---|
| Traefik | v3.6.8 | traefik:v3.6.8 |
| CrowdSec Bouncer plugin | v1.5.0 | (remote, fetched by Traefik) |
| cloudflared | 2026.2.0 | cloudflare/cloudflared:2026.2.0 |
| KEDA | (cluster-wide) | (already deployed) |
Part 1: Disabling the Built-in Traefik
k3s ships with Traefik as a bundled Helm chart. It auto-deploys on the server node and manages its own CRDs. To run a custom Traefik, the built-in one must be fully disabled — otherwise you get two Traefik instances fighting over the same IngressRoutes.
Ansible playbook
```yaml
---
- name: Disable k3s built-in Traefik and ServiceLB on server
  hosts: server
  become: yes
  tasks:
    - name: Add disable directives to k3s config.yaml
      ansible.builtin.blockinfile:
        path: /etc/rancher/k3s/config.yaml
        marker: "# {mark} ANSIBLE MANAGED - disable built-in addons"
        block: |
          disable:
            - traefik
            - servicelb
        create: no
      register: config_changed

    - name: Remove k3s bundled traefik manifest files
      ansible.builtin.file:
        path: "{{ item }}"
        state: absent
      loop:
        - /var/lib/rancher/k3s/server/manifests/traefik.yaml
        - /var/lib/rancher/k3s/server/static/charts/traefik-crd-38.0.201+up38.0.2.tgz
        - /var/lib/rancher/k3s/server/static/charts/traefik-38.0.201+up38.0.2.tgz
      register: manifests_removed

    - name: Restart k3s to pick up config change
      ansible.builtin.systemd:
        name: k3s
        state: restarted
        daemon_reload: yes
      when: config_changed.changed

    - name: Wait for k3s API to be ready after restart
      ansible.builtin.wait_for:
        port: 6443
        host: "{{ ansible_host }}"
        delay: 10
        timeout: 120
      when: config_changed.changed
```

Run it:

```bash
ansible-playbook -i inventory.yml \
  ansible-playbooks/my-playbooks/disable-builtin-traefik.yml \
  --become --ask-become-pass
```

Safe to re-run (idempotent). The playbook also removes stale chart tarballs from k3s’s static manifests directory — without this, k3s may re-deploy the built-in Traefik on restart even with `disable` set.
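After the restart, it is worth confirming the bundled chart is really gone. A quick check, assuming the k3s defaults (label selector and HelmChart resource name are the upstream ones):

```bash
# Both commands should come back empty / NotFound once the built-in Traefik is disabled
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik
kubectl get helmchart -n kube-system traefik
```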
Part 2: CRDs and RBAC
Traefik’s Kubernetes CRD provider needs its own CRD definitions (IngressRoute, Middleware, TLSOption, etc.) and RBAC permissions. These are separate from the Traefik Deployment itself and must be applied first.
```bash
# Apply Traefik CRDs (one-time, or on Traefik version upgrades)
kubectl apply -f crds/kubernetes-crd-definition-v1.yml --server-side
kubectl apply -f crds/kubernetes-crd-rbac.yml
```

The CRD file is large (~3.5 MB) and requires --server-side due to the annotation size limit. RBAC grants the Traefik ServiceAccount read access to IngressRoutes, Middlewares, TLSOptions, Services, Secrets, EndpointSlices, and related resources across both traefik.io and the legacy traefik.containo.us API groups.
Part 3: The Traefik Deployment
The entire Traefik deployment lives in a single manifest: services/traefik.yaml. It contains a ServiceAccount, ClusterRole, ClusterRoleBinding, LoadBalancer Service, Deployment, IngressClass, and PodDisruptionBudget.
Entrypoints
Five entrypoints handle different traffic types:
| Entrypoint | Address | Protocol | Purpose |
|---|---|---|---|
| web | :8000/tcp | HTTP | Redirect to HTTPS (unused behind tunnel) |
| websecure | :8443 | HTTPS + HTTP/3 + QUIC | All production traffic |
| metrics | :8082/tcp | HTTP | Prometheus metrics scrape endpoint |
| traefik | :9000/tcp | HTTP | Dashboard API + health checks (/ping) |
| jvb-udp | :10000/udp | UDP | Jitsi Videobridge media |
The websecure entrypoint is the workhorse. Key settings:
```yaml
args:
  - "--entrypoints.websecure.address=:8443"
  - "--entrypoints.websecure.http.tls=true"
  - "--entrypoints.websecure.http.tls.certResolver=cloudflare"
  - "--entrypoints.websecure.http3=true"
  - "--entrypoints.websecure.http3.advertisedport=443"
  - "--entrypoints.websecure.http2.maxConcurrentStreams=512"
  # Global middlewares applied to ALL websecure requests
  - "--entrypoints.websecure.http.middlewares=traefik-sentinel@kubernetescrd,traefik-crowdsec-bouncer@kubernetescrd,traefik-security-headers@kubernetescrd"
```

HTTP/3 is enabled with advertisedport=443 because the container listens on 8443 but the LoadBalancer Service maps port 443 → 8443. Without the advertised port, clients would try QUIC on port 8443 and fail.
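For reference, the relevant part of the LoadBalancer Service looks roughly like this (a sketch: the port names, selector, and the UDP entry for QUIC are assumptions; the real Service lives in services/traefik.yaml):

```yaml
# Sketch of the Service port mapping that puts 443 in front of the 8443 entrypoint
apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: traefik
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: traefik   # assumed label
  ports:
    - name: websecure
      port: 443          # external HTTPS port
      targetPort: 8443   # container entrypoint
      protocol: TCP
    - name: websecure-quic
      port: 443          # QUIC / HTTP/3, advertised as 443 by Traefik
      targetPort: 8443
      protocol: UDP
```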
Timeouts
```yaml
args:
  - "--entrypoints.websecure.transport.respondingTimeouts.readTimeout=60s"
  - "--entrypoints.websecure.transport.respondingTimeouts.writeTimeout=0s"
  - "--entrypoints.websecure.transport.respondingTimeouts.idleTimeout=180s"
  - "--entrypoints.websecure.transport.lifeCycle.graceTimeOut=30s"
  - "--entrypoints.websecure.transport.lifeCycle.requestAcceptGraceTimeout=5s"
```

writeTimeout=0s (disabled) is intentional. Matrix (Synapse), Jitsi, and LiveKit all use long-lived WebSocket connections. A non-zero write timeout would kill WebSocket connections that don’t send data within the timeout window. The tradeoff is that slowloris-style attacks against WebSocket endpoints aren’t mitigated at the Traefik layer — but CrowdSec and Cloudflare’s DDoS protection handle that upstream.
Forwarded headers
```yaml
args:
  - "--entrypoints.websecure.forwardedHeaders.trustedIPs=173.245.48.0/20,103.21.244.0/22,..."
```

All Cloudflare IPv4 and IPv6 ranges are listed as trusted IPs. This tells Traefik to trust X-Forwarded-For headers from these IPs, which is necessary because Cloudflare Tunnel connects from Cloudflare edge IPs. Without this, X-Forwarded-For would be stripped and the sentinel plugin would see the cloudflared pod IP instead of the real client IP.
Go runtime tuning
```yaml
env:
  - name: GOMAXPROCS
    value: "2"
  - name: GOMEMLIMIT
    value: "900MiB"
```

On ARM64 homelab nodes with 4 cores, limiting GOMAXPROCS to 2 prevents Traefik from consuming all CPU cores. GOMEMLIMIT at 900MiB (with a 1024Mi container limit) gives the Go GC a soft target to aim for, reducing OOM kills from GC pressure spikes.
Security context
```yaml
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
```

The root filesystem is read-only. Writable paths are provided via volume mounts: /ssl-certs-2 (PVC for ACME certs), /tmp (emptyDir), /plugins-local/ (ConfigMap mounts for plugins), /plugins-storage (emptyDir for remote plugin cache).
Pod anti-affinity and PDB
```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
          topologyKey: kubernetes.io/hostname
```

With 2 replicas, the anti-affinity preference spreads them across different nodes. It’s preferred rather than required because, on a 4-node cluster with other workloads, two free nodes aren’t always available.
The PDB ensures at least 1 replica is always available during voluntary disruptions (node drains, rolling updates):
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: traefik-pdb
  namespace: traefik
spec:
  minAvailable: 1
```

Graceful shutdown
```yaml
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]
```

The 10-second pre-stop sleep gives the Service endpoints time to de-register from kube-proxy before the pod starts shutting down. Without this, in-flight requests can hit a pod that’s already draining.
Part 4: Custom Local Plugins
Traefik supports two types of plugins: remote (fetched from GitHub on startup) and local (mounted from the filesystem). Local plugins use Traefik’s Yaegi Go interpreter — you write standard Go code, and Traefik interprets it at runtime. No compilation step needed.
How local plugins work
- Plugin source goes into `/plugins-local/src/<moduleName>/` inside the Traefik container
- The module must have `go.mod`, `.traefik.yml`, and the Go source file
- Traefik is told about the plugin via `--experimental.localPlugins.<name>.moduleName=<moduleName>`
- A Middleware CRD references the plugin by name under `spec.plugin.<name>`
Since Traefik runs with readOnlyRootFilesystem: true, the plugin files are packaged as ConfigMaps and mounted as volumes.
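Whatever the plugin does, Yaegi expects the same package shape: a `CreateConfig` constructor and a `New` factory that returns an `http.Handler`. A minimal skeleton (illustrative only, not the sentinel source; all names here are made up):

```go
// Package example is a minimal Traefik middleware plugin skeleton.
// Yaegi interprets this at runtime; there is no compilation step.
package example

import (
	"context"
	"net/http"
)

// Config holds the options exposed under spec.plugin.<name> in the Middleware CRD.
type Config struct {
	HeaderName string `json:"headerName,omitempty"`
}

// CreateConfig returns the default configuration.
func CreateConfig() *Config {
	return &Config{HeaderName: "X-Example"}
}

// Example is one middleware instance in the chain.
type Example struct {
	next   http.Handler
	name   string
	header string
}

// New is called by Traefik once per middleware instance.
func New(ctx context.Context, next http.Handler, config *Config, name string) (http.Handler, error) {
	return &Example{next: next, name: name, header: config.HeaderName}, nil
}

// ServeHTTP tags the request and hands it to the next handler in the chain.
func (e *Example) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
	req.Header.Set(e.header, "seen")
	e.next.ServeHTTP(rw, req)
}
```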
Plugin 1: Sentinel (bot detection + IP resolution)
Sentinel is a custom plugin that replaces the standalone realclientip plugin. It does two things: resolves the real client IP from trusted headers (Cloudflare’s Cf-Connecting-Ip or X-Forwarded-For with proxy skipping), and runs heuristic bot detection that scores each request.
Request flow:
- Resolve real client IP from trusted headers (`Cf-Connecting-Ip` > XFF right-to-left > RemoteAddr)
- Set `X-Real-Client-Ip` header (used by rate limiters and Loki analytics)
- Score request using 9 heuristic signals
- Set `X-Bot-Score` header (always, regardless of score — this feeds the Grafana security dashboard via Loki)
- If score >= `blockThreshold` (100): return 403 Forbidden
- Otherwise: pass to next middleware
Scoring signals:
| Signal | Score | Rationale |
|---|---|---|
| Scanner UA substring match | +100 | sqlmap, nikto, nuclei, zgrab, etc. — one match is enough to block |
| Honeypot path match | +100 | /.env, /.git/HEAD, /wp-login.php, etc. — no legitimate client requests these |
| Empty User-Agent | +40 | Most real browsers always send UA |
| Missing Accept header | +30 | Browsers always send Accept |
| HTTP/1.0 protocol | +25 | Almost no modern client uses HTTP/1.0 |
| Missing Accept-Language | +20 | Browsers send this; most bots don’t |
| Missing Accept-Encoding | +15 | Browsers send this |
| Connection: close with HTTP/1.1 | +10 | Unusual for real clients |
| Per-IP rate exceeded (>30 req/s) | +30 | Sliding window rate tracker per IP |
A request with a known scanner UA (+100) is blocked immediately. A request with no User-Agent (+40), no Accept (+30), and no Accept-Language (+20) scores 90, still below the threshold on its own, but a missing Accept-Encoding (+15) pushes it to 105 and crosses the 100-point block threshold. The per-IP rate tracker uses a sliding window with background cleanup to prevent memory leaks.
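The scoring itself is a single accumulation over those signals. A condensed sketch of the idea (not the actual sentinel source; it omits the honeypot-path check and the real sliding-window bookkeeping):

```go
package sentinel

import (
	"net/http"
	"strings"
)

// score condenses the signal table above into one pass over the request.
// reqRate is the observed requests/second for this client IP, supplied by the caller.
func score(req *http.Request, reqRate float64) int {
	s := 0

	ua := strings.ToLower(req.Header.Get("User-Agent"))
	for _, scanner := range []string{"sqlmap", "nikto", "nuclei", "zgrab"} {
		if strings.Contains(ua, scanner) {
			s += 100 // one scanner-UA match is enough to cross blockThreshold
			break
		}
	}
	if ua == "" {
		s += 40 // empty User-Agent
	}
	if req.Header.Get("Accept") == "" {
		s += 30
	}
	if req.ProtoMajor == 1 && req.ProtoMinor == 0 {
		s += 25 // HTTP/1.0
	}
	if req.Header.Get("Accept-Language") == "" {
		s += 20
	}
	if req.Header.Get("Accept-Encoding") == "" {
		s += 15
	}
	if strings.EqualFold(req.Header.Get("Connection"), "close") && req.ProtoMinor == 1 {
		s += 10 // Connection: close with HTTP/1.1
	}
	if reqRate > 30 {
		s += 30 // per-IP sliding-window rate exceeded
	}
	return s
}
```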
Packaging as ConfigMap:
The plugin source, go.mod, and .traefik.yml are inlined in a ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-plugin-sentinel
  namespace: traefik
data:
  sentinel.go: |
    package sentinel
    // ... (full Go source, ~435 lines)
  go.mod: |
    module github.com/erfianugrah/sentinel

    go 1.22
  .traefik.yml: |
    displayName: Sentinel
    type: middleware
    import: github.com/erfianugrah/sentinel
    summary: Real client IP resolution + heuristic bot detection with scoring.
    testData:
      trustedHeaders:
        - Cf-Connecting-Ip
        - X-Forwarded-For
      # ...
```

Mounted in the Deployment:
```yaml
volumeMounts:
  - name: plugin-sentinel
    mountPath: /plugins-local/src/github.com/erfianugrah/sentinel
    readOnly: true
volumes:
  - name: plugin-sentinel
    configMap:
      name: traefik-plugin-sentinel
```

Enabled via args:
```yaml
args:
  - "--experimental.localPlugins.sentinel.moduleName=github.com/erfianugrah/sentinel"
```

Middleware CRD (applied as a global middleware on the websecure entrypoint):
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: sentinel
  namespace: traefik
spec:
  plugin:
    sentinel:
      trustedHeaders:
        - Cf-Connecting-Ip
        - X-Forwarded-For
      trustedProxies:
        - "10.42.0.0/16"     # k3s pod CIDR
        - "10.43.0.0/16"     # k3s service CIDR
        - "173.245.48.0/20"  # Cloudflare IPv4
        # ... all CF ranges
      enabled: true
      blockThreshold: 100
      tagThreshold: 60
      scannerUAs: "sqlmap,nikto,dirbuster,masscan,zgrab,nuclei,httpx,gobuster,ffuf,nmap,whatweb,wpscan,joomla,drupal"
      honeypotPaths: "/.env,/.git/HEAD,/.git/config,/wp-login.php,/wp-config.php,/wp-admin,/.aws/credentials,/actuator/env,/actuator/health,/xmlrpc.php,/.DS_Store,/config.json,/package.json,/.htaccess,/server-status,/debug/pprof"
      rateLimitPerSecond: 30
      rateLimitWindowSeconds: 10
```

Plugin 2: Decompress (gzip request body)
The decompress plugin exists for one reason: Cloudflare Logpush always gzip-compresses HTTP payloads, and Alloy’s /loki/api/v1/raw endpoint doesn’t handle Content-Encoding: gzip. Traefik’s built-in compress middleware only handles response compression, not request body decompression.
The plugin is simple — 71 lines of Go:
```go
func (d *Decompress) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
	encoding := strings.ToLower(req.Header.Get("Content-Encoding"))
	if encoding != "gzip" {
		d.next.ServeHTTP(rw, req)
		return
	}

	gzReader, err := gzip.NewReader(req.Body)
	if err != nil {
		http.Error(rw, fmt.Sprintf("failed to create gzip reader: %v", err), http.StatusBadRequest)
		return
	}
	defer gzReader.Close()

	decompressed, err := io.ReadAll(gzReader)
	if err != nil {
		http.Error(rw, fmt.Sprintf("failed to decompress body: %v", err), http.StatusBadRequest)
		return
	}

	req.Body = io.NopCloser(bytes.NewReader(decompressed))
	req.ContentLength = int64(len(decompressed))
	req.Header.Set("Content-Length", strconv.Itoa(len(decompressed)))
	req.Header.Del("Content-Encoding")

	d.next.ServeHTTP(rw, req)
}
```

Same ConfigMap packaging pattern as sentinel. The decompress middleware CRD lives in the monitoring namespace (same as the Alloy Logpush IngressRoute that uses it):
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: decompress
  namespace: monitoring
spec:
  plugin:
    decompress: {}
```

Published at github.com/erfianugrah/decompress.
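A quick way to exercise it end to end (hostname and payload are illustrative; the real producer is Cloudflare Logpush posting to the Alloy raw endpoint):

```bash
# Send a gzip-compressed body through the route carrying the decompress
# middleware; the backend receives it with Content-Encoding stripped.
echo '{"hello":"world"}' | gzip | curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST \
  -H "Content-Encoding: gzip" \
  -H "Content-Type: application/json" \
  --data-binary @- \
  https://logpush-k3s.example.com/loki/api/v1/raw
```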
Part 5: Remote Plugins
Section titled “Part 5: Remote Plugins”CrowdSec Bouncer
The CrowdSec Bouncer Traefik plugin checks each request’s IP against CrowdSec’s community threat intelligence via the LAPI (Local API). Blocked IPs get a 403.
Remote plugins are fetched by Traefik on startup from GitHub. No ConfigMap needed — just the plugin declaration in args:
```yaml
args:
  - "--experimental.plugins.bouncer.modulename=github.com/maxlerebourg/crowdsec-bouncer-traefik-plugin"
  - "--experimental.plugins.bouncer.version=v1.5.0"
```

The Middleware CRD contains the LAPI connection details and must be SOPS-encrypted because it includes the bouncer API key:
```yaml
# middleware/crowdsec-bouncer.yaml (structure -- values are SOPS-encrypted)
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: crowdsec-bouncer
  namespace: traefik
spec:
  plugin:
    bouncer:
      enabled: true
      crowdsecMode: stream
      updateIntervalSeconds: 60
      defaultDecisionSeconds: 300
      crowdsecLapiScheme: https
      crowdsecLapiHost: <your-crowdsec-lapi-endpoint>
      crowdsecLapiKey: <your-bouncer-api-key>
      forwardedHeadersTrustedIPs:
        - "10.42.0.0/16"
        - "10.43.0.0/16"
      clientTrustedIPs: []
      forwardedHeadersCustomName: X-Real-Client-Ip
```

`forwardedHeadersCustomName: X-Real-Client-Ip` tells the bouncer to read the real client IP from the header set by sentinel, not from X-Forwarded-For (which might have multiple IPs).
Part 6: Global Middlewares
Three middlewares are applied globally to every request on the websecure entrypoint via the --entrypoints.websecure.http.middlewares flag:
- "--entrypoints.websecure.http.middlewares=traefik-sentinel@kubernetescrd,traefik-crowdsec-bouncer@kubernetescrd,traefik-security-headers@kubernetescrd"The format is <namespace>-<name>@kubernetescrd. Order matters — sentinel runs first (sets IP + bot score), then crowdsec-bouncer (checks IP reputation), then security-headers.
Security headers
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: security-headers
  namespace: traefik
spec:
  headers:
    stsSeconds: 63072000              # HSTS 2 years
    stsIncludeSubdomains: true
    stsPreload: true
    contentTypeNosniff: true
    referrerPolicy: "strict-origin-when-cross-origin"
    permissionsPolicy: "camera=(), microphone=(), geolocation=(), payment=()"
    customResponseHeaders:
      Server: ""                      # Strip server identity
      X-Powered-By: ""
```

frameDeny, browserXssFilter, and CSP are intentionally omitted from the global middleware. These are app-specific — Authentik needs its own CSP, Grafana needs iframe support for embedding, etc. Apply those per-route where needed.
Part 7: TLS Configuration
TLSOption
```yaml
apiVersion: traefik.io/v1alpha1
kind: TLSOption
metadata:
  name: default
  namespace: default
spec:
  minVersion: VersionTLS12
  maxVersion: VersionTLS13
  cipherSuites:
    # TLS 1.2 only -- TLS 1.3 ciphers are not configurable in Go (all safe by default)
    - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
    - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
  curvePreferences:
    - X25519
    - CurveP256
  sniStrict: true
  alpnProtocols:
    - h2
    - http/1.1
```

The TLSOption must be named default in the default namespace for Traefik to pick it up as the default TLS configuration. All cipher suites are AEAD-only (GCM or ChaCha20-Poly1305) — no CBC mode. sniStrict: true rejects connections that don’t present a valid SNI hostname matching a known route.
ACME via Cloudflare DNS challenge
```yaml
args:
  - "--certificatesresolvers.cloudflare.acme.dnschallenge.provider=cloudflare"
  - "--certificatesresolvers.cloudflare.acme.email=erfi.anugrah@gmail.com"
  - "--certificatesresolvers.cloudflare.acme.dnschallenge.resolvers=1.1.1.1"
  - "--certificatesresolvers.cloudflare.acme.storage=/ssl-certs-2/acme-cloudflare.json"
```

The CF_DNS_API_TOKEN env var is pulled from a Kubernetes Secret (cloudflare-credentials).
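The wiring looks roughly like this (a sketch: the Secret key name is an assumption, and the real Secret is managed out of band):

```yaml
# Sketch: Secret holding the Cloudflare API token (key name assumed)
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-credentials
  namespace: traefik
type: Opaque
stringData:
  CF_DNS_API_TOKEN: "<token with Zone:DNS:Edit permissions>"
---
# In the Traefik container spec:
env:
  - name: CF_DNS_API_TOKEN
    valueFrom:
      secretKeyRef:
        name: cloudflare-credentials
        key: CF_DNS_API_TOKEN
```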
The ACME cert storage lives on an NFS PVC (traefik-ssl-2, 2Gi, RWX) so certs survive pod restarts and don’t trigger Let’s Encrypt rate limits on every rollout:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: traefik-ssl-2
  namespace: traefik
spec:
  accessModes: [ReadWriteMany]
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs-client
```

Part 8: Per-Route Middlewares
Rate limiting (per-route isolation)
Each service gets its own rate limit middleware to prevent cross-service token bucket interference. The problem this solves: when multiple services share a single rate-limit-api middleware, Traefik maintains one token bucket per source IP per middleware instance. All routes sharing that middleware share the same bucket. Authentik OAuth flows generate 35+ requests in bursts (redirects, consent, callback, static assets), which would exceed a shared 10 req/s bucket and return 429s.
All per-route rate limit middlewares live in a single file:
```yaml
# middleware/rate-limits.yaml (pattern -- 22 middlewares total)
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rl-authentik
  namespace: traefik
spec:
  rateLimit:
    average: 100
    period: 1s
    burst: 500
    sourceCriterion:
      requestHeaderName: X-Real-Client-Ip
```

`sourceCriterion.requestHeaderName: X-Real-Client-Ip` uses the header set by the sentinel plugin for per-IP bucketing. Without this, Traefik would use the connection source IP, which behind Cloudflare Tunnel is always the cloudflared pod IP — meaning all users would share one bucket.
Rate limits for monitoring/query services (Grafana, Prometheus, Alertmanager, Jaeger, Logpush, Traefik Dashboard, Traefik Prometheus) are currently commented out in their IngressRoutes. These services generate heavy internal query traffic (Grafana fires dozens of parallel Loki queries when loading dashboards), and rate limiting them causes query timeouts.
In-flight request limiting
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: inflight-req
  namespace: traefik
spec:
  inFlightReq:
    amount: 100
    sourceCriterion:
      requestHeaderName: X-Real-Client-Ip
```

Limits concurrent in-flight requests per source IP to 100. Unlike rate limiting (which controls request rate), this controls concurrency: a single IP can’t monopolize all backend connections. Shared across all routes — this is fine because the limit is per-IP, not per-route.
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: retry
  namespace: traefik
spec:
  retry:
    attempts: 3
    initialInterval: 100ms
```

3 attempts total (1 initial + 2 retries) with exponential backoff starting at 100ms. Only retries on connection errors, NOT on non-2xx status codes. Also shared across all routes.
Authentik forward auth
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: authentik-forward-auth
  namespace: authentik
spec:
  forwardAuth:
    address: http://authentik-server.authentik.svc.cluster.local/outpost.goauthentik.io/auth/traefik
    trustForwardHeader: true
    authResponseHeaders:
      - X-authentik-username
      - X-authentik-groups
      - X-authentik-entitlements
      - X-authentik-email
      - X-authentik-name
      - X-authentik-uid
      - X-authentik-jwt
      - X-authentik-meta-jwks
      - X-authentik-meta-outpost
      - X-authentik-meta-provider
      - X-authentik-meta-app
      - X-authentik-meta-version
```

Applied per-route to services that need SSO protection (Jaeger UI, etc.). Traefik forwards a sub-request to Authentik’s embedded outpost; if Authentik returns 200, the original request proceeds with the X-authentik-* headers injected. If 401/403, the user is redirected to the Authentik login flow.
Part 9: IngressRoutes
20+ IngressRoutes route traffic from hostnames to backend services. Each IngressRoute specifies its middleware chain. The middleware execution order is: global middlewares first (sentinel → crowdsec → security-headers), then per-route middlewares in the order listed.
Middleware assignments
| Route | Host | Middlewares | Namespace |
|---|---|---|---|
| Grafana | grafana-k3s.example.com | | monitoring |
| Prometheus | prom-k3s.example.com | | monitoring |
| Alertmanager | alertmanager-k3s.example.com | | monitoring |
| Jaeger | jaeger-k3s.example.com | | monitoring |
| Logpush | logpush-k3s.example.com | | monitoring |
| Traefik Dashboard | traefik-dashboard.example.com | | traefik |
| Traefik Prometheus | traefik-prometheus.example.com | | traefik |
| Authentik | authentik.example.com | | authentik |
| Revista | mydomain.com | rl-revista, inflight-req, retry | revista |
| ArgoCD (HTTP) | argocd.example.com | rl-argocd, inflight-req, retry | argocd |
| ArgoCD (gRPC) | argocd.example.com + gRPC header | rl-argocd, inflight-req, retry | argocd |
| Dendrite | dendrite.example.com | rl-dendrite, inflight-req, retry | dendrite |
| httpbun | httpbun-k3s.example.com | rl-httpbun, inflight-req, retry | httpbun |
| Jitsi (from Element) | jitsi.example.com + Referer match | rl-jitsi, inflight-req, retry | jitsi |
| Jitsi (direct) | jitsi.example.com | rl-jitsi, inflight-req, retry | jitsi |
| LiveKit JWT | matrix-rtc.example.com/livekit/jwt | rl-livekit, inflight-req, strip-livekit-jwt, retry | livekit |
| LiveKit SFU | matrix-rtc.example.com/livekit/sfu | rl-livekit, inflight-req, strip-livekit-sfu, retry | livekit |
| Element (chat) | chat.example.com | rl-matrix-element, inflight-req, retry | matrix |
| Synapse Admin | admin.matrix.example.com | rl-matrix-admin, inflight-req, retry | matrix |
| Synapse | matrix.example.com | rl-matrix-synapse, inflight-req, retry | matrix |
| Maubot | maubot.example.com | rl-maubot, inflight-req, retry | maubot |
| Headlamp | headlamp-k3s.example.com | rl-headlamp, inflight-req, retry | headlamp |
| Longhorn | longhorn.example.com | rl-longhorn, inflight-req, retry | longhorn-system |
| Portainer | portainer-k3s.example.com | rl-portainer, inflight-req, retry | portainer |
| Portainer Agent | port-agent-k3s.example.com | rl-portainer-agent, inflight-req, retry | portainer |
| Security Dashboard | security-k3s.example.com | authentik-forward-auth | security-dashboard |
An empty Middlewares cell indicates a route whose rate limit is currently commented out (see the note in Part 8).
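For reference, a per-route chain looks like this in an IngressRoute. This is a sketch modeled on the httpbun route; the backend Service name and port are assumptions:

```yaml
# Sketch of a typical IngressRoute with its per-route middleware chain.
# The global chain (sentinel -> crowdsec-bouncer -> security-headers) is applied
# at the websecure entrypoint and does not need to be listed here.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: httpbun
  namespace: httpbun
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`httpbun-k3s.example.com`)
      kind: Rule
      middlewares:
        - name: rl-httpbun
          namespace: traefik
        - name: inflight-req
          namespace: traefik
        - name: retry
          namespace: traefik
      services:
        - name: httpbun   # assumed Service name
          port: 80        # assumed port
  tls:
    certResolver: cloudflare
```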
Part 10: Cloudflare Tunnel Integration
All external traffic enters the cluster through a Cloudflare Tunnel. The tunnel connects from a cloudflared Deployment inside the cluster to Cloudflare’s edge network via outbound QUIC connections — no inbound ports or public IPs needed.
How it works
- `cloudflared` runs in the `cloudflared` namespace and maintains 4 HA connections to the Cloudflare edge
- DNS CNAME records point each hostname to the tunnel’s `.cfargotunnel.com` address
- Cloudflare edge receives the request, looks up the tunnel config, and forwards to `cloudflared`
- `cloudflared` routes to the Traefik Service based on hostname matching in the tunnel ingress rules
- Traefik handles TLS termination, middleware, and routing to the backend
Tunnel ingress rules (OpenTofu)
Each hostname maps to the Traefik Service’s cluster-internal HTTPS endpoint:
```hcl
ingress_rule {
  hostname = "grafana-k3s.${var.secondary_domain_name}"
  service  = "https://traefik.traefik.svc.cluster.local"

  origin_request {
    origin_server_name = "grafana-k3s.${var.secondary_domain_name}"
    http2_origin       = true
  }
}
```

origin_server_name is set to the actual hostname so cloudflared presents the correct SNI to Traefik. http2_origin = true enables HTTP/2 between cloudflared and Traefik, which is needed for gRPC (ArgoCD) and improves multiplexing.
Each service that needs external access gets its own ingress rule. For example, the security dashboard:
```hcl
ingress_rule {
  hostname = "security-k3s.${var.secondary_domain_name}"
  service  = "http://security-dashboard.security-dashboard.svc.cluster.local"

  origin_request {
    origin_server_name = "security-k3s.${var.secondary_domain_name}"
    http2_origin       = true
  }
}
```

The catch-all rule at the bottom returns 404 for unrecognized hostnames:
```hcl
ingress_rule {
  service = "http_status:404"
}
```

DNS records (OpenTofu)
Each service gets a CNAME record pointing to the tunnel:
resource "cloudflare_record" "grafana-k3s" { zone_id = var.cloudflare_secondary_zone_id name = "grafana-k3s" type = "CNAME" content = cloudflare_zero_trust_tunnel_cloudflared.k3s.cname proxied = true tags = ["k3s", "monitoring"]}proxied = true routes traffic through Cloudflare’s edge (DDoS protection, WAF, caching). The CNAME target is the tunnel’s unique .cfargotunnel.com address, auto-generated by the cloudflare_zero_trust_tunnel_cloudflared resource.
Part 11: KEDA Autoscaling
Traefik uses a KEDA ScaledObject with 5 triggers for intelligent autoscaling between 1 and 8 replicas:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: traefik-keda
  namespace: traefik
spec:
  scaleTargetRef:
    name: traefik
  pollingInterval: 5
  cooldownPeriod: 10
  minReplicaCount: 1
  maxReplicaCount: 8
  triggers:
    - type: cpu
      metadata:
        type: Utilization
        value: "50"
    - type: memory
      metadata:
        type: Utilization
        value: "75"
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
        metricName: traefik_open_connections
        threshold: "1000"
        query: sum(traefik_open_connections{entrypoint="websecure"})
    - type: prometheus
      metadata:
        metricName: traefik_request_duration
        threshold: "0.5"
        query: histogram_quantile(0.95, sum(rate(traefik_entrypoint_request_duration_seconds_bucket{entrypoint="websecure"}[1m])) by (le))
    - type: prometheus
      metadata:
        metricName: traefik_requests_total
        threshold: "1000"
        query: sum(rate(traefik_entrypoint_requests_total{entrypoint="websecure"}[1m]))
```

The Prometheus triggers query Traefik’s own metrics: open connection count, p95 request duration, and request rate. Any single trigger exceeding its threshold causes a scale-up. The 5-second polling interval and 10-second cooldown make it responsive without flapping.
Part 12: Observability
Section titled “Part 12: Observability”Access logs → Loki (with structured metadata)
Traefik writes JSON-formatted access logs to stdout:
```yaml
args:
  - "--accesslog=true"
  - "--accesslog.format=json"
  - "--accesslog.bufferingsize=100"
  - "--accesslog.fields.defaultmode=keep"
  - "--accesslog.fields.headers.defaultmode=keep"
```

bufferingsize=100 buffers up to 100 log lines before flushing, reducing I/O pressure. fields.defaultmode=keep and fields.headers.defaultmode=keep include all fields and request/response headers in the JSON output — this is what enables the sentinel bot score and other custom headers to appear in the access logs.
The Alloy DaemonSet picks up these logs from the Traefik container’s stdout (via /var/log/pods/), parses the JSON, and sends them to Loki with 11 structured metadata fields (status, downstream_status, router, service, client_ip, real_client_ip, bot_score, request_path, duration, tls_version, user_agent). Dashboards query these SM fields directly instead of using | json full-line parsing, which is 5-10x faster. See the monitoring stack guide for the full Alloy config and the label/metadata split.
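For example, the kind of query the dashboards run (a sketch: the stream-selector labels depend on the Alloy config described in the monitoring guide):

```logql
# 403s with a high bot score, filtered on structured metadata (no `| json` full-line parse).
# Stream-selector labels below are assumptions.
{namespace="traefik", container="traefik"}
  | downstream_status = "403"
  | bot_score >= 60
  | line_format `{{.real_client_ip}} {{.request_path}} score={{.bot_score}}`
```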
Tracing → Jaeger (via Alloy)
```yaml
args:
  - "--tracing.otlp=true"
  - "--tracing.otlp.grpc=true"
  - "--tracing.otlp.grpc.endpoint=alloy.monitoring.svc.cluster.local:4317"
  - "--tracing.otlp.grpc.insecure=true"
  - "--tracing.serviceName=traefik"
  - "--tracing.sampleRate=1.0"
```

Traefik sends OTLP traces to the Alloy DaemonSet on each node, which batches and forwards them to Jaeger. 100% sample rate is fine for a homelab — in production you’d want to sample down.
Metrics → Prometheus
```yaml
args:
  - "--metrics.prometheus=true"
  - "--metrics.prometheus.entrypoint=metrics"
  - "--metrics.prometheus.addrouterslabels=true"
```

addrouterslabels=true enables router-level metrics (each labelled with the router name), which is what makes per-IngressRoute dashboards and alerting possible. The metrics endpoint is scraped by a ServiceMonitor in the monitoring stack.
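That makes per-route queries straightforward. The router-level metric names below come from Traefik’s standard Prometheus exporter; router label values follow the `<name>@kubernetescrd` convention:

```promql
# Request rate per IngressRoute
sum by (router) (rate(traefik_router_requests_total[5m]))

# p95 latency per IngressRoute
histogram_quantile(0.95,
  sum by (le, router) (rate(traefik_router_request_duration_seconds_bucket[5m])))
```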
Deployment
Full deploy sequence
```bash
# 1. Disable built-in Traefik (one-time)
ansible-playbook -i inventory.yml \
  ansible-playbooks/my-playbooks/disable-builtin-traefik.yml \
  --become --ask-become-pass

# 2. Apply CRDs
kubectl apply -f crds/kubernetes-crd-definition-v1.yml --server-side
kubectl apply -f crds/kubernetes-crd-rbac.yml

# 3. Apply TLS options
kubectl apply -f middleware/tls-options.yaml

# 4. Deploy plugin ConfigMaps
kubectl apply -f middleware/decompress-configmap.yaml
kubectl apply -f middleware/sentinel-configmap.yaml

# 5. Deploy middleware CRDs
kubectl apply -f middleware/sentinel-middleware.yaml
kubectl apply -f middleware/security-headers.yaml
kubectl apply -f middleware/crowdsec-bouncer.yaml
kubectl apply -f middleware/decompress-middleware.yaml
kubectl apply -f middleware/authentik-forward-auth.yaml
kubectl apply -f middleware/inflight-req.yaml
kubectl apply -f middleware/retry.yaml
kubectl apply -f middleware/rate-limits.yaml

# 6. Deploy Traefik (includes SA, ClusterRole, Service, Deployment, IngressClass, PDB)
kubectl apply -f services/traefik.yaml

# 7. Deploy IngressRoutes
kubectl apply -f ingressroutes/

# 8. Deploy KEDA autoscaling
kubectl apply -f hpa/traefik-keda-autoscaling.yaml

# 9. Apply DNS + tunnel config (OpenTofu)
cd cloudflare-tunnel-tf/ && tofu apply
```

Verification
```bash
# Traefik pods running on different nodes
kubectl get pods -n traefik -o wide

# All middlewares loaded
kubectl get middlewares.traefik.io -n traefik

# TLS option active
kubectl get tlsoptions.traefik.io -A

# IngressRoutes across all namespaces
kubectl get ingressroutes.traefik.io -A

# KEDA ScaledObject active
kubectl get scaledobject -n traefik

# Test bot detection (should return 403)
curl -s -o /dev/null -w "%{http_code}" -H "User-Agent: sqlmap/1.0" https://httpbun-k3s.example.com/

# Test honeypot path (should return 403)
curl -s -o /dev/null -w "%{http_code}" https://httpbun-k3s.example.com/.env
```

Part 13: CrowdSec Operations
CrowdSec runs as a DaemonSet in the cluster and feeds into the bouncer middleware documented in Part 5. It can unexpectedly ban legitimate IPs.
Security Dashboard (web UI)
A dedicated Go+htmx web application at security-k3s.example.com provides a browser-based alternative to cscli for most CrowdSec operations. Protected by Authentik forward-auth.
| Feature | Dashboard | cscli |
|---|---|---|
| View decisions (paginated, sortable) | Yes | Yes |
| View alerts with expandable detail | Yes | Yes |
| Remove individual decisions | Yes | Yes |
| Create decisions (CF-style rule builder) | Yes | Yes (cscli decisions add) |
| Export/import decisions as JSON | Yes | No |
| GeoIP + ASN enrichment (country flags, org) | Yes | No |
| IP lookup (decisions + alerts + allowlist check) | Yes | Partial |
| View sentinel/rate-limit config | Yes | No |
| Manage allowlists (read-only + CLI hints) | Yes (read) | Yes (read+write) |
| Scenario breakdown with bar charts | Yes | Yes (cscli metrics) |
Source: services/security-dashboard/. Deploy: docker build → docker push erfianugrah/security-dashboard:latest → kubectl rollout restart. See security-stack.md for full details.
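Spelled out, that deploy loop looks like this (the Deployment name is assumed to match the app):

```bash
# Build the ARM64 image, push it, then roll the pods onto the new tag
docker build -t erfianugrah/security-dashboard:latest services/security-dashboard/
docker push erfianugrah/security-dashboard:latest
kubectl rollout restart deployment/security-dashboard -n security-dashboard
```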
Checking if an IP is banned
```bash
# Via the Security Dashboard:
# Navigate to the IP Lookup page and search

# Or exec into the CrowdSec LAPI pod:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions list

# Check a specific IP:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions list -i <your_ip>
```

Removing a ban
If your own IP gets banned:
```bash
# Via the Security Dashboard:
# Click "Remove" on the decision in the Decisions or Policy page

# Or via cscli:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions delete -i <your_ip>
```

Dashboard false positives (parser-level whitelist)
Dashboard UIs like Headlamp and Grafana generate 404s by requesting plugin locale files (e.g. /static-plugins/prometheus/locales/en/translation.json). The crowdsecurity/http-probing scenario has a capacity of 10 with 10-second leak speed, so 11+ unique 404 paths in a burst will trigger a ban. The .json extension is not in CrowdSec’s static_ressource list, so these are treated as probing.
The fix is a parser-level whitelist that runs in the s02-enrich stage. Create dashboard-whitelist.yaml:
```yaml
# dashboard-whitelist.yaml — placed in /etc/crowdsec/parsers/s02-enrich/
name: custom/dashboard-whitelist
description: "Whitelist known dashboard UI false positives"
whitelist:
  reason: "known dashboard UI request patterns"
  expression:
    - >-
      evt.Meta.service == 'http' &&
      evt.Meta.http_path startsWith '/static-plugins/'
    - >-
      evt.Meta.service == 'http' &&
      evt.Meta.http_path startsWith '/plugins/'
    - >-
      evt.Meta.service == 'http' &&
      evt.Meta.http_path matches '^/api/v[0-9]+/namespaces' &&
      evt.Meta.http_status == '404'
```

To deploy it in a DaemonSet-based CrowdSec agent, embed the file in the agent ConfigMap and copy it with an init container:
```yaml
# In the agent ConfigMap, add the file as a data key:
data:
  dashboard-whitelist.yaml: |
    name: custom/dashboard-whitelist
    ...
```
```yaml
# In the agent DaemonSet, add an init container:
initContainers:
  - name: init-config
    image: busybox:1.36
    command: ["sh", "-c"]
    args:
      - |
        cp /config-custom/dashboard-whitelist.yaml \
          /etc/crowdsec/parsers/s02-enrich/dashboard-whitelist.yaml
    volumeMounts:
      - name: crowdsec-config
        mountPath: /etc/crowdsec
      - name: agent-config
        mountPath: /config-custom
```

After deploying, verify the parser loaded and test with cscli explain:
```bash
# Verify parser is listed:
kubectl exec -n crowdsec <agent-pod> -- cscli parsers list | grep dashboard

# Test with a simulated log line:
kubectl exec -n crowdsec <agent-pod> -- cscli explain \
  --type traefik \
  --log '1.2.3.4 - - [18/Feb/2026:15:30:00 +0000] "GET /static-plugins/prometheus/locales/en/translation.json HTTP/1.1" 404 19 "-" "Mozilla/5.0" 1 "app@docker" "https://10.0.0.1:443" 5ms'
# Expected output should show:
#   custom/dashboard-whitelist (~2 [whitelisted])
#   parser success, ignored by whitelist (known dashboard UI request patterns)

# Check whitelist metrics over time:
kubectl exec -n crowdsec <agent-pod> -- cscli metrics show whitelists
```

Allowlists (CrowdSec v1.7+)
CrowdSec v1.7 introduced allowlists — persistent IP/CIDR whitelists that prevent matching IPs from being banned. Unlike parser-level whitelists (which prevent log events from being processed), allowlists operate at the decision level and can be managed dynamically.
```bash
# Create an allowlist:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- \
  cscli allowlists create trusted-ips -d "Trusted IP addresses"

# Add entries:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- \
  cscli allowlists add trusted-ips 10.0.0.0/8 -d "Internal network"

# Remove entries:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- \
  cscli allowlists remove trusted-ips 10.0.0.0/8

# View allowlists:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli allowlists list
```

The Security Dashboard shows all allowlists in a read-only view with items, country flags, and expiry dates. Write operations require cscli (the LAPI allowlist endpoints are read-only via machine auth).
Debugging 403 errors
If services behind Traefik return 403 Forbidden unexpectedly, check in this order:
- CrowdSec decisions — is the client IP banned?
- Sentinel plugin — is it flagging the request as a bot? Check the `X-Sentinel-Bot-Score` header
- Cloudflare WAF — check the Cloudflare dashboard for firewall events
The global middleware chain on the websecure entrypoint is: sentinel -> crowdsec-bouncer -> security-headers. A 403 from CrowdSec will be intercepted by the Sentinel plugin’s response interceptor and rendered as a styled HTML error page showing the block source, reason, trace ID, and client IP.
File Reference
```
services/
  traefik.yaml                      # SA, ClusterRole, Service, Deployment, IngressClass, PDB

crds/
  kubernetes-crd-definition-v1.yml  # Traefik CRDs (~3.5 MB)
  kubernetes-crd-rbac.yml           # ClusterRole for CRD provider

middleware/
  # Local plugins (source + ConfigMap + Middleware CRD)
  sentinel-plugin/
    sentinel.go                     # 435-line Go source
    go.mod / .traefik.yml
  sentinel-configmap.yaml           # ConfigMap packaging for k8s
  sentinel-middleware.yaml          # Middleware CRD with config

  decompress-plugin/
    decompress.go                   # 71-line Go source
    go.mod / .traefik.yml
  decompress-configmap.yaml         # ConfigMap packaging for k8s
  decompress-middleware.yaml        # Middleware CRD (in monitoring ns)

  # Remote plugin
  crowdsec-bouncer.yaml             # SOPS-encrypted Middleware CRD

  # Global middlewares
  security-headers.yaml             # HSTS, nosniff, permissions policy
  tls-options.yaml                  # TLSOption (min TLS 1.2, AEAD ciphers, sniStrict)

  # Shared per-route middlewares
  rate-limits.yaml                  # 22 per-route rate limit middlewares (rl-*)
  inflight-req.yaml                 # 100 concurrent req/IP
  retry.yaml                        # 3 attempts, 100ms backoff

  # Auth
  authentik-forward-auth.yaml       # Forward auth to Authentik (in authentik ns)

ingressroutes/
  alertmanager-ingress.yaml         # monitoring
  alloy-logpush-ingress.yaml        # monitoring (+ decompress middleware)
  argocd-ingress.yaml               # argocd (2 routes: HTTP + gRPC)
  authentik-ingress.yaml            # authentik
  dendrite-ingress.yaml             # dendrite
  grafana-ingress.yaml              # monitoring
  httpbun-ingress.yaml              # httpbun
  jaeger-ingress.yaml               # monitoring (+ authentik-forward-auth)
  longhorn-ingress.yaml             # longhorn-system
  portainer-agent-ingress.yaml      # portainer
  portainer-ingress.yaml            # portainer
  prometheus-ingress.yaml           # monitoring
  revista-ingress.yaml              # revista
  traefik-dashboard-ingress.yaml    # traefik (api@internal)
  traefik-prometheus-ingress.yaml   # traefik (prometheus@internal)

services/*/ingress.yaml             # Service-embedded IngressRoutes
  security-dashboard/manifests.yaml # SA, RBAC, Secret, Deployment, Service, IngressRoute
  headlamp/ingressroute.yaml
  jitsi/ingress.yaml                # 2 routes: Referer-gated + direct
  livekit/ingress.yaml              # 2 routes + stripPrefix middlewares
  matrix/ingress.yaml               # 3 routes: Element, Synapse Admin, Synapse
  maubot/ingress.yaml

services/crowdsec/
  agent-configmap.yaml              # Agent config + custom parsers
  agent-daemonset.yaml              # Agent DaemonSet (init container copies parsers)
  lapi-deployment.yaml              # LAPI server deployment
  lapi-configmap.yaml               # LAPI configuration
  scenarios-configmap.yaml          # Custom detection scenarios

services/security-dashboard/
  main.go                           # Go+htmx dashboard (~2300 lines, zero deps)
  manifests.yaml                    # SOPS-encrypted (ns, SA, RBAC, secret, deploy, svc, ingress)
  Dockerfile                        # Multi-stage ARM64 build
  ui/                               # Templates + static assets (go:embed)

hpa/
  traefik-keda-autoscaling.yaml     # 5-trigger ScaledObject (1-8 replicas)

pvc-claims/
  traefik-ssl-pvc.yaml              # 2Gi NFS PVC for ACME cert storage

cloudflare-tunnel-tf/
  tunnel_config.tf                  # Tunnel ingress rules (hostname → Traefik)
  records.tf                        # DNS CNAME records → tunnel

ansible-playbooks/my-playbooks/
  disable-builtin-traefik.yml       # Disables k3s built-in Traefik + ServiceLB
```