
Traefik on k3s: Custom Deployment, Plugins, Middlewares, and Cloudflare Tunnel

A complete guide to replacing k3s’s built-in Traefik with a fully custom deployment on a 4-node ARM64 homelab cluster. The built-in Traefik is fine for simple setups, but it doesn’t support local plugins, has limited middleware configuration, and doesn’t expose the level of control needed for things like bot detection, request body decompression, or per-route rate limiting.

This guide covers the full setup: disabling the built-in Traefik, deploying a custom one as a raw Deployment manifest, writing and packaging Traefik Go plugins as ConfigMaps, configuring the middleware chain, managing TLS certificates via Cloudflare DNS challenge, routing traffic through Cloudflare Tunnel, autoscaling with KEDA, and piping access logs + traces into the monitoring stack.


All HTTP traffic enters through Cloudflare’s edge network, passes through a Cloudflare Tunnel (cloudflared running in the cluster), and hits the custom Traefik deployment in the traefik namespace. Traefik terminates TLS (ACME certs via Cloudflare DNS challenge), runs the global middleware chain (sentinel → crowdsec-bouncer → security-headers), then routes to per-route middlewares and backend services.

(d2 architecture diagram: Cloudflare edge → cloudflared tunnel → custom Traefik → backend services)
| Component | Version | Image |
| --- | --- | --- |
| Traefik | v3.6.8 | traefik:v3.6.8 |
| CrowdSec Bouncer plugin | v1.5.0 | (remote, fetched by Traefik) |
| cloudflared | 2026.2.0 | cloudflare/cloudflared:2026.2.0 |
| KEDA | (cluster-wide) | (already deployed) |

k3s ships with Traefik as a bundled Helm chart. It auto-deploys on the server node and manages its own CRDs. To run a custom Traefik, the built-in one must be fully disabled — otherwise you get two Traefik instances fighting over the same IngressRoutes.

ansible-playbooks/my-playbooks/disable-builtin-traefik.yml
---
- name: Disable k3s built-in Traefik and ServiceLB on server
hosts: server
become: yes
tasks:
- name: Add disable directives to k3s config.yaml
ansible.builtin.blockinfile:
path: /etc/rancher/k3s/config.yaml
marker: "# {mark} ANSIBLE MANAGED - disable built-in addons"
block: |
disable:
- traefik
- servicelb
create: no
register: config_changed
- name: Remove k3s bundled traefik manifest files
ansible.builtin.file:
path: "{{ item }}"
state: absent
loop:
- /var/lib/rancher/k3s/server/manifests/traefik.yaml
- /var/lib/rancher/k3s/server/static/charts/traefik-crd-38.0.201+up38.0.2.tgz
- /var/lib/rancher/k3s/server/static/charts/traefik-38.0.201+up38.0.2.tgz
register: manifests_removed
- name: Restart k3s to pick up config change
ansible.builtin.systemd:
name: k3s
state: restarted
daemon_reload: yes
when: config_changed.changed
- name: Wait for k3s API to be ready after restart
ansible.builtin.wait_for:
port: 6443
host: "{{ ansible_host }}"
delay: 10
timeout: 120
when: config_changed.changed

Run it:

Terminal window
ansible-playbook -i inventory.yml \
ansible-playbooks/my-playbooks/disable-builtin-traefik.yml \
--become --ask-become-pass

Safe to re-run (idempotent). The playbook also removes stale chart tarballs from k3s’s static manifests directory — without this, k3s may re-deploy the built-in Traefik on restart even with disable set.
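
After the restart, verify the bundled Traefik is actually gone before deploying the custom one. Something like the following (the HelmChart resource names assume a default k3s install):

Terminal window
# Should return nothing once the built-in Traefik is removed
kubectl get pods -n kube-system | grep -i traefik
# The bundled HelmChart objects (traefik / traefik-crd) should also be gone
kubectl get helmcharts.helm.cattle.io -n kube-system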


Traefik’s Kubernetes CRD provider needs its own CRD definitions (IngressRoute, Middleware, TLSOption, etc.) and RBAC permissions. These are separate from the Traefik Deployment itself and must be applied first.

Terminal window
# Apply Traefik CRDs (one-time, or on Traefik version upgrades)
kubectl apply -f crds/kubernetes-crd-definition-v1.yml --server-side
kubectl apply -f crds/kubernetes-crd-rbac.yml

The CRD file is large (~3.5 MB) and requires --server-side: a client-side apply would store the entire object in the kubectl.kubernetes.io/last-applied-configuration annotation, which exceeds the 256 KiB annotation size limit. RBAC grants the Traefik ServiceAccount read access to IngressRoutes, Middlewares, TLSOptions, Services, Secrets, EndpointSlices, and related resources across both traefik.io and the legacy traefik.containo.us API groups.
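
A quick check that the CRDs actually registered under the traefik.io group:

Terminal window
# Expect ingressroutes, middlewares, tlsoptions, serverstransports, etc.
kubectl get crds | grep traefik.io
kubectl api-resources --api-group=traefik.io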


The entire Traefik deployment lives in a single manifest: services/traefik.yaml. It contains a ServiceAccount, ClusterRole, ClusterRoleBinding, LoadBalancer Service, Deployment, IngressClass, and PodDisruptionBudget.

Five entrypoints handle different traffic types:

| Entrypoint | Address | Protocol | Purpose |
| --- | --- | --- | --- |
| web | :8000/tcp | HTTP | Redirect to HTTPS (unused behind tunnel) |
| websecure | :8443 | HTTPS + HTTP/3 + QUIC | All production traffic |
| metrics | :8082/tcp | HTTP | Prometheus metrics scrape endpoint |
| traefik | :9000/tcp | HTTP | Dashboard API + health checks (/ping) |
| jvb-udp | :10000/udp | UDP | Jitsi Videobridge media |

The websecure entrypoint is the workhorse. Key settings:

args:
- "--entrypoints.websecure.address=:8443"
- "--entrypoints.websecure.http.tls=true"
- "--entrypoints.websecure.http.tls.certResolver=cloudflare"
- "--entrypoints.websecure.http3=true"
- "--entrypoints.websecure.http3.advertisedport=443"
- "--entrypoints.websecure.http2.maxConcurrentStreams=512"
# Global middlewares applied to ALL websecure requests
- "--entrypoints.websecure.http.middlewares=traefik-sentinel@kubernetescrd,traefik-crowdsec-bouncer@kubernetescrd,traefik-security-headers@kubernetescrd"

HTTP/3 is enabled with advertisedport=443 because the container listens on 8443 but the LoadBalancer Service maps port 443 → 8443. Without the advertised port, clients would try QUIC on port 8443 and fail.
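
The relevant part of the LoadBalancer Service looks roughly like this sketch (simplified, not the full Service from services/traefik.yaml; the selector label matches the one used by the anti-affinity rule below, and the UDP entry is what carries QUIC):

apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: traefik
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: traefik
  ports:
    - name: websecure
      port: 443        # what clients dial
      targetPort: 8443 # what the container listens on
      protocol: TCP
    - name: websecure-udp
      port: 443        # HTTP/3 advertised on 443
      targetPort: 8443
      protocol: UDP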

args:
- "--entrypoints.websecure.transport.respondingTimeouts.readTimeout=60s"
- "--entrypoints.websecure.transport.respondingTimeouts.writeTimeout=0s"
- "--entrypoints.websecure.transport.respondingTimeouts.idleTimeout=180s"
- "--entrypoints.websecure.transport.lifeCycle.graceTimeOut=30s"
- "--entrypoints.websecure.transport.lifeCycle.requestAcceptGraceTimeout=5s"

writeTimeout=0s (disabled) is intentional. Matrix (Synapse), Jitsi, and LiveKit all use long-lived WebSocket connections. A non-zero write timeout would kill WebSocket connections that don’t send data within the timeout window. The tradeoff is that slowloris-style attacks against WebSocket endpoints aren’t mitigated at the Traefik layer — but CrowdSec and Cloudflare’s DDoS protection handle that upstream.

args:
- "--entrypoints.websecure.forwardedHeaders.trustedIPs=173.245.48.0/20,103.21.244.0/22,..."

All Cloudflare IPv4 and IPv6 ranges are listed as trusted IPs. This tells Traefik to trust X-Forwarded-For headers from these IPs, which is necessary because Cloudflare Tunnel connects from Cloudflare edge IPs. Without this, X-Forwarded-For would be stripped and the sentinel plugin would see the cloudflared pod IP instead of the real client IP.

env:
- name: GOMAXPROCS
value: "2"
- name: GOMEMLIMIT
value: "900MiB"

On ARM64 homelab nodes with 4 cores, limiting GOMAXPROCS to 2 prevents Traefik from consuming all CPU cores. GOMEMLIMIT at 900MiB (with a 1024Mi limit) gives the Go GC a soft target to aim for, reducing OOM kills from GC pressure spikes.
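
These pair with container resource settings along these lines (the 1Gi memory limit is the one referenced above; the CPU figures are illustrative, not copied from the manifest):

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "2"        # matches GOMAXPROCS=2
    memory: 1024Mi  # GOMEMLIMIT=900MiB leaves headroom below this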

securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true

The root filesystem is read-only. Writable paths are provided via volume mounts: /ssl-certs-2 (PVC for ACME certs), /tmp (emptyDir), /plugins-local/ (ConfigMap mounts for plugins), /plugins-storage (emptyDir for remote plugin cache).
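
A sketch of the corresponding volumes (volume names are illustrative; the PVC name traefik-ssl-2 comes from the PVC manifest later in this guide, and the plugin ConfigMap mounts are shown in full in the plugin section):

volumeMounts:
  - name: ssl-certs
    mountPath: /ssl-certs-2       # ACME cert storage (NFS PVC)
  - name: tmp
    mountPath: /tmp
  - name: plugins-storage
    mountPath: /plugins-storage   # cache for remote plugins
volumes:
  - name: ssl-certs
    persistentVolumeClaim:
      claimName: traefik-ssl-2
  - name: tmp
    emptyDir: {}
  - name: plugins-storage
    emptyDir: {}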

affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/name: traefik
topologyKey: kubernetes.io/hostname

With 2 replicas, the anti-affinity preference spreads them across different nodes. It’s preferred not required because on a 4-node cluster with other workloads, there might not always be two nodes available.

The PDB ensures at least 1 replica is always available during voluntary disruptions (node drains, rolling updates):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: traefik-pdb
namespace: traefik
spec:
  minAvailable: 1

The Deployment's pod template pairs this with a pre-stop hook on the Traefik container:

lifecycle:
preStop:
exec:
command: ["sh", "-c", "sleep 10"]

The 10-second pre-stop sleep gives the Service endpoints time to de-register from kube-proxy before the pod starts shutting down. Without this, in-flight requests can hit a pod that’s already draining.


Traefik supports two types of plugins: remote (fetched from GitHub on startup) and local (mounted from the filesystem). Local plugins use Traefik’s Yaegi Go interpreter — you write standard Go code, and Traefik interprets it at runtime. No compilation step needed.

  1. Plugin source goes into /plugins-local/src/<moduleName>/ inside the Traefik container
  2. The module must have go.mod, .traefik.yml, and the Go source file
  3. Traefik is told about the plugin via --experimental.localPlugins.<name>.moduleName=<moduleName>
  4. A Middleware CRD references the plugin by name under spec.plugin.<name>

Since Traefik runs with readOnlyRootFilesystem: true, the plugin files are packaged as ConfigMaps and mounted as volumes.

Plugin 1: Sentinel (bot detection + IP resolution)


Sentinel is a custom plugin that replaces the standalone realclientip plugin. It does two things: resolves the real client IP from trusted headers (Cloudflare’s Cf-Connecting-Ip or X-Forwarded-For with proxy skipping), and runs heuristic bot detection that scores each request.

Request flow:

  1. Resolve real client IP from trusted headers (Cf-Connecting-Ip > XFF right-to-left > RemoteAddr)
  2. Set X-Real-Client-Ip header (used by rate limiters and Loki analytics)
  3. Score request using 9 heuristic signals
  4. Set X-Bot-Score header (always, regardless of score — this feeds the Grafana security dashboard via Loki)
  5. If score >= blockThreshold (100): return 403 Forbidden
  6. Otherwise: pass to next middleware
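
The resolution order in step 1 boils down to logic like the following sketch (simplified from the actual plugin; isTrustedProxy stands in for the configured trustedProxies CIDR check):

// Sketch: Cf-Connecting-Ip first, then X-Forwarded-For scanned
// right-to-left skipping trusted proxies, then RemoteAddr as a fallback.
func (s *Sentinel) realClientIP(req *http.Request) string {
	if ip := strings.TrimSpace(req.Header.Get("Cf-Connecting-Ip")); ip != "" {
		return ip
	}
	if xff := req.Header.Get("X-Forwarded-For"); xff != "" {
		hops := strings.Split(xff, ",")
		for i := len(hops) - 1; i >= 0; i-- {
			ip := strings.TrimSpace(hops[i])
			if ip != "" && !s.isTrustedProxy(ip) {
				return ip // first non-proxy hop from the right
			}
		}
	}
	host, _, err := net.SplitHostPort(req.RemoteAddr)
	if err != nil {
		return req.RemoteAddr
	}
	return host
}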

Scoring signals:

| Signal | Score | Rationale |
| --- | --- | --- |
| Scanner UA substring match | +100 | sqlmap, nikto, nuclei, zgrab, etc. — one match is enough to block |
| Honeypot path match | +100 | /.env, /.git/HEAD, /wp-login.php, etc. — no legitimate client requests these |
| Empty User-Agent | +40 | Most real browsers always send a UA |
| Missing Accept header | +30 | Browsers always send Accept |
| HTTP/1.0 protocol | +25 | Almost no modern client uses HTTP/1.0 |
| Missing Accept-Language | +20 | Browsers send this; most bots don't |
| Missing Accept-Encoding | +15 | Browsers send this |
| Connection: close with HTTP/1.1 | +10 | Unusual for real clients |
| Per-IP rate exceeded (>30 req/s) | +30 | Sliding-window rate tracker per IP |

A request with a known scanner UA (+100) gets blocked immediately. A request with no User-Agent (+40), no Accept (+30), and no Accept-Language (+20) scores 90, just under the threshold; a missing Accept-Encoding (+15) pushes it to 105 and gets it blocked. The per-IP rate tracker uses a sliding window with background cleanup to prevent unbounded memory growth.
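
The tracker can be as simple as a per-IP slice of timestamps plus a background sweep; a minimal sketch, not the plugin's actual code:

// Sliding-window tracker: counts hits per IP over the last window and
// lets a background goroutine delete idle entries so the map can't grow forever.
type rateTracker struct {
	mu     sync.Mutex
	window time.Duration
	hits   map[string][]time.Time
}

func newRateTracker(window time.Duration) *rateTracker {
	t := &rateTracker{window: window, hits: make(map[string][]time.Time)}
	go func() {
		for range time.Tick(window) {
			t.mu.Lock()
			for ip, ts := range t.hits {
				if len(ts) == 0 || time.Since(ts[len(ts)-1]) > window {
					delete(t.hits, ip) // idle IP: drop it
				}
			}
			t.mu.Unlock()
		}
	}()
	return t
}

// record registers a hit and returns the rate in requests/second over the window.
func (t *rateTracker) record(ip string) float64 {
	now := time.Now()
	t.mu.Lock()
	defer t.mu.Unlock()
	ts := t.hits[ip]
	cutoff := now.Add(-t.window)
	for len(ts) > 0 && ts[0].Before(cutoff) {
		ts = ts[1:]
	}
	ts = append(ts, now)
	t.hits[ip] = ts
	return float64(len(ts)) / t.window.Seconds()
}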

Packaging as ConfigMap:

The plugin source, go.mod, and .traefik.yml are inlined in a ConfigMap:

middleware/sentinel-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: traefik-plugin-sentinel
namespace: traefik
data:
sentinel.go: |
package sentinel
// ... (full Go source, ~435 lines)
go.mod: |
module github.com/erfianugrah/sentinel
go 1.22
.traefik.yml: |
displayName: Sentinel
type: middleware
import: github.com/erfianugrah/sentinel
summary: Real client IP resolution + heuristic bot detection with scoring.
testData:
trustedHeaders:
- Cf-Connecting-Ip
- X-Forwarded-For
# ...

Mounted in the Deployment:

volumeMounts:
- name: plugin-sentinel
mountPath: /plugins-local/src/github.com/erfianugrah/sentinel
readOnly: true
volumes:
- name: plugin-sentinel
configMap:
name: traefik-plugin-sentinel

Enabled via args:

args:
- "--experimental.localPlugins.sentinel.moduleName=github.com/erfianugrah/sentinel"

Middleware CRD (applied as a global middleware on the websecure entrypoint):

middleware/sentinel-middleware.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: sentinel
namespace: traefik
spec:
plugin:
sentinel:
trustedHeaders:
- Cf-Connecting-Ip
- X-Forwarded-For
trustedProxies:
- "10.42.0.0/16" # k3s pod CIDR
- "10.43.0.0/16" # k3s service CIDR
- "173.245.48.0/20" # Cloudflare IPv4
# ... all CF ranges
enabled: true
blockThreshold: 100
tagThreshold: 60
scannerUAs: "sqlmap,nikto,dirbuster,masscan,zgrab,nuclei,httpx,gobuster,ffuf,nmap,whatweb,wpscan,joomla,drupal"
honeypotPaths: "/.env,/.git/HEAD,/.git/config,/wp-login.php,/wp-config.php,/wp-admin,/.aws/credentials,/actuator/env,/actuator/health,/xmlrpc.php,/.DS_Store,/config.json,/package.json,/.htaccess,/server-status,/debug/pprof"
rateLimitPerSecond: 30
rateLimitWindowSeconds: 10

The decompress plugin exists for one reason: Cloudflare Logpush always gzip-compresses HTTP payloads, and Alloy’s /loki/api/v1/raw endpoint doesn’t handle Content-Encoding: gzip. Traefik’s built-in compress middleware only handles response compression, not request body decompression.

The plugin is simple — 71 lines of Go:

func (d *Decompress) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
encoding := strings.ToLower(req.Header.Get("Content-Encoding"))
if encoding != "gzip" {
d.next.ServeHTTP(rw, req)
return
}
gzReader, err := gzip.NewReader(req.Body)
if err != nil {
http.Error(rw, fmt.Sprintf("failed to create gzip reader: %v", err), http.StatusBadRequest)
return
}
defer gzReader.Close()
decompressed, err := io.ReadAll(gzReader)
if err != nil {
http.Error(rw, fmt.Sprintf("failed to decompress body: %v", err), http.StatusBadRequest)
return
}
req.Body = io.NopCloser(bytes.NewReader(decompressed))
req.ContentLength = int64(len(decompressed))
req.Header.Set("Content-Length", strconv.Itoa(len(decompressed)))
req.Header.Del("Content-Encoding")
d.next.ServeHTTP(rw, req)
}

Same ConfigMap packaging pattern as sentinel. The decompress middleware CRD lives in the monitoring namespace (same as the Alloy Logpush IngressRoute that uses it):

middleware/decompress-middleware.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: decompress
namespace: monitoring
spec:
plugin:
decompress: {}

Published at github.com/erfianugrah/decompress.


The CrowdSec Bouncer Traefik plugin checks each request’s IP against CrowdSec’s community threat intelligence via the LAPI (Local API). Blocked IPs get a 403.

Remote plugins are fetched by Traefik on startup from GitHub. No ConfigMap needed — just the plugin declaration in args:

args:
- "--experimental.plugins.bouncer.modulename=github.com/maxlerebourg/crowdsec-bouncer-traefik-plugin"
- "--experimental.plugins.bouncer.version=v1.5.0"

The Middleware CRD contains the LAPI connection details and must be SOPS-encrypted because it includes the bouncer API key:

# middleware/crowdsec-bouncer.yaml (structure -- values are SOPS-encrypted)
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: crowdsec-bouncer
namespace: traefik
spec:
plugin:
bouncer:
enabled: true
crowdsecMode: stream
updateIntervalSeconds: 60
defaultDecisionSeconds: 300
crowdsecLapiScheme: https
crowdsecLapiHost: <your-crowdsec-lapi-endpoint>
crowdsecLapiKey: <your-bouncer-api-key>
forwardedHeadersTrustedIPs:
- "10.42.0.0/16"
- "10.43.0.0/16"
clientTrustedIPs: []
forwardedHeadersCustomName: X-Real-Client-Ip

forwardedHeadersCustomName: X-Real-Client-Ip tells the bouncer to read the real client IP from the header set by sentinel, not from X-Forwarded-For (which might have multiple IPs).


Three middlewares are applied globally to every request on the websecure entrypoint via the --entrypoints.websecure.http.middlewares flag:

- "--entrypoints.websecure.http.middlewares=traefik-sentinel@kubernetescrd,traefik-crowdsec-bouncer@kubernetescrd,traefik-security-headers@kubernetescrd"

The format is <namespace>-<name>@kubernetescrd. Order matters — sentinel runs first (sets IP + bot score), then crowdsec-bouncer (checks IP reputation), then security-headers.

middleware/security-headers.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: security-headers
namespace: traefik
spec:
headers:
stsSeconds: 63072000 # HSTS 2 years
stsIncludeSubdomains: true
stsPreload: true
contentTypeNosniff: true
referrerPolicy: "strict-origin-when-cross-origin"
permissionsPolicy: "camera=(), microphone=(), geolocation=(), payment=()"
customResponseHeaders:
Server: "" # Strip server identity
X-Powered-By: ""

frameDeny, browserXssFilter, and CSP are intentionally omitted from the global middleware. These are app-specific — Authentik needs its own CSP, Grafana needs iframe support for embedding, etc. Apply those per-route where needed.
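
As an illustration, a per-app headers middleware for a service that needs to stay iframe-embeddable might look like this (the name and values are examples, not the repo's actual authentik-csp or Grafana config):

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: grafana-frame-headers   # example name
  namespace: monitoring
spec:
  headers:
    contentSecurityPolicy: "frame-ancestors 'self' https://grafana-k3s.example.com"
    customFrameOptionsValue: "SAMEORIGIN"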


middleware/tls-options.yaml
apiVersion: traefik.io/v1alpha1
kind: TLSOption
metadata:
name: default
namespace: default
spec:
minVersion: VersionTLS12
maxVersion: VersionTLS13
cipherSuites:
# TLS 1.2 only -- TLS 1.3 ciphers are not configurable in Go (all safe by default)
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
curvePreferences:
- X25519
- CurveP256
sniStrict: true
alpnProtocols:
- h2
- http/1.1

The TLSOption must be named default in the default namespace for Traefik to pick it up as the default TLS configuration. All cipher suites are AEAD-only (GCM or ChaCha20-Poly1305) — no CBC mode. sniStrict: true rejects connections that don’t present a valid SNI hostname matching a known route.

args:
- "--certificatesresolvers.cloudflare.acme.dnschallenge.provider=cloudflare"
- "--certificatesresolvers.cloudflare.acme.email=erfi.anugrah@gmail.com"
- "--certificatesresolvers.cloudflare.acme.dnschallenge.resolvers=1.1.1.1"
- "--certificatesresolvers.cloudflare.acme.storage=/ssl-certs-2/acme-cloudflare.json"

The CF_DNS_API_TOKEN env var is pulled from a Kubernetes Secret (cloudflare-credentials). The ACME cert storage lives on an NFS PVC (traefik-ssl-2, 2Gi, RWX) so certs survive pod restarts and don’t trigger Let’s Encrypt rate limits on every rollout.
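
The token is wired in as an ordinary secret reference on the container; roughly (the key name inside the Secret is an assumption):

env:
  - name: CF_DNS_API_TOKEN
    valueFrom:
      secretKeyRef:
        name: cloudflare-credentials
        key: CF_DNS_API_TOKEN   # key name assumed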

pvc-claims/traefik-ssl-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: traefik-ssl-2
namespace: traefik
spec:
accessModes: [ReadWriteMany]
resources:
requests:
storage: 2Gi
storageClassName: nfs-client

Each service gets its own rate limit middleware to prevent cross-service token bucket interference. The problem this solves: when multiple services share a single rate-limit-api middleware, Traefik maintains one token bucket per source IP per middleware instance. All routes sharing that middleware share the same bucket. Authentik OAuth flows generate 35+ requests in bursts (redirects, consent, callback, static assets), which would exceed a shared 10 req/s bucket and return 429s.

All per-route rate limit middlewares live in a single file:

# middleware/rate-limits.yaml (pattern -- 22 middlewares total)
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: rl-authentik
namespace: traefik
spec:
rateLimit:
average: 100
period: 1s
burst: 500
sourceCriterion:
requestHeaderName: X-Real-Client-Ip

sourceCriterion.requestHeaderName: X-Real-Client-Ip uses the header set by the sentinel plugin for per-IP bucketing. Without this, Traefik would use the connection source IP, which behind Cloudflare Tunnel is always the cloudflared pod IP — meaning all users would share one bucket.

Rate limits for monitoring/query services (Grafana, Prometheus, Alertmanager, Jaeger, Logpush, Traefik Dashboard, Traefik Prometheus) are currently commented out in their IngressRoutes. These services generate heavy internal query traffic (Grafana fires dozens of parallel Loki queries when loading dashboards), and rate limiting them causes query timeouts.

middleware/inflight-req.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: inflight-req
namespace: traefik
spec:
inFlightReq:
amount: 100
sourceCriterion:
requestHeaderName: X-Real-Client-Ip

Limits concurrent connections per source IP to 100. Unlike rate limiting (which controls request rate), this controls concurrency. A single IP can’t monopolize all backend connections. Shared across all routes — this is fine because the limit is per-IP, not per-route.

middleware/retry.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: retry
namespace: traefik
spec:
retry:
attempts: 3
initialInterval: 100ms

3 attempts total (1 initial + 2 retries) with exponential backoff starting at 100ms. Only retries on connection errors, NOT on non-2xx status codes. Also shared across all routes.

middleware/authentik-forward-auth.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: authentik-forward-auth
namespace: authentik
spec:
forwardAuth:
address: http://authentik-server.authentik.svc.cluster.local/outpost.goauthentik.io/auth/traefik
trustForwardHeader: true
authResponseHeaders:
- X-authentik-username
- X-authentik-groups
- X-authentik-entitlements
- X-authentik-email
- X-authentik-name
- X-authentik-uid
- X-authentik-jwt
- X-authentik-meta-jwks
- X-authentik-meta-outpost
- X-authentik-meta-provider
- X-authentik-meta-app
- X-authentik-meta-version

Applied per-route to services that need SSO protection (Jaeger UI, etc.). Traefik forwards a sub-request to Authentik’s embedded outpost; if Authentik returns 200, the original request proceeds with the X-authentik-* headers injected. If 401/403, the user is redirected to the Authentik login flow.


20+ IngressRoutes route traffic from hostnames to backend services. Each IngressRoute specifies its middleware chain. The middleware execution order is: global middlewares first (sentinel → crowdsec → security-headers), then per-route middlewares in the order listed.
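
For reference, a typical IngressRoute in this setup looks like the following sketch (Grafana shown; the backend service name and port are illustrative, the rate-limit middleware is commented out per the earlier note, and the middleware references carry a namespace because inflight-req and retry live in the traefik namespace):

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`grafana-k3s.example.com`)
      kind: Rule
      middlewares:
        # - name: rl-grafana        # commented out: heavy internal query traffic
        #   namespace: traefik
        - name: inflight-req
          namespace: traefik
        - name: retry
          namespace: traefik
      services:
        - name: grafana
          port: 80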

| Route | Host | Middlewares | Namespace |
| --- | --- | --- | --- |
| Grafana | grafana-k3s.example.com | ~~rl-grafana~~, inflight-req, retry | monitoring |
| Prometheus | prom-k3s.example.com | ~~rl-prometheus~~, inflight-req, retry | monitoring |
| Alertmanager | alertmanager-k3s.example.com | ~~rl-alertmanager~~, inflight-req, retry | monitoring |
| Jaeger | jaeger-k3s.example.com | ~~rl-jaeger~~, inflight-req, authentik-forward-auth, retry | monitoring |
| Logpush | logpush-k3s.example.com | ~~rl-logpush~~, inflight-req, decompress, retry | monitoring |
| Traefik Dashboard | traefik-dashboard.example.com | ~~rl-traefik-dashboard~~, inflight-req, retry | traefik |
| Traefik Prometheus | traefik-prometheus.example.com | ~~rl-traefik-prometheus~~, inflight-req, retry | traefik |
| Authentik | authentik.example.com | rl-authentik, inflight-req, authentik-csp, retry | authentik |
| Revista | mydomain.com | rl-revista, inflight-req, retry | revista |
| ArgoCD (HTTP) | argocd.example.com | rl-argocd, inflight-req, retry | argocd |
| ArgoCD (gRPC) | argocd.example.com + gRPC header | rl-argocd, inflight-req, retry | argocd |
| Dendrite | dendrite.example.com | rl-dendrite, inflight-req, retry | dendrite |
| httpbun | httpbun-k3s.example.com | rl-httpbun, inflight-req, retry | httpbun |
| Jitsi (from Element) | jitsi.example.com + Referer match | rl-jitsi, inflight-req, retry | jitsi |
| Jitsi (direct) | jitsi.example.com | rl-jitsi, inflight-req, retry | jitsi |
| LiveKit JWT | matrix-rtc.example.com/livekit/jwt | rl-livekit, inflight-req, strip-livekit-jwt, retry | livekit |
| LiveKit SFU | matrix-rtc.example.com/livekit/sfu | rl-livekit, inflight-req, strip-livekit-sfu, retry | livekit |
| Element (chat) | chat.example.com | rl-matrix-element, inflight-req, retry | matrix |
| Synapse Admin | admin.matrix.example.com | rl-matrix-admin, inflight-req, retry | matrix |
| Synapse | matrix.example.com | rl-matrix-synapse, inflight-req, retry | matrix |
| Maubot | maubot.example.com | rl-maubot, inflight-req, retry | maubot |
| Headlamp | headlamp-k3s.example.com | rl-headlamp, inflight-req, retry | headlamp |
| Longhorn | longhorn.example.com | rl-longhorn, inflight-req, retry | longhorn-system |
| Portainer | portainer-k3s.example.com | rl-portainer, inflight-req, retry | portainer |
| Portainer Agent | port-agent-k3s.example.com | rl-portainer-agent, inflight-req, retry | portainer |
| Security Dashboard | security-k3s.example.com | authentik-forward-auth | security-dashboard |

Strikethrough (~~) indicates rate limits that are currently commented out.


All external traffic enters the cluster through a Cloudflare Tunnel. The tunnel connects from a cloudflared Deployment inside the cluster to Cloudflare’s edge network via outbound QUIC connections — no inbound ports or public IPs needed.

  1. cloudflared runs in the cloudflared namespace, maintains 4 HA connections to Cloudflare edge
  2. DNS CNAME records point each hostname to the tunnel’s .cfargotunnel.com address
  3. Cloudflare edge receives the request, looks up the tunnel config, and forwards to cloudflared
  4. cloudflared routes to the Traefik Service based on hostname matching in the tunnel ingress rules
  5. Traefik handles TLS termination, middleware, and routing to the backend

Each hostname maps to the Traefik Service’s cluster-internal HTTPS endpoint:

cloudflare-tunnel-tf/tunnel_config.tf
ingress_rule {
hostname = "grafana-k3s.${var.secondary_domain_name}"
service = "https://traefik.traefik.svc.cluster.local"
origin_request {
origin_server_name = "grafana-k3s.${var.secondary_domain_name}"
http2_origin = true
}
}

origin_server_name is set to the actual hostname so cloudflared presents the correct SNI to Traefik. http2_origin = true enables HTTP/2 between cloudflared and Traefik, which is needed for gRPC (ArgoCD) and improves multiplexing.

Each service that needs external access gets its own ingress rule. For example, the security dashboard:

ingress_rule {
hostname = "security-k3s.${var.secondary_domain_name}"
service = "http://security-dashboard.security-dashboard.svc.cluster.local"
origin_request {
origin_server_name = "security-k3s.${var.secondary_domain_name}"
http2_origin = true
}
}

The catch-all rule at the bottom returns 404 for unrecognized hostnames:

ingress_rule {
service = "http_status:404"
}

Each service gets a CNAME record pointing to the tunnel:

cloudflare-tunnel-tf/records.tf
resource "cloudflare_record" "grafana-k3s" {
zone_id = var.cloudflare_secondary_zone_id
name = "grafana-k3s"
type = "CNAME"
content = cloudflare_zero_trust_tunnel_cloudflared.k3s.cname
proxied = true
tags = ["k3s", "monitoring"]
}

proxied = true routes traffic through Cloudflare’s edge (DDoS protection, WAF, caching). The CNAME target is the tunnel’s unique .cfargotunnel.com address, auto-generated by the cloudflare_zero_trust_tunnel_cloudflared resource.


Traefik uses a KEDA ScaledObject with 5 triggers for intelligent autoscaling between 1 and 8 replicas:

hpa/traefik-keda-autoscaling.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: traefik-keda
namespace: traefik
spec:
scaleTargetRef:
name: traefik
pollingInterval: 5
cooldownPeriod: 10
minReplicaCount: 1
maxReplicaCount: 8
triggers:
- type: cpu
metadata:
type: Utilization
value: "50"
- type: memory
metadata:
type: Utilization
value: "75"
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
metricName: traefik_open_connections
threshold: "1000"
query: sum(traefik_open_connections{entrypoint="websecure"})
- type: prometheus
metadata:
metricName: traefik_request_duration
threshold: "0.5"
query: histogram_quantile(0.95, sum(rate(traefik_entrypoint_request_duration_seconds_bucket{entrypoint="websecure"}[1m])) by (le))
- type: prometheus
metadata:
metricName: traefik_requests_total
threshold: "1000"
query: sum(rate(traefik_entrypoint_requests_total{entrypoint="websecure"}[1m]))

The Prometheus triggers query Traefik’s own metrics: open connection count, p95 request duration, and request rate. Any single trigger exceeding its threshold causes a scale-up. The 5-second polling interval and 10-second cooldown make it responsive without flapping.


Access logs → Loki (with structured metadata)


Traefik writes JSON-formatted access logs to stdout:

args:
- "--accesslog=true"
- "--accesslog.format=json"
- "--accesslog.bufferingsize=100"
- "--accesslog.fields.defaultmode=keep"
- "--accesslog.fields.headers.defaultmode=keep"

The bufferingsize=100 buffers up to 100 log lines before flushing, reducing I/O pressure. fields.defaultmode=keep and fields.headers.defaultmode=keep include all fields and request/response headers in the JSON output — this is what enables the sentinel bot score and other custom headers to appear in the access logs.

The Alloy DaemonSet picks up these logs from the Traefik container’s stdout (via /var/log/pods/), parses the JSON, and sends them to Loki with 11 structured metadata fields (status, downstream_status, router, service, client_ip, real_client_ip, bot_score, request_path, duration, tls_version, user_agent). Dashboards query these SM fields directly instead of using | json full-line parsing, which is 5-10x faster. See the monitoring stack guide for the full Alloy config and the label/metadata split.
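
For example, a panel that counts high-scoring bot traffic can filter on the structured-metadata fields directly; a LogQL sketch, where the stream selector is an assumption that depends on how Alloy labels the Traefik stream:

{namespace="traefik", container="traefik"} | bot_score >= 60 | real_client_ip != ""

Because bot_score and real_client_ip arrive as structured metadata, this avoids a full-line | json parse.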

args:
- "--tracing.otlp=true"
- "--tracing.otlp.grpc=true"
- "--tracing.otlp.grpc.endpoint=alloy.monitoring.svc.cluster.local:4317"
- "--tracing.otlp.grpc.insecure=true"
- "--tracing.serviceName=traefik"
- "--tracing.sampleRate=1.0"

Traefik sends OTLP traces to the Alloy DaemonSet on each node, which batches and forwards them to Jaeger. 100% sample rate is fine for a homelab — in production you’d want to sample down.

args:
- "--metrics.prometheus=true"
- "--metrics.prometheus.entrypoint=metrics"
- "--metrics.prometheus.addrouterslabels=true"

addrouterslabels=true adds a router label to all metrics, enabling per-IngressRoute dashboards and alerting. The metrics endpoint is scraped by a ServiceMonitor in the monitoring stack.
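
With the router label in place, per-IngressRoute queries are straightforward; for example, a 5xx rate per router using Traefik's standard router metric:

sum by (router) (rate(traefik_router_requests_total{code=~"5.."}[5m]))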


Terminal window
# 1. Disable built-in Traefik (one-time)
ansible-playbook -i inventory.yml \
ansible-playbooks/my-playbooks/disable-builtin-traefik.yml \
--become --ask-become-pass
# 2. Apply CRDs
kubectl apply -f crds/kubernetes-crd-definition-v1.yml --server-side
kubectl apply -f crds/kubernetes-crd-rbac.yml
# 3. Apply TLS options
kubectl apply -f middleware/tls-options.yaml
# 4. Deploy plugin ConfigMaps
kubectl apply -f middleware/decompress-configmap.yaml
kubectl apply -f middleware/sentinel-configmap.yaml
# 5. Deploy middleware CRDs
kubectl apply -f middleware/sentinel-middleware.yaml
kubectl apply -f middleware/security-headers.yaml
kubectl apply -f middleware/crowdsec-bouncer.yaml
kubectl apply -f middleware/decompress-middleware.yaml
kubectl apply -f middleware/authentik-forward-auth.yaml
kubectl apply -f middleware/inflight-req.yaml
kubectl apply -f middleware/retry.yaml
kubectl apply -f middleware/rate-limits.yaml
# 6. Deploy Traefik (includes SA, ClusterRole, Service, Deployment, IngressClass, PDB)
kubectl apply -f services/traefik.yaml
# 7. Deploy IngressRoutes
kubectl apply -f ingressroutes/
# 8. Deploy KEDA autoscaling
kubectl apply -f hpa/traefik-keda-autoscaling.yaml
# 9. Apply DNS + tunnel config (OpenTofu)
cd cloudflare-tunnel-tf/ && tofu apply
Terminal window
# Traefik pods running on different nodes
kubectl get pods -n traefik -o wide
# All middlewares loaded
kubectl get middlewares.traefik.io -n traefik
# TLS option active
kubectl get tlsoptions.traefik.io -A
# IngressRoutes across all namespaces
kubectl get ingressroutes.traefik.io -A
# KEDA ScaledObject active
kubectl get scaledobject -n traefik
# Test bot detection (should return 403)
curl -s -o /dev/null -w "%{http_code}" -H "User-Agent: sqlmap/1.0" https://httpbun-k3s.example.com/
# Test honeypot path (should return 403)
curl -s -o /dev/null -w "%{http_code}" https://httpbun-k3s.example.com/.env

CrowdSec runs as a DaemonSet in the cluster and feeds the bouncer middleware documented in Part 5. It can unexpectedly ban legitimate IPs, so the rest of this section covers how to inspect, remove, and prevent those decisions.

A dedicated Go+htmx web application at security-k3s.example.com provides a browser-based alternative to cscli for most CrowdSec operations. Protected by Authentik forward-auth.

| Feature | Dashboard | cscli |
| --- | --- | --- |
| View decisions (paginated, sortable) | Yes | Yes |
| View alerts with expandable detail | Yes | Yes |
| Remove individual decisions | Yes | Yes |
| Create decisions (CF-style rule builder) | Yes | Yes (cscli decisions add) |
| Export/import decisions as JSON | Yes | No |
| GeoIP + ASN enrichment (country flags, org) | Yes | No |
| IP lookup (decisions + alerts + allowlist check) | Yes | Partial |
| View sentinel/rate-limit config | Yes | No |
| Manage allowlists (read-only + CLI hints) | Yes (read) | Yes (read+write) |
| Scenario breakdown with bar charts | Yes | Yes (cscli metrics) |

Source: services/security-dashboard/. Deploy: docker builddocker push erfianugrah/security-dashboard:latestkubectl rollout restart. See security-stack.md for full details.

Terminal window
# Via the Security Dashboard:
# Navigate to the IP Lookup page and search
# Or exec into the CrowdSec LAPI pod:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions list
# Check a specific IP:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions list -i <your_ip>

If your own IP gets banned:

Terminal window
# Via the Security Dashboard:
# Click "Remove" on the decision in the Decisions or Policy page
# Or via cscli:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions delete -i <your_ip>

Dashboard false positives (parser-level whitelist)


Dashboard UIs like Headlamp and Grafana generate 404s by requesting plugin locale files (e.g. /static-plugins/prometheus/locales/en/translation.json). The crowdsecurity/http-probing scenario has a capacity of 10 with 10-second leak speed, so 11+ unique 404 paths in a burst will trigger a ban. The .json extension is not in CrowdSec’s static_ressource list, so these are treated as probing.

The fix is a parser-level whitelist that runs in the s02-enrich stage. Create dashboard-whitelist.yaml:

# dashboard-whitelist.yaml — placed in /etc/crowdsec/parsers/s02-enrich/
name: custom/dashboard-whitelist
description: "Whitelist known dashboard UI false positives"
whitelist:
reason: "known dashboard UI request patterns"
expression:
- >-
evt.Meta.service == 'http'
&& evt.Meta.http_path startsWith '/static-plugins/'
- >-
evt.Meta.service == 'http'
&& evt.Meta.http_path startsWith '/plugins/'
- >-
evt.Meta.service == 'http'
&& evt.Meta.http_path matches '^/api/v[0-9]+/namespaces'
&& evt.Meta.http_status == '404'

To deploy it in a DaemonSet-based CrowdSec agent, embed the file in the agent ConfigMap and copy it with an init container:

# In the agent ConfigMap, add the file as a data key:
data:
dashboard-whitelist.yaml: |
name: custom/dashboard-whitelist
...
# In the agent DaemonSet, add an init container:
initContainers:
- name: init-config
image: busybox:1.36
command: ["sh", "-c"]
args:
- |
cp /config-custom/dashboard-whitelist.yaml \
/etc/crowdsec/parsers/s02-enrich/dashboard-whitelist.yaml
volumeMounts:
- name: crowdsec-config
mountPath: /etc/crowdsec
- name: agent-config
mountPath: /config-custom

After deploying, verify the parser loaded and test with cscli explain:

Terminal window
# Verify parser is listed:
kubectl exec -n crowdsec <agent-pod> -- cscli parsers list | grep dashboard
# Test with a simulated log line:
kubectl exec -n crowdsec <agent-pod> -- cscli explain \
--type traefik \
--log '1.2.3.4 - - [18/Feb/2026:15:30:00 +0000] "GET /static-plugins/prometheus/locales/en/translation.json HTTP/1.1" 404 19 "-" "Mozilla/5.0" 1 "app@docker" "https://10.0.0.1:443" 5ms'
# Expected output should show:
# custom/dashboard-whitelist (~2 [whitelisted])
# parser success, ignored by whitelist (known dashboard UI request patterns)
# Check whitelist metrics over time:
kubectl exec -n crowdsec <agent-pod> -- cscli metrics show whitelists

CrowdSec v1.7 introduced allowlists — persistent IP/CIDR whitelists that prevent matching IPs from being banned. Unlike parser-level whitelists (which prevent log events from being processed), allowlists operate at the decision level and can be managed dynamically.

Terminal window
# Create an allowlist:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- \
cscli allowlists create trusted-ips -d "Trusted IP addresses"
# Add entries:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- \
cscli allowlists add trusted-ips 10.0.0.0/8 -d "Internal network"
# Remove entries:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- \
cscli allowlists remove trusted-ips 10.0.0.0/8
# View allowlists:
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli allowlists list

The Security Dashboard shows all allowlists in a read-only view with items, country flags, and expiry dates. Write operations require cscli (the LAPI allowlist endpoints are read-only via machine auth).

If services behind Traefik return 403 Forbidden unexpectedly, check in this order:

  1. CrowdSec decisions — is the client IP banned?
  2. Sentinel plugin — is it flagging the request as a bot? Check the X-Sentinel-Bot-Score header
  3. Cloudflare WAF — check the Cloudflare dashboard for firewall events

The global middleware chain on the websecure entrypoint is: sentinel -> crowdsec-bouncer -> security-headers. A 403 from CrowdSec will be intercepted by the Sentinel plugin’s response interceptor and rendered as a styled HTML error page showing the block source, reason, trace ID, and client IP.


services/
traefik.yaml # SA, ClusterRole, Service, Deployment, IngressClass, PDB
crds/
kubernetes-crd-definition-v1.yml # Traefik CRDs (~3.5 MB)
kubernetes-crd-rbac.yml # ClusterRole for CRD provider
middleware/
# Local plugins (source + ConfigMap + Middleware CRD)
sentinel-plugin/
sentinel.go # 435-line Go source
go.mod / .traefik.yml
sentinel-configmap.yaml # ConfigMap packaging for k8s
sentinel-middleware.yaml # Middleware CRD with config
decompress-plugin/
decompress.go # 71-line Go source
go.mod / .traefik.yml
decompress-configmap.yaml # ConfigMap packaging for k8s
decompress-middleware.yaml # Middleware CRD (in monitoring ns)
# Remote plugin
crowdsec-bouncer.yaml # SOPS-encrypted Middleware CRD
# Global middlewares
security-headers.yaml # HSTS, nosniff, permissions policy
tls-options.yaml # TLSOption (min TLS 1.2, AEAD ciphers, sniStrict)
# Shared per-route middlewares
rate-limits.yaml # 22 per-route rate limit middlewares (rl-*)
inflight-req.yaml # 100 concurrent req/IP
retry.yaml # 3 attempts, 100ms backoff
# Auth
authentik-forward-auth.yaml # Forward auth to Authentik (in authentik ns)
ingressroutes/
alertmanager-ingress.yaml # monitoring
alloy-logpush-ingress.yaml # monitoring (+ decompress middleware)
argocd-ingress.yaml # argocd (2 routes: HTTP + gRPC)
authentik-ingress.yaml # authentik
dendrite-ingress.yaml # dendrite
grafana-ingress.yaml # monitoring
httpbun-ingress.yaml # httpbun
jaeger-ingress.yaml # monitoring (+ authentik-forward-auth)
longhorn-ingress.yaml # longhorn-system
portainer-agent-ingress.yaml # portainer
portainer-ingress.yaml # portainer
prometheus-ingress.yaml # monitoring
revista-ingress.yaml # revista
traefik-dashboard-ingress.yaml # traefik (api@internal)
traefik-prometheus-ingress.yaml # traefik (prometheus@internal)
services/*/ingress.yaml # Service-embedded IngressRoutes
security-dashboard/manifests.yaml # SA, RBAC, Secret, Deployment, Service, IngressRoute
headlamp/ingressroute.yaml
jitsi/ingress.yaml # 2 routes: Referer-gated + direct
livekit/ingress.yaml # 2 routes + stripPrefix middlewares
matrix/ingress.yaml # 3 routes: Element, Synapse Admin, Synapse
maubot/ingress.yaml
services/crowdsec/
agent-configmap.yaml # Agent config + custom parsers
agent-daemonset.yaml # Agent DaemonSet (init container copies parsers)
lapi-deployment.yaml # LAPI server deployment
lapi-configmap.yaml # LAPI configuration
scenarios-configmap.yaml # Custom detection scenarios
services/security-dashboard/
main.go # Go+htmx dashboard (~2300 lines, zero deps)
manifests.yaml # SOPS-encrypted (ns, SA, RBAC, secret, deploy, svc, ingress)
Dockerfile # Multi-stage ARM64 build
ui/ # Templates + static assets (go:embed)
hpa/
traefik-keda-autoscaling.yaml # 5-trigger ScaledObject (1-8 replicas)
pvc-claims/
traefik-ssl-pvc.yaml # 2Gi NFS PVC for ACME cert storage
cloudflare-tunnel-tf/
tunnel_config.tf # Tunnel ingress rules (hostname → Traefik)
records.tf # DNS CNAME records → tunnel
ansible-playbooks/my-playbooks/
disable-builtin-traefik.yml # Disables k3s built-in Traefik + ServiceLB