Traefik on k3s: Custom Deployment, Plugins, Middlewares, and Cloudflare Tunnel
A complete guide to replacing k3s’s built-in Traefik with a fully custom deployment on a 4-node ARM64 homelab cluster. The built-in Traefik is fine for simple setups, but it doesn’t support local plugins, has limited middleware configuration, and doesn’t expose the level of control needed for things like bot detection, request body decompression, or per-route rate limiting.
This guide covers the full setup: disabling the built-in Traefik, deploying a custom one as a raw Deployment manifest, writing and packaging Traefik Go plugins as ConfigMaps, configuring the middleware chain, managing TLS certificates via Cloudflare DNS challenge, routing traffic through Cloudflare Tunnel, autoscaling with KEDA, and piping access logs + traces into the monitoring stack.
Architecture Overview
All HTTP traffic enters through Cloudflare’s edge network, passes through a Cloudflare Tunnel (cloudflared running in the cluster), and hits the custom Traefik deployment in the traefik namespace. Traefik terminates TLS (ACME certs via Cloudflare DNS challenge), runs the global middleware chain (sentinel → security-headers), then routes to per-route middlewares and backend services.
Component versions
| Component | Version | Image |
|---|---|---|
| Traefik | v3.6.8 | traefik:v3.6.8 |
| cloudflared | 2026.2.0 | cloudflare/cloudflared:2026.2.0 |
| KEDA | (cluster-wide) | (already deployed) |
Part 1: Disabling the Built-in Traefik
k3s ships with Traefik as a bundled Helm chart. It auto-deploys on the server node and manages its own CRDs. To run a custom Traefik, the built-in one must be fully disabled — otherwise you get two Traefik instances fighting over the same IngressRoutes.
Ansible playbook
```yaml
---
- name: Disable k3s built-in Traefik and ServiceLB on server
  hosts: server
  become: yes
  tasks:
    - name: Add disable directives to k3s config.yaml
      ansible.builtin.blockinfile:
        path: /etc/rancher/k3s/config.yaml
        marker: "# {mark} ANSIBLE MANAGED - disable built-in addons"
        block: |
          disable:
            - traefik
            - servicelb
        create: no
      register: config_changed

    - name: Remove k3s bundled traefik manifest files
      ansible.builtin.file:
        path: "{{ item }}"
        state: absent
      loop:
        - /var/lib/rancher/k3s/server/manifests/traefik.yaml
        - /var/lib/rancher/k3s/server/static/charts/traefik-crd-38.0.201+up38.0.2.tgz
        - /var/lib/rancher/k3s/server/static/charts/traefik-38.0.201+up38.0.2.tgz
      register: manifests_removed

    - name: Restart k3s to pick up config change
      ansible.builtin.systemd:
        name: k3s
        state: restarted
        daemon_reload: yes
      when: config_changed.changed

    - name: Wait for k3s API to be ready after restart
      ansible.builtin.wait_for:
        port: 6443
        host: "{{ ansible_host }}"
        delay: 10
        timeout: 120
      when: config_changed.changed
```

Run it:

```sh
ansible-playbook -i inventory.yml \
  ansible-playbooks/my-playbooks/disable-builtin-traefik.yml \
  --become --ask-become-pass
```

Safe to re-run (idempotent). The playbook also removes stale chart tarballs from k3s’s static manifests directory — without this, k3s may re-deploy the built-in Traefik on restart even with `disable` set.
Part 2: CRDs and RBAC
Traefik’s Kubernetes CRD provider needs its own CRD definitions (IngressRoute, Middleware, TLSOption, etc.) and RBAC permissions. These are separate from the Traefik Deployment itself and must be applied first.
```sh
# Apply Traefik CRDs (one-time, or on Traefik version upgrades)
kubectl apply -f crds/kubernetes-crd-definition-v1.yml --server-side
kubectl apply -f crds/kubernetes-crd-rbac.yml
```

The CRD file is large (~3.5 MB) and requires `--server-side` because client-side apply would exceed the last-applied-configuration annotation size limit. RBAC grants the Traefik ServiceAccount read access to IngressRoutes, Middlewares, TLSOptions, Services, Secrets, EndpointSlices, and related resources across both traefik.io and the legacy traefik.containo.us API groups.
Part 3: The Traefik Deployment
The entire Traefik deployment lives in a single manifest: services/traefik.yaml. It contains a ServiceAccount, ClusterRole, ClusterRoleBinding, LoadBalancer Service, Deployment, IngressClass, and PodDisruptionBudget.
Entrypoints
Five entrypoints handle different traffic types:
| Entrypoint | Address | Protocol | Purpose |
|---|---|---|---|
| web | :8000/tcp | HTTP | Redirect to HTTPS (unused behind tunnel) |
| websecure | :8443 | HTTPS + HTTP/3 + QUIC | All production traffic |
| metrics | :8082/tcp | HTTP | Prometheus metrics scrape endpoint |
| traefik | :9000/tcp | HTTP | Dashboard API + health checks (/ping) |
| jvb-udp | :10000/udp | UDP | Jitsi Videobridge media |
The websecure entrypoint is the workhorse. Key settings:
```yaml
args:
  - "--entrypoints.websecure.address=:8443"
  - "--entrypoints.websecure.http.tls=true"
  - "--entrypoints.websecure.http.tls.certResolver=cloudflare"
  - "--entrypoints.websecure.http3=true"
  - "--entrypoints.websecure.http3.advertisedport=443"
  - "--entrypoints.websecure.http2.maxConcurrentStreams=512"
  # Global middlewares applied to ALL websecure requests
  - "--entrypoints.websecure.http.middlewares=traefik-sentinel@kubernetescrd,traefik-security-headers@kubernetescrd"
```

HTTP/3 is enabled with `advertisedport=443` because the container listens on 8443 but the LoadBalancer Service maps port 443 → 8443. Without the advertised port, clients would try QUIC on port 8443 and fail.
Timeouts
```yaml
args:
  - "--entrypoints.websecure.transport.respondingTimeouts.readTimeout=60s"
  - "--entrypoints.websecure.transport.respondingTimeouts.writeTimeout=0s"
  - "--entrypoints.websecure.transport.respondingTimeouts.idleTimeout=180s"
  - "--entrypoints.websecure.transport.lifeCycle.graceTimeOut=30s"
  - "--entrypoints.websecure.transport.lifeCycle.requestAcceptGraceTimeout=5s"
```

`writeTimeout=0s` (disabled) is intentional. Matrix (Synapse), Jitsi, and LiveKit all use long-lived WebSocket connections. A non-zero write timeout would kill WebSocket connections that don’t send data within the timeout window. The tradeoff is that slowloris-style attacks against WebSocket endpoints aren’t mitigated at the Traefik layer — but Sentinel’s tarpit action and Cloudflare’s DDoS protection handle that upstream.
Forwarded headers
```yaml
args:
  - "--entrypoints.websecure.forwardedHeaders.trustedIPs=173.245.48.0/20,103.21.244.0/22,..."
```

All Cloudflare IPv4 and IPv6 ranges are listed as trusted IPs. This tells Traefik to trust X-Forwarded-For headers from these IPs, which is necessary because Cloudflare Tunnel connects from Cloudflare edge IPs. Without this, X-Forwarded-For would be stripped and the sentinel plugin would see the cloudflared pod IP instead of the real client IP.
Go runtime tuning
```yaml
env:
  - name: GOMAXPROCS
    value: "2"
  - name: GOMEMLIMIT
    value: "900MiB"
```

On ARM64 homelab nodes with 4 cores, limiting GOMAXPROCS to 2 prevents Traefik from consuming all CPU cores. GOMEMLIMIT at 900MiB (with a 1024Mi limit) gives the Go GC a soft target to aim for, reducing OOM kills from GC pressure spikes.
Security context
```yaml
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
```

The root filesystem is read-only. Writable paths are provided via volume mounts: `/ssl-certs-2` (PVC for ACME certs), `/tmp` (emptyDir), `/plugins-local/` (ConfigMap mounts for plugins), `/plugins-storage` (emptyDir for remote plugin cache), `/blocklists` (ConfigMap for IPsum blocklist).
Pod anti-affinity and PDB
Section titled “Pod anti-affinity and PDB”affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchLabels: app.kubernetes.io/name: traefik topologyKey: kubernetes.io/hostnameWith 2 replicas, the anti-affinity preference spreads them across different nodes. It’s preferred not required because on a 4-node cluster with other workloads, there might not always be two nodes available.
The PDB ensures at least 1 replica is always available during voluntary disruptions (node drains, rolling updates):
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: traefik-pdb
  namespace: traefik
spec:
  minAvailable: 1
```

Graceful shutdown
```yaml
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]
```

The 10-second pre-stop sleep gives the Service endpoints time to de-register from kube-proxy before the pod starts shutting down. Without this, in-flight requests can hit a pod that’s already draining.
Part 4: Custom Local Plugins
Traefik supports two types of plugins: remote (fetched from GitHub on startup) and local (mounted from the filesystem). Local plugins use Traefik’s Yaegi Go interpreter — you write standard Go code, and Traefik interprets it at runtime. No compilation step needed.
How local plugins work
- Plugin source goes into `/plugins-local/src/<moduleName>/` inside the Traefik container
- The module must have `go.mod`, `.traefik.yml`, and the Go source file
- Traefik is told about the plugin via `--experimental.localPlugins.<name>.moduleName=<moduleName>`
- A Middleware CRD references the plugin by name under `spec.plugin.<name>`
Since Traefik runs with readOnlyRootFilesystem: true, the plugin files are packaged as ConfigMaps and mounted as volumes.
Plugin 1: Sentinel (bot detection + IP resolution + IPsum blocklist + rule engine)
Sentinel is a ~1843-line Yaegi local plugin that provides the entire inline security layer. It replaces the standalone realclientip plugin and the previously-used CrowdSec Bouncer remote plugin, combining IP resolution, heuristic bot detection, IPsum threat intelligence blocklist enforcement, and a Cloudflare WAF-inspired expression-based firewall rule engine into a single middleware.
8-step request flow:
1. IP Resolution: Resolve real client IP from trusted headers (`Cf-Connecting-Ip` > XFF right-to-left > `RemoteAddr`), set `X-Real-Client-Ip` header
2. GeoIP Country Resolution: Check `Cf-Ipcountry` header first, fall back to GeoIP MMDB lookup (DB-IP free country database), set `X-Geo-Country` header
3. Allowlist Check: If IP in `allowedIPs` config → pass immediately (skip all checks)
4. IPsum Blocklist Check: 19,621+ IPs loaded from `/blocklists/ipsum.txt` (CronJob refreshes daily). If IP matched → 403 Forbidden with `X-Blocked-By: sentinel-blocklist`
5. Heuristic Bot Scoring: 9 signals accumulate a score per request
6. Rule Engine: Expression-based firewall rules evaluated top-to-bottom by priority. First terminating action (allow/block/tarpit) wins; non-terminating actions (score/log/tag) accumulate
7. Threshold Check: If cumulative score >= `blockThreshold` (100) → 403 Forbidden
8. Response Intercept: Wraps upstream responses to style error pages with block info
Scoring signals:
| Signal | Score | Rationale |
|---|---|---|
| Scanner UA substring match | +100 | sqlmap, nikto, nuclei, zgrab, etc. — one match is enough to block |
| Honeypot path match | +100 | /.env, /.git/HEAD, /wp-login.php, etc. — no legitimate client requests these |
| Empty User-Agent | +40 | Most real browsers always send UA |
| Missing Accept header | +30 | Browsers always send Accept |
| HTTP/1.0 protocol | +25 | Almost no modern client uses HTTP/1.0 |
| Missing Accept-Language | +20 | Browsers send this; most bots don’t |
| Missing Accept-Encoding | +15 | Browsers send this |
| Connection: close with HTTP/1.1 | +10 | Unusual for real clients |
| Per-IP rate exceeded (>30 req/s) | +30 | Sliding window rate tracker per IP |
A request with a known scanner UA (+100) gets blocked immediately. A request with no UA (+40), no Accept (+30), and no Accept-Language (+20) scores 90 — below the threshold on its own, but such clients almost always omit Accept-Encoding too (+15), bringing the total to 105 >= 100 and triggering a block. The per-IP rate tracker uses a sliding window with background cleanup to prevent memory leaks.
Rule engine (expression-based firewall):
The rule engine uses a concise expression syntax with short field names, a recursive descent parser, and a tokenizer — all in pure Go stdlib:
```
path contains "/admin" and country eq "CN"
(ip in {1.2.3.4 5.6.7.8/24}) or (ua matches "^curl/")
not ip in {10.0.0.0/8} and score ge 80
host eq "logpush-k3s.erfi.io" and not header["X-Logpush-Secret"] eq "..."
```

Available fields (short names preferred, long CF-style names still work for backward compat):
| Field | Long alias | Source |
|---|---|---|
| ip | ip.src | Resolved client IP |
| country | ip.src.country | Cf-Ipcountry header, falls back to GeoIP MMDB (DB-IP free country database) |
| host | http.host | Host header |
| method | http.request.method | Request method |
| path | http.request.uri.path | URI path |
| query | http.request.uri.query | Query string |
| uri | http.request.uri | Full URI (path + query) |
| ua | http.user_agent | User-Agent header |
| header["X"] | http.request.headers["X"] | Any header by name |
| ssl | | Boolean (TLS) |
| score | sentinel.score | Computed bot score |
| proto | | HTTP protocol version |
Operators: eq, ne, contains, matches (regex), in {set} (IP/CIDR/string), gt, ge, lt, le. Logical: and, or, not, parentheses.
Actions: allow (bypass all, terminates), block (403, terminates), tarpit (slow-drip chunked response, 2s intervals, 5min max, terminates), score:N (add N to bot score, continues), log (log only, continues), tag:name (add header tag, continues).
Rules are stored as a JSON array string in the middleware CRD rules field. Example deployed rules:
| ID | Priority | Expression | Action |
|---|---|---|---|
| r1 | 1 | ip eq "195.240.81.42" | allow (owner IP bypass) |
| r6 | 2 | host eq "logpush-k3s.erfi.io" and not header["X-Logpush-Secret"] eq "..." | block (deny without secret) |
| r2 | 10 | path contains "/.git" and not ip eq "195.240.81.42" | block |
| r3 | 20 | country in {CN RU} | score:30 |
| r4 | 30 | ua matches "^curl/" and header["Accept"] eq "" | block |
| r5 | 100 | score ge 150 | tarpit |
The Security Dashboard provides a guided expression builder UI for creating rules:

- field dropdown (12 fields including Protocol)
- operator dropdown (dynamic per field type)
- value input
- AND/OR combinator and a NOT toggle per condition
- nested condition groups for mixed AND/OR logic (e.g., `(a and b) or (c and d)`)
- condition chips with remove buttons

The builder auto-generates the expression string and supports bidirectional sync (editing existing rules reverse-parses expressions back into the builder, including groups and negated conditions).
Implementation constraints (Yaegi runtime):
- Pure Go stdlib only — no external dependencies, no cgo, no unsafe
- Cannot use `html/template` — uses manual string building for HTML error pages
- Cannot use Go interfaces for method dispatch — Yaegi panics with `reflect: call of reflect.Value.SetBool on interface Value`. The AST uses a single `ExprNode` struct with an `exprKind` type tag and a standalone `evalExpr()` function instead of an `Expr` interface with concrete types
- Returns a `(string, int, bool, error)` tuple from eval instead of `interface{}` to avoid Yaegi reflection issues
- Manual JSON parser for rules (no `encoding/json` dependency, for Yaegi safety)
IPsum blocklist:
IPsum is an open threat intelligence feed aggregating 10+ blocklist sources. A CronJob runs daily, downloads the latest list, and stores it in a ConfigMap mounted into Traefik at /blocklists/ipsum.txt. The plugin loads the blocklist into an in-memory map on startup and reloads periodically (configurable via blocklistReloadSeconds, default 300s). Currently 19,621+ IPs loaded.
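Loading the blocklist into an in-memory set is straightforward. The sketch below assumes the IPsum raw feed format — `#` comment lines, then one IP per line optionally followed by the number of source lists that flagged it; the function name is hypothetical and the plugin’s actual parser may differ:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// loadBlocklist parses IPsum-style text: '#' comments and blank lines are
// skipped; each remaining line starts with an IP, optionally followed by
// whitespace and a count of blocklists that flagged it.
func loadBlocklist(raw string) map[string]struct{} {
	set := make(map[string]struct{})
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		ip := strings.Fields(line)[0] // keep only the IP column
		set[ip] = struct{}{}
	}
	return set
}

func main() {
	sample := "# IPsum sample\n203.0.113.9\t7\n198.51.100.2\t3\n"
	bl := loadBlocklist(sample)
	_, blocked := bl["203.0.113.9"]
	fmt.Println(len(bl), blocked) // 2 true
}
```

A plain `map[string]struct{}` gives O(1) lookups per request, which is why reloading ~20k entries every `blocklistReloadSeconds` is cheap.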
The CronJob resources live in services/sentinel/ipsum-cronjob.yaml: ServiceAccount, Role, RoleBinding (ConfigMap write access in traefik namespace), Python script ConfigMap, and the CronJob itself.
GeoIP country lookup:
Sentinel includes a pure Go MMDB reader (~300 lines, stdlib only) for resolving client IPs to ISO country codes without Cloudflare. Resolution order: Cf-Ipcountry header first, then GeoIP MMDB fallback. The resolved country is set as the X-Geo-Country request header on every request (visible in access logs and Loki structured metadata).
The database is DB-IP free country (dbip-country-lite-YYYY-MM.mmdb.gz, ~7MB), downloaded by an init container on pod start and stored in an emptyDir volume at /geoip/country.mmdb. Configure via geoipFile in the middleware CRD. The MMDB reader supports 24/28/32-bit record sizes, IPv4-in-IPv6 subtree caching, and is compatible with both MaxMind GeoLite2 and DB-IP formats.
Packaging as ConfigMap:
The plugin source, go.mod, and .traefik.yml are inlined in a ConfigMap:
```yaml
# The ConfigMap is generated from middleware/sentinel.go
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-plugin-sentinel
  namespace: traefik
data:
  sentinel.go: |
    package sentinel
    // ... (full Go source, ~1843 lines)
  go.mod: |
    module github.com/erfianugrah/sentinel

    go 1.22
  .traefik.yml: |
    displayName: Sentinel
    type: middleware
    import: github.com/erfianugrah/sentinel
    summary: Real client IP resolution + heuristic bot detection + IPsum blocklist + expression-based firewall rules.
    testData:
      trustedHeaders:
        - Cf-Connecting-Ip
        - X-Forwarded-For
      # ...
```

Mounted in the Deployment:
```yaml
volumeMounts:
  - name: plugin-sentinel
    mountPath: /plugins-local/src/github.com/erfianugrah/sentinel
    readOnly: true
volumes:
  - name: plugin-sentinel
    configMap:
      name: traefik-plugin-sentinel
```

Enabled via args:
```yaml
args:
  - "--experimental.localPlugins.sentinel.moduleName=github.com/erfianugrah/sentinel"
```

Middleware CRD (applied as a global middleware on the websecure entrypoint):
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: sentinel
  namespace: traefik
spec:
  plugin:
    sentinel:
      trustedHeaders:
        - Cf-Connecting-Ip
        - X-Forwarded-For
      trustedProxies:
        - "10.42.0.0/16"    # k3s pod CIDR
        - "10.43.0.0/16"    # k3s service CIDR
        - "173.245.48.0/20" # Cloudflare IPv4
        # ... all CF ranges
      enabled: true
      blockThreshold: 100
      tagThreshold: 60
      rateLimitPerSecond: 30
      rateLimitWindowSeconds: 10
      blocklistFile: "/blocklists/ipsum.txt"
      blocklistReloadSeconds: 300
      allowedIPs: "195.240.81.42"
      scannerUAs: "sqlmap,nikto,dirbuster,masscan,zgrab,nuclei,httpx,gobuster,ffuf,nmap,whatweb,wpscan,joomla,drupal"
      honeypotPaths: "/.env,/.git/HEAD,/.git/config,/wp-login.php,/wp-config.php,/wp-admin,/.aws/credentials,/actuator/env,/actuator/health,/xmlrpc.php,/.DS_Store,/config.json,/package.json,/.htaccess,/server-status,/debug/pprof"
      rules: |
        [
          {"id":"r1","description":"Allow owner IP","expression":"ip.src eq \"195.240.81.42\"","action":"allow","enabled":true,"priority":1},
          ...
        ]
```

The `rules` field is a JSON array string. Legacy fields (`allowedIPs`, `honeypotPaths`, `scannerUAs`) still work as backward-compatible shortcuts alongside the rule engine.
Plugin 2: Decompress (gzip request body)
The decompress plugin exists for one reason: Cloudflare Logpush always gzip-compresses HTTP payloads, and Alloy’s /loki/api/v1/raw endpoint doesn’t handle Content-Encoding: gzip. Traefik’s built-in compress middleware only handles response compression, not request body decompression.
The plugin is simple — 71 lines of Go:
```go
func (d *Decompress) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
	encoding := strings.ToLower(req.Header.Get("Content-Encoding"))
	if encoding != "gzip" {
		d.next.ServeHTTP(rw, req)
		return
	}

	gzReader, err := gzip.NewReader(req.Body)
	if err != nil {
		http.Error(rw, fmt.Sprintf("failed to create gzip reader: %v", err), http.StatusBadRequest)
		return
	}
	defer gzReader.Close()

	decompressed, err := io.ReadAll(gzReader)
	if err != nil {
		http.Error(rw, fmt.Sprintf("failed to decompress body: %v", err), http.StatusBadRequest)
		return
	}

	req.Body = io.NopCloser(bytes.NewReader(decompressed))
	req.ContentLength = int64(len(decompressed))
	req.Header.Set("Content-Length", strconv.Itoa(len(decompressed)))
	req.Header.Del("Content-Encoding")

	d.next.ServeHTTP(rw, req)
}
```

Same ConfigMap packaging pattern as sentinel. The decompress middleware CRD lives in the monitoring namespace (same as the Alloy Logpush IngressRoute that uses it):
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: decompress
  namespace: monitoring
spec:
  plugin:
    decompress: {}
```

Published at github.com/erfianugrah/decompress.
Part 5: Global Middlewares
Two middlewares are applied globally to every request on the websecure entrypoint via the --entrypoints.websecure.http.middlewares flag:
```yaml
- "--entrypoints.websecure.http.middlewares=traefik-sentinel@kubernetescrd,traefik-security-headers@kubernetescrd"
```

The format is `<namespace>-<name>@kubernetescrd`. Order matters — sentinel runs first (resolves IP, checks blocklist, scores request, evaluates rules), then security-headers adds HSTS and other response headers.
Security headers
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: security-headers
  namespace: traefik
spec:
  headers:
    stsSeconds: 63072000 # HSTS 2 years
    stsIncludeSubdomains: true
    stsPreload: true
    contentTypeNosniff: true
    referrerPolicy: "strict-origin-when-cross-origin"
    permissionsPolicy: "camera=(), microphone=(), geolocation=(), payment=()"
    customResponseHeaders:
      Server: "" # Strip server identity
      X-Powered-By: ""
```

`frameDeny`, `browserXssFilter`, and CSP are intentionally omitted from the global middleware. These are app-specific — Authentik needs its own CSP, Grafana needs iframe support for embedding, etc. Apply those per-route where needed.
Part 6: TLS Configuration
Section titled “Part 6: TLS Configuration”TLSOption
```yaml
apiVersion: traefik.io/v1alpha1
kind: TLSOption
metadata:
  name: default
  namespace: default
spec:
  minVersion: VersionTLS12
  maxVersion: VersionTLS13
  cipherSuites:
    # TLS 1.2 only -- TLS 1.3 ciphers are not configurable in Go (all safe by default)
    - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
    - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
  curvePreferences:
    - X25519
    - CurveP256
  sniStrict: true
  alpnProtocols:
    - h2
    - http/1.1
```

The TLSOption must be named `default` in the `default` namespace for Traefik to pick it up as the default TLS configuration. All cipher suites are AEAD-only (GCM or ChaCha20-Poly1305) — no CBC mode. `sniStrict: true` rejects connections that don’t present a valid SNI hostname matching a known route.
ACME via Cloudflare DNS challenge
```yaml
args:
  - "--certificatesresolvers.cloudflare.acme.dnschallenge.provider=cloudflare"
  - "--certificatesresolvers.cloudflare.acme.email=erfi.anugrah@gmail.com"
  - "--certificatesresolvers.cloudflare.acme.dnschallenge.resolvers=1.1.1.1"
  - "--certificatesresolvers.cloudflare.acme.storage=/ssl-certs-2/acme-cloudflare.json"
```

The `CF_DNS_API_TOKEN` env var is pulled from a Kubernetes Secret (`cloudflare-credentials`). The ACME cert storage lives on an NFS PVC (`traefik-ssl-2`, 2Gi, RWX) so certs survive pod restarts and don’t trigger Let’s Encrypt rate limits on every rollout.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: traefik-ssl-2
  namespace: traefik
spec:
  accessModes: [ReadWriteMany]
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs-client
```

Part 7: Per-Route Middlewares
Rate limiting (per-route isolation)
Each service gets its own rate limit middleware to prevent cross-service token bucket interference. The problem this solves: when multiple services share a single rate-limit-api middleware, Traefik maintains one token bucket per source IP per middleware instance. All routes sharing that middleware share the same bucket. Authentik OAuth flows generate 35+ requests in bursts (redirects, consent, callback, static assets), which would exceed a shared 10 req/s bucket and return 429s.
All per-route rate limit middlewares live in a single file:
```yaml
# middleware/rate-limits.yaml (pattern -- 22 middlewares total)
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rl-authentik
  namespace: traefik
spec:
  rateLimit:
    average: 100
    period: 1s
    burst: 500
    sourceCriterion:
      requestHeaderName: X-Real-Client-Ip
```

`sourceCriterion.requestHeaderName: X-Real-Client-Ip` uses the header set by the sentinel plugin for per-IP bucketing. Without this, Traefik would use the connection source IP, which behind Cloudflare Tunnel is always the cloudflared pod IP — meaning all users would share one bucket.
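Traefik’s rateLimit is a token bucket: `burst` is the bucket capacity, `average`/`period` the refill rate. A deterministic sketch (hypothetical type names, manual clock instead of wall time) of why Authentik’s 35-request bursts pass at `average: 100, burst: 500` but would be throttled by a shared 10 req/s bucket:

```go
package main

import "fmt"

// bucket models token-bucket semantics: capacity = burst,
// refill = average tokens per second.
type bucket struct {
	tokens, capacity float64
	refillPerSec     float64
}

// allow consumes one token if available, after refilling for elapsedSec.
func (b *bucket) allow(elapsedSec float64) bool {
	b.tokens += b.refillPerSec * elapsedSec
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// rl-authentik: average 100/s, burst 500 -- start with a full bucket.
	dedicated := &bucket{tokens: 500, capacity: 500, refillPerSec: 100}
	// A shared 10 req/s middleware with a matching small burst.
	shared := &bucket{tokens: 10, capacity: 10, refillPerSec: 10}

	okDedicated, okShared := 0, 0
	for i := 0; i < 35; i++ { // OAuth redirect storm, effectively instantaneous
		if dedicated.allow(0) {
			okDedicated++
		}
		if shared.allow(0) {
			okShared++
		}
	}
	fmt.Println(okDedicated, okShared) // 35 10 -- the shared bucket 429s the rest
}
```

The whole 35-request burst fits inside the 500-token bucket, while the shared bucket exhausts after 10 and returns 429s until it refills.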
Rate limits for monitoring/query services (Grafana, Prometheus, Alertmanager, Jaeger, Logpush, Traefik Dashboard, Traefik Prometheus) are currently commented out in their IngressRoutes. These services generate heavy internal query traffic (Grafana fires dozens of parallel Loki queries when loading dashboards), and rate limiting them causes query timeouts.
Managing rate limits via the Security Dashboard:
The Security Dashboard’s Rate Limits page provides a web UI for managing all 22 rl-* middleware CRDs without kubectl:
- Inline editing: click any value (average, burst, period) in the table to edit it in-place. Saves are instant via `kubectl patch` (strategic merge patch) on the middleware CRD
- Create: modal form to create a new `rl-{name}` middleware CRD with configurable average, burst, period, and source criterion
- Delete: removes the middleware CRD entirely (with confirmation dialog)
The dashboard’s ClusterRole has get, list, watch, patch, update, create, delete permissions for middlewares in the traefik.io API group.
```sh
# Equivalent kubectl commands for reference
kubectl get middlewares.traefik.io -n traefik -l app!=sentinel | grep "^rl-"
kubectl patch middleware rl-grafana -n traefik --type merge \
  -p '{"spec":{"rateLimit":{"average":200,"burst":1000}}}'
```

In-flight request limiting
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: inflight-req
  namespace: traefik
spec:
  inFlightReq:
    amount: 100
    sourceCriterion:
      requestHeaderName: X-Real-Client-Ip
```

Limits concurrent connections per source IP to 100. Unlike rate limiting (which controls request rate), this controls concurrency. A single IP can’t monopolize all backend connections. Shared across all routes — this is fine because the limit is per-IP, not per-route.
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: retry
  namespace: traefik
spec:
  retry:
    attempts: 3
    initialInterval: 100ms
```

3 attempts total (1 initial + 2 retries) with exponential backoff starting at 100ms. Only retries on connection errors, NOT on non-2xx status codes. Also shared across all routes.
Authentik forward auth
Section titled “Authentik forward auth”apiVersion: traefik.io/v1alpha1kind: Middlewaremetadata: name: authentik-forward-auth namespace: authentikspec: forwardAuth: address: http://authentik-server.authentik.svc.cluster.local/outpost.goauthentik.io/auth/traefik trustForwardHeader: true authResponseHeaders: - X-authentik-username - X-authentik-groups - X-authentik-entitlements - X-authentik-email - X-authentik-name - X-authentik-uid - X-authentik-jwt - X-authentik-meta-jwks - X-authentik-meta-outpost - X-authentik-meta-provider - X-authentik-meta-app - X-authentik-meta-versionApplied per-route to services that need SSO protection (Jaeger UI, etc.). Traefik forwards a sub-request to Authentik’s embedded outpost; if Authentik returns 200, the original request proceeds with the X-authentik-* headers injected. If 401/403, the user is redirected to the Authentik login flow.
Part 8: IngressRoutes
20+ IngressRoutes route traffic from hostnames to backend services. Each IngressRoute specifies its middleware chain. The middleware execution order is: global middlewares first (sentinel → security-headers), then per-route middlewares in the order listed.
Middleware assignments
| Route | Host | Middlewares | Namespace |
|---|---|---|---|
| Grafana | grafana-k3s.example.com | monitoring | |
| Prometheus | prom-k3s.example.com | monitoring | |
| Alertmanager | alertmanager-k3s.example.com | monitoring | |
| Jaeger | jaeger-k3s.example.com | monitoring | |
| Logpush | logpush-k3s.example.com | monitoring | |
| Traefik Dashboard | traefik-dashboard.example.com | traefik | |
| Traefik Prometheus | traefik-prometheus.example.com | traefik | |
| Authentik | authentik.example.com | authentik | |
| Revista | mydomain.com | rl-revista, inflight-req, retry | revista |
| ArgoCD (HTTP) | argocd.example.com | rl-argocd, inflight-req, retry | argocd |
| ArgoCD (gRPC) | argocd.example.com + gRPC header | rl-argocd, inflight-req, retry | argocd |
| Dendrite | dendrite.example.com | rl-dendrite, inflight-req, retry | dendrite |
| httpbun | httpbun-k3s.example.com | rl-httpbun, inflight-req, retry | httpbun |
| Jitsi (from Element) | jitsi.example.com + Referer match | rl-jitsi, inflight-req, retry | jitsi |
| Jitsi (direct) | jitsi.example.com | rl-jitsi, inflight-req, retry | jitsi |
| LiveKit JWT | matrix-rtc.example.com/livekit/jwt | rl-livekit, inflight-req, strip-livekit-jwt, retry | livekit |
| LiveKit SFU | matrix-rtc.example.com/livekit/sfu | rl-livekit, inflight-req, strip-livekit-sfu, retry | livekit |
| Element (chat) | chat.example.com | rl-matrix-element, inflight-req, retry | matrix |
| Synapse Admin | admin.matrix.example.com | rl-matrix-admin, inflight-req, retry | matrix |
| Synapse | matrix.example.com | rl-matrix-synapse, inflight-req, retry | matrix |
| Maubot | maubot.example.com | rl-maubot, inflight-req, retry | maubot |
| Headlamp | headlamp-k3s.example.com | rl-headlamp, inflight-req, retry | headlamp |
| Longhorn | longhorn.example.com | rl-longhorn, inflight-req, retry | longhorn-system |
| Portainer | portainer-k3s.example.com | rl-portainer, inflight-req, retry | portainer |
| Portainer Agent | port-agent-k3s.example.com | rl-portainer-agent, inflight-req, retry | portainer |
| Security Dashboard | security-k3s.example.com | authentik-forward-auth | security-dashboard |
Strikethrough (~~) indicates rate limits that are currently commented out.
Part 9: Cloudflare Tunnel Integration
All external traffic enters the cluster through a Cloudflare Tunnel. The tunnel connects from a cloudflared Deployment inside the cluster to Cloudflare’s edge network via outbound QUIC connections — no inbound ports or public IPs needed.
How it works
1. `cloudflared` runs in the `cloudflared` namespace and maintains 4 HA connections to Cloudflare edge
2. DNS CNAME records point each hostname to the tunnel’s `.cfargotunnel.com` address
3. Cloudflare edge receives the request, looks up the tunnel config, and forwards to `cloudflared`
4. `cloudflared` routes to the Traefik Service based on hostname matching in the tunnel ingress rules
5. Traefik handles TLS termination, middleware, and routing to the backend
Tunnel ingress rules (OpenTofu)
Each hostname maps to the Traefik Service’s cluster-internal HTTPS endpoint:
```hcl
ingress_rule {
  hostname = "grafana-k3s.${var.secondary_domain_name}"
  service  = "https://traefik.traefik.svc.cluster.local"
  origin_request {
    origin_server_name = "grafana-k3s.${var.secondary_domain_name}"
    http2_origin       = true
  }
}
```

origin_server_name is set to the actual hostname so cloudflared presents the correct SNI to Traefik. http2_origin = true enables HTTP/2 between cloudflared and Traefik, which is needed for gRPC (ArgoCD) and improves multiplexing.
Each service that needs external access gets its own ingress rule. For example, the security dashboard:
```hcl
ingress_rule {
  hostname = "security-k3s.${var.secondary_domain_name}"
  service  = "http://security-dashboard.security-dashboard.svc.cluster.local"
  origin_request {
    origin_server_name = "security-k3s.${var.secondary_domain_name}"
    http2_origin       = true
  }
}
```
```hcl
ingress_rule {
  service = "http_status:404"
}
```

DNS records (OpenTofu)
Each service gets a CNAME record pointing to the tunnel:
```hcl
resource "cloudflare_record" "grafana-k3s" {
  zone_id = var.cloudflare_secondary_zone_id
  name    = "grafana-k3s"
  type    = "CNAME"
  content = cloudflare_zero_trust_tunnel_cloudflared.k3s.cname
  proxied = true
  tags    = ["k3s", "monitoring"]
}
```

`proxied = true` routes traffic through Cloudflare’s edge (DDoS protection, WAF, caching). The CNAME target is the tunnel’s unique `.cfargotunnel.com` address, auto-generated by the `cloudflare_zero_trust_tunnel_cloudflared` resource.
Part 10: KEDA Autoscaling
Traefik uses a KEDA ScaledObject with 5 triggers for intelligent autoscaling between 1 and 8 replicas:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: traefik-keda
  namespace: traefik
spec:
  scaleTargetRef:
    name: traefik
  pollingInterval: 5
  cooldownPeriod: 10
  minReplicaCount: 1
  maxReplicaCount: 8
  triggers:
    - type: cpu
      metadata:
        type: Utilization
        value: "50"
    - type: memory
      metadata:
        type: Utilization
        value: "75"
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
        metricName: traefik_open_connections
        threshold: "1000"
        query: sum(traefik_open_connections{entrypoint="websecure"})
    - type: prometheus
      metadata:
        metricName: traefik_request_duration
        threshold: "0.5"
        query: histogram_quantile(0.95, sum(rate(traefik_entrypoint_request_duration_seconds_bucket{entrypoint="websecure"}[1m])) by (le))
    - type: prometheus
      metadata:
        metricName: traefik_requests_total
        threshold: "1000"
        query: sum(rate(traefik_entrypoint_requests_total{entrypoint="websecure"}[1m]))
```

The Prometheus triggers query Traefik’s own metrics: open connection count, p95 request duration, and request rate. Any single trigger exceeding its threshold causes a scale-up. The 5-second polling interval and 10-second cooldown make it responsive without flapping.
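KEDA feeds each trigger's metric into the standard Kubernetes HPA formula, desired = ceil(currentReplicas × currentValue / targetValue), taking the maximum across triggers and clamping to the min/max replica counts. A simplified Go sketch of that arithmetic (real HPA also applies a tolerance band and stabilization windows, which are omitted here):

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the HPA scale formula per trigger:
// ceil(current * metric / target), takes the max across triggers,
// and clamps to [min, max]. Simplified sketch of HPA behavior only.
func desiredReplicas(current int, metrics, targets []float64, min, max int) int {
	desired := min
	for i, m := range metrics {
		d := int(math.Ceil(float64(current) * m / targets[i]))
		if d > desired {
			desired = d
		}
	}
	if desired > max {
		desired = max
	}
	return desired
}

func main() {
	// 2 replicas; 1800 open connections vs a 1000 threshold, while the
	// request-rate and p95-latency triggers are under their thresholds.
	fmt.Println(desiredReplicas(2, []float64{1800, 400, 0.2}, []float64{1000, 1000, 0.5}, 1, 8)) // prints 4
}
```

The connection trigger alone drives the result: ceil(2 × 1800/1000) = 4 replicas, even though the other two triggers are satisfied.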
Part 11: Observability
Access logs → Loki (with structured metadata)
Traefik writes JSON-formatted access logs to stdout:
```yaml
args:
  - "--accesslog=true"
  - "--accesslog.format=json"
  - "--accesslog.bufferingsize=100"
  - "--accesslog.fields.defaultmode=keep"
  - "--accesslog.fields.headers.defaultmode=keep"
```

`bufferingsize=100` buffers up to 100 log lines before flushing, reducing I/O pressure. `fields.defaultmode=keep` and `fields.headers.defaultmode=keep` include all fields and request/response headers in the JSON output — this is what enables the sentinel bot score, block reason, and other custom headers to appear in the access logs.
The Alloy DaemonSet picks up these logs from the Traefik container’s stdout (via /var/log/pods/), parses the JSON, and sends them to Loki with 19 structured metadata fields:
| Field | Source | Purpose |
|---|---|---|
| `status` | DownstreamStatus | HTTP response status code |
| `downstream_status` | DownstreamStatus | Same (for compatibility) |
| `router` | RouterName | Traefik router that handled the request |
| `service` | ServiceName | Backend service |
| `client_ip` | ClientHost | Direct connection source (usually cloudflared pod) |
| `real_client_ip` | request_X-Real-Client-Ip | Actual client IP (set by sentinel) |
| `bot_score` | request_X-Bot-Score | Sentinel bot score |
| `blocked_by` | request_X-Blocked-By | Block source (sentinel-rule, sentinel-blocklist, etc.) |
| `country` | request_X-Geo-Country | Client country code (Cf-Ipcountry → GeoIP MMDB fallback) |
| `cf_connecting_ip` | request_Cf-Connecting-Ip | Cloudflare’s client IP header |
| `request_host` | RequestHost | Host header value |
| `request_path` | RequestPath | URI path |
| `request_protocol` | RequestProtocol | HTTP/1.1, HTTP/2.0, etc. |
| `duration` | Duration | Total request duration |
| `origin_duration` | OriginDuration | Backend response time |
| `overhead` | Overhead | Traefik processing overhead |
| `downstream_size` | DownstreamContentSize | Response body size |
| `tls_version` | TLSVersion | TLS 1.2 or 1.3 |
| `user_agent` | request_User-Agent | Client User-Agent |
Labels (low cardinality): `entrypoint`, `method`, `job="traefik-access-log"`
Dashboards query these structured metadata fields directly instead of using | json full-line parsing, which is 5-10x faster (14ms vs 30s+ response times).
Additionally, Alloy generates 7 Prometheus counters via stage.metrics, categorized by Sentinel block type:
| Counter | Match condition |
|---|---|
| `loki_process_custom_traefik_access_requests_total` | All requests (`match_all = true`) |
| `loki_process_custom_traefik_access_sentinel_blocks_total` | `blocked_by = "sentinel"` (bot scoring threshold) |
| `loki_process_custom_traefik_access_blocklist_blocks_total` | `blocked_by = "sentinel-blocklist"` (IPsum blocklist) |
| `loki_process_custom_traefik_access_ratelimit_blocks_total` | `blocked_by = "rate-limit"` (per-IP rate limit) |
| `loki_process_custom_traefik_access_sentinel_rule_blocks_total` | `blocked_by = "sentinel-rule"` (firewall rule engine) |
| `loki_process_custom_traefik_access_tarpit_blocks_total` | `blocked_by = "sentinel-tarpit"` (tarpit action) |
| `loki_process_custom_traefik_access_403_total` | `downstream_status = "403"` (all 403s regardless of source) |
The source field in stage.metrics reads from the extracted data map populated by stage.json, matching the blocked_by and downstream_status JSON keys. These counters power the Security Dashboard’s instant-loading aggregate statistics and the Grafana Traefik Access Logs dashboard’s security section without querying Loki.
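The match conditions in the table above amount to a simple dispatch on the extracted fields. A Go sketch of that logic (the real counters live in the Alloy config, and the counter names here are shortened for readability):

```go
package main

import "fmt"

// countersFor mirrors the stage.metrics match conditions: every request
// hits the total counter, each blocked_by value maps to its per-source
// counter, and any 403 also hits the catch-all 403 counter. Sketch of
// the matching logic only -- the real counters are defined in Alloy.
func countersFor(blockedBy, status string) []string {
	counters := []string{"requests_total"}
	switch blockedBy {
	case "sentinel":
		counters = append(counters, "sentinel_blocks_total")
	case "sentinel-blocklist":
		counters = append(counters, "blocklist_blocks_total")
	case "rate-limit":
		counters = append(counters, "ratelimit_blocks_total")
	case "sentinel-rule":
		counters = append(counters, "sentinel_rule_blocks_total")
	case "sentinel-tarpit":
		counters = append(counters, "tarpit_blocks_total")
	}
	if status == "403" {
		counters = append(counters, "403_total")
	}
	return counters
}

func main() {
	// A rule-engine block increments three counters at once.
	fmt.Println(countersFor("sentinel-rule", "403"))
	// [requests_total sentinel_rule_blocks_total 403_total]
}
```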
See the monitoring stack guide for the full Alloy config and the label/metadata split.
Tracing → Jaeger (via Alloy)
```yaml
args:
  - "--tracing.otlp=true"
  - "--tracing.otlp.grpc=true"
  - "--tracing.otlp.grpc.endpoint=alloy.monitoring.svc.cluster.local:4317"
  - "--tracing.otlp.grpc.insecure=true"
  - "--tracing.serviceName=traefik"
  - "--tracing.sampleRate=1.0"
```

Traefik sends OTLP traces to the Alloy DaemonSet on each node, which batches and forwards them to Jaeger. A 100% sample rate is fine for a homelab — in production you’d want to sample down.
Metrics → Prometheus
```yaml
args:
  - "--metrics.prometheus=true"
  - "--metrics.prometheus.entrypoint=metrics"
  - "--metrics.prometheus.addrouterslabels=true"
```

`addrouterslabels=true` adds a `router` label to all metrics, enabling per-IngressRoute dashboards and alerting. The metrics endpoint is scraped by a ServiceMonitor in the monitoring stack.
Deployment
Full deploy sequence
```sh
# 1. Disable built-in Traefik (one-time)
ansible-playbook -i inventory.yml \
  ansible-playbooks/my-playbooks/disable-builtin-traefik.yml \
  --become --ask-become-pass

# 2. Apply CRDs
kubectl apply -f crds/kubernetes-crd-definition-v1.yml --server-side
kubectl apply -f crds/kubernetes-crd-rbac.yml

# 3. Apply TLS options
kubectl apply -f middleware/tls-options.yaml

# 4. Deploy plugin ConfigMaps
kubectl apply -f middleware/decompress-configmap.yaml
kubectl apply -f middleware/sentinel-configmap.yaml

# 5. Deploy middleware CRDs
kubectl apply -f middleware/sentinel-middleware.yaml
kubectl apply -f middleware/security-headers.yaml
kubectl apply -f middleware/decompress-middleware.yaml
kubectl apply -f middleware/authentik-forward-auth.yaml
kubectl apply -f middleware/inflight-req.yaml
kubectl apply -f middleware/retry.yaml
kubectl apply -f middleware/rate-limits.yaml

# 6. Deploy IPsum blocklist CronJob
kubectl apply -f services/sentinel/ipsum-cronjob.yaml

# 7. Deploy Traefik (includes SA, ClusterRole, Service, Deployment, IngressClass, PDB)
kubectl apply -f services/traefik.yaml

# 8. Deploy IngressRoutes
kubectl apply -f ingressroutes/

# 9. Deploy KEDA autoscaling
kubectl apply -f hpa/traefik-keda-autoscaling.yaml

# 10. Apply DNS + tunnel config (OpenTofu)
cd cloudflare-tunnel-tf/ && tofu apply
```

Verification
```sh
# Traefik pods running on different nodes
kubectl get pods -n traefik -o wide

# All middlewares loaded
kubectl get middlewares.traefik.io -n traefik

# TLS option active
kubectl get tlsoptions.traefik.io -A

# IngressRoutes across all namespaces
kubectl get ingressroutes.traefik.io -A

# KEDA ScaledObject active
kubectl get scaledobject -n traefik

# Test bot detection (should return 403)
curl -s -o /dev/null -w "%{http_code}" -H "User-Agent: sqlmap/1.0" https://httpbun-k3s.example.com/

# Test honeypot path (should return 403)
curl -s -o /dev/null -w "%{http_code}" https://httpbun-k3s.example.com/.env

# Test rule engine - .git block (should return 403 with X-Rule-Match: r2)
curl -s -D - https://httpbun-k3s.example.com/.git/config 2>&1 | grep -i "x-blocked-by\|x-rule-match\|http/"

# Check IPsum blocklist loaded
kubectl logs -n traefik -l app.kubernetes.io/name=traefik --tail=50 | grep -i "blocklist\|ipsum"
```

Part 12: Sentinel Operations
Sentinel is the sole inline security layer. All blocking, scoring, and rule evaluation happens here.
Security Dashboard (web UI)
A dedicated Go+htmx web application at security-k3s.example.com provides a browser-based interface for managing Sentinel configuration and viewing security analytics. Protected by Authentik forward-auth.
| Feature | Dashboard | Direct config |
|---|---|---|
| View aggregate stats (requests, blocks, errors) | Yes (Prometheus, instant) | N/A |
| View recent blocks with details | Yes (background Loki worker) | kubectl logs |
| View bot score distribution | Yes (chart) | N/A |
| Manage firewall rules (CRUD, reorder) | Yes (modal editor, drag-to-reorder) | Edit middleware YAML |
| Manage detection rules (honeypots, scanner UAs) | Yes | Edit middleware YAML |
| Manage allowlist (add/remove IPs) | Yes | Edit middleware YAML |
| Check IP against blocklist | Yes | N/A |
| Manage rate limits (CRUD) | Yes (inline edit, create/delete modals) | kubectl patch/create/delete |
| Trigger blocklist reload | Yes | Restart Traefik |
| IP lookup (all access logs for an IP) | Yes (Loki query) | LogQL |
Source: `services/security-dashboard/`. Deploy: `docker build --platform linux/arm64 -t erfianugrah/security-dashboard:latest .` → `docker push` → `kubectl rollout restart deployment/security-dashboard -n security-dashboard`. See security-stack.md for full architecture details.
Managing firewall rules
Rules can be managed via the Security Dashboard’s Policy Engine page or by editing the middleware CRD directly:
```sh
# View current rules
kubectl get middleware sentinel -n traefik -o jsonpath='{.spec.plugin.sentinel.rules}' | python3 -m json.tool

# Edit rules directly (careful -- JSON in YAML)
kubectl edit middleware sentinel -n traefik
```

The Dashboard’s Policy Engine page is preferred — it provides a modal editor with field reference, expression validation, drag-to-reorder priority, and a Deploy button that applies changes atomically.
Debugging 403 errors
If services behind Traefik return 403 Forbidden unexpectedly, check in this order:
1. Sentinel rule engine — check the response headers `X-Blocked-By` and `X-Rule-Match` to identify which rule blocked the request
2. Sentinel bot scoring — check the `X-Bot-Score` header. A score >= 100 triggers a block. Review heuristic signals
3. IPsum blocklist — is the client IP in the blocklist? Check via the Security Dashboard’s Blocklist page
4. Cloudflare WAF — check the Cloudflare dashboard for firewall events (these happen before traffic reaches Traefik)
The global middleware chain on the websecure entrypoint is: sentinel -> security-headers. The X-Blocked-By header distinguishes block sources:
| `X-Blocked-By` value | Source | Fix |
|---|---|---|
| `sentinel-rule` | Rule engine matched (check `X-Rule-Match` for rule ID) | Edit/disable the rule |
| `sentinel-blocklist` | IP in IPsum blocklist | Add IP to `allowedIPs` or add an allow rule |
| `sentinel-heuristic` | Bot score exceeded threshold | Add IP to `allowedIPs` or add an allow rule |
| `sentinel-rate` | Per-IP rate limit exceeded | Increase `rateLimitPerSecond` or add an allow rule |
Common false positive: Cloudflare Logpush. Logpush sends requests without browser headers, triggering bot heuristics. Fix: add a rule matching the X-Logpush-Secret header with allow action (already deployed as rule r6).
IPsum blocklist management
The blocklist is refreshed daily by a CronJob. To force a reload:
```sh
# Trigger manual CronJob run
kubectl create job --from=cronjob/ipsum-update ipsum-manual -n sentinel

# Or restart Traefik (blocklist reloads on startup)
kubectl rollout restart deployment/traefik -n traefik
```

The blocklist ConfigMap is in the traefik namespace. Sentinel reloads it in-memory every `blocklistReloadSeconds` (default 300s) without requiring a Traefik restart.
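The periodic in-memory reload is a standard ticker-plus-RWMutex pattern in Go. A sketch of that pattern — not Sentinel's actual implementation — where the whole IP set is swapped atomically so request-path lookups never block on a refresh:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Blocklist holds the in-memory IP set and swaps it wholesale on
// reload. Sketch of the reload pattern only, not Sentinel's code.
type Blocklist struct {
	mu  sync.RWMutex
	ips map[string]struct{}
}

// Contains is the hot request-path check; it only takes a read lock.
func (b *Blocklist) Contains(ip string) bool {
	b.mu.RLock()
	defer b.mu.RUnlock()
	_, ok := b.ips[ip]
	return ok
}

// Reload builds a fresh set (e.g. from the blocklist ConfigMap) and
// swaps it in under a brief write lock.
func (b *Blocklist) Reload(ips []string) {
	next := make(map[string]struct{}, len(ips))
	for _, ip := range ips {
		next[ip] = struct{}{}
	}
	b.mu.Lock()
	b.ips = next
	b.mu.Unlock()
}

// StartReloader re-reads the source every interval (the role
// blocklistReloadSeconds plays in the real config) until stop closes.
func (b *Blocklist) StartReloader(interval time.Duration, load func() []string, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	go func() {
		defer t.Stop()
		for {
			select {
			case <-t.C:
				b.Reload(load())
			case <-stop:
				return
			}
		}
	}()
}

func main() {
	bl := &Blocklist{}
	bl.Reload([]string{"203.0.113.7"}) // TEST-NET address, illustrative
	fmt.Println(bl.Contains("203.0.113.7")) // true
}
```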
GeoIP country resolution
Country data is resolved on every request and set as X-Geo-Country. When Cloudflare is in the path, Cf-Ipcountry is used directly. For non-CF traffic (e.g., direct tunnel access or if CF is removed), the GeoIP MMDB lookup provides country data.
To verify GeoIP is working:
```sh
# Check Traefik logs for GeoIP database load
kubectl logs -n traefik -l app.kubernetes.io/name=traefik | grep "GeoIP"
# Expected: [sentinel] GeoIP database loaded: 1189588 nodes, IPv6

# Verify X-Geo-Country header in access logs
kubectl logs -n traefik -l app.kubernetes.io/name=traefik --tail=5 | grep -o '"request_X-Geo-Country":"[^"]*"'
```

The DB-IP database is refreshed monthly. To force a re-download, restart the Traefik deployment (the init container runs on each pod start).
File Reference
```
services/
  traefik.yaml                       # SA, ClusterRole, Service, Deployment, IngressClass, PDB

crds/
  kubernetes-crd-definition-v1.yml   # Traefik CRDs (~3.5 MB)
  kubernetes-crd-rbac.yml            # ClusterRole for CRD provider

middleware/                          # Local plugins (source + ConfigMap + Middleware CRD)
  sentinel.go                        # ~1843-line Go source (IP + bot + blocklist + rule engine)
  sentinel-middleware.yaml           # Middleware CRD with config (thresholds, rules, blocklist, allowlist)

  decompress-plugin/
    decompress.go                    # 71-line Go source
    go.mod / .traefik.yml
  decompress-configmap.yaml          # ConfigMap packaging for k8s
  decompress-middleware.yaml         # Middleware CRD (in monitoring ns)

  # Global middlewares
  security-headers.yaml              # HSTS, nosniff, permissions policy
  tls-options.yaml                   # TLSOption (min TLS 1.2, AEAD ciphers, sniStrict)

  # Shared per-route middlewares
  rate-limits.yaml                   # 22 per-route rate limit middlewares (rl-*)
  inflight-req.yaml                  # 100 concurrent req/IP
  retry.yaml                         # 3 attempts, 100ms backoff

  # Auth
  authentik-forward-auth.yaml        # Forward auth to Authentik (in authentik ns)

services/sentinel/
  ipsum-cronjob.yaml                 # SA, Role, RoleBinding, Python script ConfigMap, CronJob

ingressroutes/
  alertmanager-ingress.yaml          # monitoring
  alloy-logpush-ingress.yaml         # monitoring (+ decompress middleware)
  argocd-ingress.yaml                # argocd (2 routes: HTTP + gRPC)
  authentik-ingress.yaml             # authentik
  dendrite-ingress.yaml              # dendrite
  grafana-ingress.yaml               # monitoring
  httpbun-ingress.yaml               # httpbun
  jaeger-ingress.yaml                # monitoring (+ authentik-forward-auth)
  longhorn-ingress.yaml              # longhorn-system
  portainer-agent-ingress.yaml       # portainer
  portainer-ingress.yaml             # portainer
  prometheus-ingress.yaml            # monitoring
  revista-ingress.yaml               # revista
  traefik-dashboard-ingress.yaml     # traefik (api@internal)
  traefik-prometheus-ingress.yaml    # traefik (prometheus@internal)

services/*/ingress.yaml              # Service-embedded IngressRoutes
  security-dashboard/manifests.yaml  # SA, RBAC, Secret, Deployment, Service, IngressRoute
  headlamp/ingressroute.yaml
  jitsi/ingress.yaml                 # 2 routes: Referer-gated + direct
  livekit/ingress.yaml               # 2 routes + stripPrefix middlewares
  matrix/ingress.yaml                # 3 routes: Element, Synapse Admin, Synapse
  maubot/ingress.yaml

tests/
  sentinel-e2e.sh                    # 15-test E2E suite (scanner UA, honeypots, rules, headers)

services/security-dashboard/
  main.go                            # Go+htmx dashboard (~2700+ lines, zero deps)
  manifests.yaml                     # SOPS-encrypted (ns, SA, RBAC, secret, deploy, svc, ingress)
  Dockerfile                         # Multi-stage ARM64 build
  ui/                                # Templates + static assets (go:embed)

monitoring/
  alloy/configmap.yaml               # 19 structured metadata fields, 7 Prometheus counters
  loki/configmap.yaml                # gRPC 16MB max, split_queries 1h
  grafana/dashboards/
    traefik-access-logs.json         # 37 panels, Prometheus + Loki, Sentinel security section

hpa/
  traefik-keda-autoscaling.yaml      # 5-trigger ScaledObject (1-8 replicas)

pvc-claims/
  traefik-ssl-pvc.yaml               # 2Gi NFS PVC for ACME cert storage

cloudflare-tunnel-tf/
  tunnel_config.tf                   # Tunnel ingress rules (hostname → Traefik)
  records.tf                         # DNS CNAME records → tunnel

ansible-playbooks/my-playbooks/
  disable-builtin-traefik.yml        # Disables k3s built-in Traefik + ServiceLB
```