
ARM64 k3s Cluster Operations: Turing Pi, Node Management, and Recovery

An operations guide for running a 4-node ARM64 k3s cluster on Turing RK1 compute modules in a Turing Pi 2 board. This covers the non-obvious parts of keeping a bare-metal homelab cluster healthy: boot configuration quirks on Rockchip BSP kernels, the cgroup v1 to v2 migration, k3s’s dynamic TLS certificate system and how it breaks agents after server reboots, kube-proxy iptables recovery after power loss, disk pressure from uncontrolled logs, and recovering bricked nodes through the BMC serial console.

Everything here was learned the hard way through actual cluster failures. Each section includes the root cause analysis, the fix, and the gotchas encountered along the way.


| Node | IP | k3s Role | Turing Pi Slot | Notes |
| --- | --- | --- | --- | --- |
| node1 | 10.0.0.9 | control-plane (server) | 1 | API server, etcd, scheduler |
| node2 | 10.0.0.10 | agent | 2 | |
| node3 | 10.0.0.11 | agent | 3 | NFS server (~460 GB at /data) |
| node4 | 10.0.0.12 | agent | 4 | |

All nodes: Turing RK1 (Rockchip RK3588, 8 GB RAM, 29 GB SD card), Ubuntu 22.04, kernel 5.10.160-rockchip (BSP), k3s v1.34.3+k3s3, containerd v2.1.5.

| Component | Version |
| --- | --- |
| k3s | v1.34.3+k3s3 |
| Containerd | v2.1.5-k3s1 |
| Kernel | 5.10.160-rockchip (BSP) |
| Ubuntu | 22.04 LTS (Jammy) |
| U-Boot | Rockchip (vendor) |
| Turing Pi BMC | (accessible at BMC IP on the LAN) |
| tpi CLI | Latest from Turing Pi |
inventory.yml
k3s_cluster:
  children:
    server:
      hosts:
        10.0.0.9:
    agent:
      hosts:
        10.0.0.10:
        10.0.0.11:
        10.0.0.12:
  vars:
    ansible_port: 22
    ansible_user: your_user
    ansible_python_interpreter: /usr/bin/python3
    k3s_version: v1.34.3+k3s3
    extra_server_args: "--disable traefik --disable servicelb"

The server has --disable traefik --disable servicelb because both are replaced with custom deployments (see the Traefik and monitoring guides).


The Turing RK1 modules run a Rockchip BSP U-Boot that loads its configuration from /boot/firmware/. The key files:

| File | Purpose |
| --- | --- |
| /boot/firmware/ubuntuEnv.txt | Kernel cmdline (bootargs=), DTB file, overlays |
| /boot/firmware/boot.cmd | U-Boot script source (human-readable) |
| /boot/firmware/boot.scr | Compiled U-Boot script (binary, loaded by U-Boot) |

To change kernel boot parameters, edit ubuntuEnv.txt. You do not need to recompile boot.scr — U-Boot reads ubuntuEnv.txt at boot and substitutes the variables into the boot script.

bootargs=root=UUID=<uuid> rootfstype=ext4 rootwait rw console=ttyS9,115200 console=ttyS2,1500000 console=tty1 systemd.unified_cgroup_hierarchy=1
fdtfile=rk3588-turing-rk1.dtb
overlay_prefix=rk3588
overlays=

The RK1 modules ship with cgroup v1 explicitly enabled in the kernel cmdline. Kubernetes and containerd have deprecated cgroup v1 support, and the 5.10 Rockchip BSP kernel supports cgroup v2 with the systemd unified hierarchy, so the boot flags can simply be switched.

The original bootargs contain these cgroup v1 flags:

cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory swapaccount=1 systemd.unified_cgroup_hierarchy=0

These need to be replaced with:

systemd.unified_cgroup_hierarchy=1

Swap also needs to be disabled — k3s on cgroup v2 with swap enabled produces tmpfs-noswap warnings and kubelet considers memory-backed volumes (secrets, emptyDirs) insecure because they could be swapped to disk.
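
A quick pre-flight check, not part of the playbook: confirm which flags the running kernel actually booted with. /proc/cmdline reflects the last boot, not the file on disk.

Terminal window
# On a node that hasn't been migrated yet:
cat /proc/cmdline | tr ' ' '\n' | grep -E 'cgroup|unified_cgroup|swapaccount'
# Before migration: the cgroup v1 flags listed above
# After migration:  only systemd.unified_cgroup_hierarchy=1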

The playbook processes agents first (one at a time, rolling), then the server last. Each node is drained, rebooted, verified, and uncordoned before moving to the next.

cgroup-v2-swap-off.yml
- name: Disable swap and enable cgroup v2 on agent nodes
  hosts: agent
  become: yes
  serial: 1
  tasks:
    - name: Disable swap immediately
      ansible.builtin.command: swapoff -a
      changed_when: true

    - name: Remove swap entry from fstab
      ansible.builtin.lineinfile:
        path: /etc/fstab
        regexp: '^\s*/swapfile\s'
        state: absent

    - name: Delete swapfile
      ansible.builtin.file:
        path: /swapfile
        state: absent

    - name: Update bootargs — remove cgroup v1 flags and enable cgroup v2
      ansible.builtin.replace:
        path: /boot/firmware/ubuntuEnv.txt
        # NOTE: Use literal ' *' at end, NOT '\s*' — see Boot Configuration section
        regexp: 'cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory swapaccount=1 systemd\.unified_cgroup_hierarchy=0 *'
        replace: "systemd.unified_cgroup_hierarchy=1 "

    - name: Drain node before reboot
      delegate_to: localhost
      become: no
      ansible.builtin.command: >
        kubectl drain {{ ansible_hostname }}
        --ignore-daemonsets --delete-emptydir-data --timeout=120s --force

    - name: Reboot into cgroup v2
      ansible.builtin.reboot:
        reboot_timeout: 300
        msg: "Rebooting for cgroup v2 + swap off"
        pre_reboot_delay: 5
        post_reboot_delay: 30

    - name: Restart k3s-agent to refresh certificates
      ansible.builtin.systemd:
        name: k3s-agent
        state: restarted

    - name: Wait for k3s-agent to be active
      ansible.builtin.shell: systemctl is-active k3s-agent
      register: k3s_status
      until: k3s_status.rc == 0
      retries: 18
      delay: 10
      changed_when: false

    - name: Uncordon node
      delegate_to: localhost
      become: no
      ansible.builtin.command: kubectl uncordon {{ ansible_hostname }}

    - name: Wait for node to be Ready
      delegate_to: localhost
      become: no
      ansible.builtin.shell: >
        kubectl get node {{ ansible_hostname }}
        -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
      register: node_ready
      until: node_ready.stdout == "True"
      retries: 30
      delay: 10
      changed_when: false

    - name: Verify cgroup v2 is active
      ansible.builtin.shell: stat -f --format=%T /sys/fs/cgroup
      register: cgroupfs
      changed_when: false
      failed_when: cgroupfs.stdout != "cgroup2fs"

    - name: Verify swap is off
      ansible.builtin.command: swapon --show
      register: swap_check
      changed_when: false
      failed_when: swap_check.stdout | length > 0

- name: Disable swap and enable cgroup v2 on server node
  hosts: server
  become: yes
  tasks:
    # Same tasks as above, but:
    # - service name is 'k3s' not 'k3s-agent'
    # - post_reboot_delay is 60 (server takes longer — etcd bootstrap)
    # - server is done last to avoid cascading agent failures

After the server reboots, its dynamic listener TLS cert may change. Agents that booted before or concurrently with the server will cache the old CA and fail with x509: certificate signed by unknown authority. The playbook includes a Restart k3s-agent to refresh certificates step to handle this, but it’s not always sufficient. See Part 3 for the full explanation and fix.

Terminal window
# On each node:
stat -f --format=%T /sys/fs/cgroup
# Expected: cgroup2fs
swapon --show
# Expected: (empty output)
free -h | grep Swap
# Expected: Swap: 0B 0B 0B

After migration, containerd v2.1 on k3s 1.34 produces an InvalidDiskCapacity warning on every kubelet start:

Warning InvalidDiskCapacity kubelet invalid capacity 0 on image filesystem

This is caused by disable_snapshot_annotations = true in the auto-generated containerd config. It’s cosmetic — disk capacity reporting works fine. It will resolve with a future k3s upgrade.
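
To confirm the setting on a node, check the auto-generated containerd config under the k3s agent directory. k3s regenerates this file on every start, so don't edit it directly; persistent changes belong in a config.toml.tmpl template.

Terminal window
grep -n 'disable_snapshot_annotations' \
  /var/lib/rancher/k3s/agent/etc/containerd/config.toml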


This is the single most disruptive issue on this cluster. After a server reboot (or power cycle), agents frequently get stuck with:

tls: failed to verify certificate: x509: certificate signed by unknown authority

k3s has two separate CA systems and a dynamic TLS certificate:

| Component | Path (server) | Purpose |
| --- | --- | --- |
| Server CA | /var/lib/rancher/k3s/server/tls/server-ca.crt | Signs the API server's serving certificate |
| Client CA | /var/lib/rancher/k3s/server/tls/client-ca.crt | Signs kubelet/kube-proxy client certs |
| Dynamic listener cert | /var/lib/rancher/k3s/server/tls/dynamic-cert.json | The actual TLS cert presented on port 6443, managed by rancher/dynamiclistener |

Agents cache the server CA and client CA locally:

| Component | Path (agent) |
| --- | --- |
| Cached server CA | /var/lib/rancher/k3s/agent/server-ca.crt |
| Cached client CA | /var/lib/rancher/k3s/agent/client-ca.crt |

The dynamic listener cert is stored in three places: a Kubernetes Secret (k3s-serving in kube-system), a local file cache, and in memory. On server startup, before etcd is available, the dynamic listener uses the file cache. If this cert doesn’t match what agents expect (because the cert was regenerated, or the agent has a stale CA from a previous boot), agents can’t verify the server and enter a TLS error loop.
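
Once the API is reachable again (e.g. from the server with its admin kubeconfig), a rough way to compare two of those copies is to fingerprint the cert actually presented on port 6443 against the copy stored in the k3s-serving Secret; adjust the server IP for your cluster.

Terminal window
# Cert currently presented on port 6443:
echo | openssl s_client -connect 10.0.0.9:6443 2>/dev/null | \
  openssl x509 -noout -fingerprint -sha256
# Cert stored in the k3s-serving Secret:
kubectl -n kube-system get secret k3s-serving \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | \
  openssl x509 -noout -fingerprint -sha256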

The critical point: systemctl restart k3s-agent does not fix this. The agent reloads the same stale CA certs from disk. You must delete the cached CA files so the agent re-fetches them from the server’s /cacerts endpoint.

This procedure fixes the common case where the server CA has NOT changed but agents have stale cached copies (e.g. after a server reboot or power cycle).

Step 1: Verify the server is up and the API is working:

Terminal window
# From the server node itself:
kubectl get nodes --kubeconfig /etc/rancher/k3s/k3s.yaml
# Or via ansible:
ansible server -i inventory.yml -m shell \
-a "kubectl get nodes --kubeconfig /etc/rancher/k3s/k3s.yaml" \
-b -e "ansible_become_pass=<pass>"

Step 2: On each broken agent, stop the agent, delete cached certs, start:

Terminal window
# Via ansible (one node at a time):
ansible <agent_ip> -i inventory.yml -m shell \
-a "systemctl stop k3s-agent; \
sleep 2; \
rm -f /var/lib/rancher/k3s/agent/server-ca.crt \
/var/lib/rancher/k3s/agent/client-ca.crt; \
nohup systemctl start k3s-agent &" \
-b -e "ansible_become_pass=<pass>"

Step 3: Wait 60-90 seconds, then verify:

Terminal window
kubectl get nodes

If an agent is stuck and you’re not sure whether it’s a cert issue, compare the CA fingerprints:

Terminal window
# On the server — the authoritative CA:
openssl x509 -in /var/lib/rancher/k3s/server/tls/server-ca.crt \
-noout -fingerprint -sha256
# On the agent — what it has cached:
openssl x509 -in /var/lib/rancher/k3s/agent/server-ca.crt \
-noout -fingerprint -sha256
# What the server's /cacerts endpoint returns:
curl -sk https://<server_ip>:6443/cacerts | \
openssl x509 -noout -fingerprint -sha256

If the agent’s fingerprint doesn’t match the server’s, the cert is stale.

| Approach | Difficulty | Effect |
| --- | --- | --- |
| Custom CA certs (pre-created before first server start) | High (requires reinstall) | CA never changes across reboots |
| --tls-san on server config | Low | Reduces dynamic cert regeneration |
| Health watchdog with TLS detection | Medium | Auto-recovers agents (see Part 4) |
| Boot order: server first, agents after | Low | Avoids race condition |
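
The --tls-san approach is a small change on the server. A sketch, assuming you use /etc/rancher/k3s/config.yaml rather than CLI flags (the extra hostname is illustrative):

# /etc/rancher/k3s/config.yaml
tls-san:
  - "10.0.0.9"
  - "k3s.home.lan"   # hypothetical extra name that clients might use

Listing every name and IP clients actually use up front reduces how often the dynamic listener has to regenerate its cert to add a new SAN.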

A systemd timer that runs every 5 minutes on each node and detects two failure modes:

  1. Missing kube-proxy iptables rules (common after power loss) — attempts a non-disruptive canary chain resync before falling back to a service restart
  2. Any API failure (stale certs, unauthorized, connection refused) — always clears cached CA certs before restarting

Key safety features:

  • 10-minute cooldown: skips if the agent was restarted recently, preventing a death loop
  • Server reachability gate: won’t restart the agent if the server API is unreachable (would just get bad certs)
#!/bin/bash
# k3s agent health watchdog

SERVICE=k3s-agent
KUBECONFIG=/var/lib/rancher/k3s/agent/kubelet.kubeconfig
NODENAME=$(hostname)
SERVER_URL="https://10.0.0.1:6443"   # Replace with your server IP
COOLDOWN_SECONDS=600                 # 10 minutes
CERT_DIR=/var/lib/rancher/k3s/agent

# ---- Cooldown check ----
# If the agent was (re)started recently, don't interfere — give it
# time to stabilize.
ACTIVE_ENTER=$(systemctl show "$SERVICE" \
  --property=ActiveEnterTimestamp --value 2>/dev/null)
if [ -n "$ACTIVE_ENTER" ]; then
  ENTER_EPOCH=$(date -d "$ACTIVE_ENTER" +%s 2>/dev/null || echo 0)
  NOW_EPOCH=$(date +%s)
  AGE=$(( NOW_EPOCH - ENTER_EPOCH ))
  if [ "$AGE" -lt "$COOLDOWN_SECONDS" ]; then
    logger -t k3s-watchdog \
      "Agent started ${AGE}s ago (cooldown ${COOLDOWN_SECONDS}s) — skipping"
    exit 0
  fi
fi

if ! systemctl is-active --quiet "$SERVICE"; then
  logger -t k3s-watchdog "Agent is not active — skipping"
  exit 0
fi

# ---- Check 1: kube-proxy iptables rules ----
if [ "$(iptables-save 2>/dev/null | grep -c KUBE-SVC)" -eq 0 ]; then
  logger -t k3s-watchdog "kube-proxy iptables rules missing — deleting canary chains"
  iptables -t mangle -F KUBE-PROXY-CANARY 2>/dev/null
  iptables -t mangle -X KUBE-PROXY-CANARY 2>/dev/null
  iptables -t nat -F KUBE-PROXY-CANARY 2>/dev/null
  iptables -t nat -X KUBE-PROXY-CANARY 2>/dev/null
  iptables -t filter -F KUBE-PROXY-CANARY 2>/dev/null
  iptables -t filter -X KUBE-PROXY-CANARY 2>/dev/null
  sleep 40
  if [ "$(iptables-save 2>/dev/null | grep -c KUBE-SVC)" -eq 0 ]; then
    logger -t k3s-watchdog "Canary resync failed — will restart in API check"
  else
    logger -t k3s-watchdog "kube-proxy iptables rules restored via canary resync"
  fi
fi

# ---- Check 2: API reachability ----
# Use "get node $NODENAME" not "get nodes" — the kubelet kubeconfig
# (system:node:<name>) only has RBAC to read its own node object.
API_ERR=$(kubectl --kubeconfig="$KUBECONFIG" get node "$NODENAME" 2>&1)
API_RC=$?
if [ $API_RC -eq 0 ]; then
  exit 0
fi
logger -t k3s-watchdog "API check failed for $NODENAME (rc=$API_RC): $API_ERR"

# ---- Server reachability gate ----
if ! curl -sk --max-time 5 "$SERVER_URL/cacerts" >/dev/null 2>&1; then
  logger -t k3s-watchdog "Server API not reachable — skipping restart"
  exit 0
fi

# ---- Restart with cert cleanup ----
# Always clear cached CA certs. Stale certs manifest as various errors:
#   - x509: certificate signed by unknown authority
#   - connection refused (local proxy can't auth to server)
#   - 401 Unauthorized (server rejects stale client cert)
logger -t k3s-watchdog "Clearing cached CA certs and restarting $SERVICE"
systemctl stop "$SERVICE"
sleep 2
rm -f "$CERT_DIR/client-ca.crt" "$CERT_DIR/server-ca.crt"
systemctl start "$SERVICE"
logger -t k3s-watchdog "Agent restarted with fresh certs"

The server version only checks kube-proxy iptables. It intentionally does not check API auth or restart the k3s server, because restarting the server would invalidate all agent tokens and cause a cascading failure.

#!/bin/bash
# k3s server health watchdog

SERVICE=k3s

if ! systemctl is-active --quiet "$SERVICE"; then
  exit 0
fi

if [ "$(iptables-save 2>/dev/null | grep -c KUBE-SVC)" -eq 0 ]; then
  logger -t k3s-watchdog "kube-proxy iptables rules missing on server"
  iptables -t mangle -F KUBE-PROXY-CANARY 2>/dev/null
  iptables -t mangle -X KUBE-PROXY-CANARY 2>/dev/null
  iptables -t nat -F KUBE-PROXY-CANARY 2>/dev/null
  iptables -t nat -X KUBE-PROXY-CANARY 2>/dev/null
  iptables -t filter -F KUBE-PROXY-CANARY 2>/dev/null
  iptables -t filter -X KUBE-PROXY-CANARY 2>/dev/null
  sleep 40
  if [ "$(iptables-save 2>/dev/null | grep -c KUBE-SVC)" -eq 0 ]; then
    logger -t k3s-watchdog "Canary resync failed — restarting $SERVICE as last resort"
    systemctl restart "$SERVICE"
  fi
fi

The watchdog is deployed via Ansible as an inline copy task — the script is embedded directly in the playbook rather than maintained as a separate file:

k3s-agent-health.yml
- name: Deploy k3s health watchdog on agent nodes
  hosts: agent
  become: yes
  tasks:
    - name: Install k3s health check script
      ansible.builtin.copy:
        dest: /usr/local/bin/k3s-health-check
        mode: "0755"
        content: |
          #!/bin/bash
          # (full script from above)

    - name: Install systemd timer for health check
      ansible.builtin.copy:
        dest: /etc/systemd/system/k3s-health.timer
        content: |
          [Unit]
          Description=k3s health watchdog timer

          [Timer]
          OnBootSec=3min
          OnUnitActiveSec=5min
          AccuracySec=30s

          [Install]
          WantedBy=timers.target

    - name: Install systemd service for health check
      ansible.builtin.copy:
        dest: /etc/systemd/system/k3s-health.service
        content: |
          [Unit]
          Description=k3s health watchdog
          After=k3s-agent.service

          [Service]
          Type=oneshot
          ExecStart=/usr/local/bin/k3s-health-check

    - name: Reload systemd and enable timer
      ansible.builtin.systemd:
        daemon_reload: yes

    - name: Enable and start the health check timer
      ansible.builtin.systemd:
        name: k3s-health.timer
        enabled: yes
        state: started
Terminal window
# View recent watchdog actions:
journalctl -t k3s-watchdog --no-pager --since '1 hour ago'
# Check timer status:
systemctl list-timers k3s-health.timer

With 29 GB SD cards, disk space is a constant concern. The main consumers are journal logs, containerd rotated logs, and monitoring data.

By default, journald can grow to 2.7-2.8 GB per node. Set a permanent limit:

/etc/systemd/journald.conf.d/size.conf
[Journal]
SystemMaxUse=256M

Deploy and apply:

Terminal window
ansible all -i inventory.yml -m copy \
-a "dest=/etc/systemd/journald.conf.d/size.conf content='[Journal]\nSystemMaxUse=256M\n'" \
-b -e "ansible_become_pass=<pass>"
ansible all -i inventory.yml -m shell \
-a "journalctl --vacuum-size=256M && systemctl restart systemd-journald" \
-b -e "ansible_become_pass=<pass>"

Containerd creates rotated logs (*.gz files) in /var/lib/rancher/k3s/agent/containerd/. These accumulate silently. Clean them periodically:

Terminal window
ansible all -i inventory.yml -m shell \
-a "find /var/lib/rancher/k3s/agent/containerd/ -name '*.gz' -delete" \
-b -e "ansible_become_pass=<pass>"

Ubuntu’s rsyslog can also accumulate large rotated logs. Configure aggressive rotation:

/etc/logrotate.d/rsyslog-aggressive
/var/log/syslog /var/log/kern.log /var/log/auth.log {
    daily
    rotate 3
    compress
    delaycompress
    missingok
    notifempty
    maxsize 50M
}
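
To check the new policy without waiting for the nightly logrotate run, it can be invoked in debug mode (nothing is touched) or forced once:

Terminal window
# Dry run: show what would be rotated under this policy:
logrotate -d /etc/logrotate.d/rsyslog-aggressive
# Force an immediate rotation:
logrotate -f /etc/logrotate.d/rsyslog-aggressive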

Loki’s TSDB shipper writes index and cache data. If configured with an emptyDir volume, this writes to the node’s disk and causes disk pressure. Move it to NFS:

# In the Loki configmap, change tsdb_shipper paths:
schema_config:
  configs:
    - from: "2024-09-01"
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index   # was /local/tsdb-index
    cache_location: /loki/tsdb-cache           # was /local/tsdb-cache
  filesystem:
    directory: /loki/chunks

Remove the emptyDir volume from the StatefulSet and make sure /loki/ is backed by the NFS PVC.
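
A sketch of what that looks like in the StatefulSet spec; volume and PVC names here are illustrative, use whatever your Loki deployment actually defines:

# Before: an emptyDir volume mounted at /loki (writes to the node's SD card)
# After: the same mount backed by the NFS PVC
spec:
  template:
    spec:
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: loki-nfs       # hypothetical PVC bound to the NFS export on node3
      containers:
        - name: loki
          volumeMounts:
            - name: storage
              mountPath: /loki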


Swap is disabled as part of the cgroup v2 migration (see Part 2), but if you need to do it separately:

Terminal window
# Immediate:
swapoff -a
# Permanent — remove from fstab:
sed -i '/\/swapfile/d' /etc/fstab
# Reclaim disk space:
rm -f /swapfile # Frees ~2 GB per node

The RK1 modules have 8 GB RAM, which is enough for the workloads running on this cluster. With swap enabled on cgroup v2, kubelet produces tmpfs-noswap warnings because it can’t guarantee that memory-backed volumes (Kubernetes secrets, emptyDirs) stay in RAM.
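
To check whether a node is currently emitting the warning, search the k3s-agent journal (k3s runs the kubelet in-process, so kubelet messages land there):

Terminal window
# On agents; on the server node use -u k3s instead:
journalctl -u k3s-agent --since '1 hour ago' | grep -iE 'tmpfs|swap'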


When a node won’t boot (e.g. due to a corrupted ubuntuEnv.txt — see the regex gotcha in Part 1), you can recover it through the Turing Pi BMC’s serial console without physically accessing the board.

The tpi CLI tool must be installed and configured to talk to the BMC’s IP address on your LAN.

Terminal window
# Read serial output from slot 2 (rock2):
tpi uart -n 2 get
# Send a command to the serial console:
tpi uart -n 2 set -c 'ls /boot/firmware/'

Recovery scenario: corrupted ubuntuEnv.txt


This happened when an Ansible regex consumed a newline, merging fdtfile=rk3588-turing-rk1.dtb into the bootargs= line. U-Boot couldn’t find the DTB and the node dropped to a U-Boot shell.
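
For illustration, this is roughly what the damage looked like (values abbreviated): the fdtfile= setting ended up appended to the bootargs= line instead of staying on its own line.

# Corrupted: fdtfile merged into bootargs, so U-Boot never loads the DTB
bootargs=root=UUID=<uuid> ... systemd.unified_cgroup_hierarchy=1 fdtfile=rk3588-turing-rk1.dtb
# Correct: each variable on its own line
bootargs=root=UUID=<uuid> ... systemd.unified_cgroup_hierarchy=1
fdtfile=rk3588-turing-rk1.dtb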

Step 1: Power cycle the node and catch the U-Boot prompt:

Terminal window
tpi power -n 2 off
sleep 2
tpi power -n 2 on
# Watch serial output:
tpi uart -n 2 get

Step 2: If the node drops to a U-Boot shell, boot manually:

Terminal window
# Load the kernel, DTB, and initrd from the SD card:
tpi uart -n 2 set -c 'load mmc 1:1 ${kernel_addr_r} /boot/vmlinuz'
tpi uart -n 2 set -c 'load mmc 1:1 ${fdt_addr_r} /boot/dtbs/5.10.160-rockchip/rockchip/rk3588-turing-rk1.dtb'
tpi uart -n 2 set -c 'load mmc 1:1 ${ramdisk_addr_r} /boot/initrd.img'
tpi uart -n 2 set -c 'booti ${kernel_addr_r} ${ramdisk_addr_r} ${fdt_addr_r}'

Step 3: Once Linux boots, SSH in and fix ubuntuEnv.txt:

Terminal window
sudo vi /boot/firmware/ubuntuEnv.txt
# Ensure bootargs and fdtfile are on separate lines
sudo reboot
Terminal window
# Power cycle a single slot:
tpi power -n <slot> off && sleep 2 && tpi power -n <slot> on
# Power cycle all slots:
tpi power -n 1 off && tpi power -n 2 off && tpi power -n 3 off && tpi power -n 4 off
sleep 2
tpi power -n 1 on && tpi power -n 2 on && tpi power -n 3 on && tpi power -n 4 on

On a cluster with ~265k active series, 59 scrape targets, and a 30s scrape interval, Prometheus uses approximately 1 GB of memory at steady state.

The default memory request from many Helm charts (512Mi) is too low. Set it to match actual usage to prevent OOM kills and ensure the scheduler places the pod on a node with sufficient capacity:

# In the Prometheus CR or manifest:
resources:
  requests:
    cpu: 200m
    memory: 1Gi
  limits:
    memory: 2Gi

To right-size Prometheus, query its own metrics:

# Current RSS:
process_resident_memory_bytes{job="prometheus"}
# Active series count:
prometheus_tsdb_head_series
# Scrape target count:
count(up)
# Ingestion rate:
rate(prometheus_tsdb_head_samples_appended_total[5m])

After node reboots, agent restarts, or operator reconciliation conflicts, pods can be left in Error or Completed state. These consume API server resources and clutter kubectl get pods output.

Terminal window
# Delete all Error pods cluster-wide:
kubectl get pods -A --field-selector=status.phase=Failed \
-o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' | \
while read ns name; do kubectl delete pod -n "$ns" "$name"; done
# Delete all Completed (Succeeded) pods:
kubectl get pods -A --field-selector=status.phase=Succeeded \
-o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' | \
while read ns name; do kubectl delete pod -n "$ns" "$name"; done

Or more concisely:

Terminal window
kubectl delete pods -A --field-selector=status.phase=Failed
kubectl delete pods -A --field-selector=status.phase=Succeeded

KEDA’s ScaledObjects can conflict with Kubernetes operators that manage the same workloads.

If a KEDA ScaledObject targets the prometheus-prometheus StatefulSet directly, KEDA will scale it (e.g. to 4 replicas), but the Prometheus Operator’s Prometheus CR has replicas: 1. The operator reconciles constantly, trying to scale back down, while KEDA keeps scaling up. This creates crash-looping pods with volume mount failures because the PVCs don’t exist for the extra replicas.

Fix: Delete the KEDA ScaledObject for Prometheus. Let the Prometheus Operator manage the replica count through its CR. If you need Prometheus autoscaling, do it through the Prometheus CR’s replicas field or use a VPA instead.
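
A sketch of the cleanup; the ScaledObject name and namespace are illustrative, so list first to find yours:

Terminal window
# Find ScaledObjects across the cluster:
kubectl get scaledobjects -A
# Delete the one targeting the Prometheus StatefulSet (example name/namespace):
kubectl -n monitoring delete scaledobject prometheus-scaledobject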

KEDA ScaledObjects that use Prometheus as a trigger source need the correct service address. The Prometheus Operator creates a service called prometheus-operated (headless) and you typically create a ClusterIP service with a shorter name. Make sure the serverAddress in ScaledObject triggers matches your actual service:

triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
      # NOT: prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090

When k3s first starts, it generates a node-join token stored at /var/lib/rancher/k3s/server/token:

K10<ca_hash>::server:<password>

The K10 prefix tells the agent to verify the server’s CA fingerprint against <ca_hash> during bootstrap. The <password> is the actual authentication credential.
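
Since the password is the last colon-separated field, you can pull it out directly on the server, for example:

Terminal window
# Print only the <password> portion of the token:
sudo awk -F: '{print $NF}' /var/lib/rancher/k3s/server/token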

If you reinstall or reset the k3s server (e.g. k3s-uninstall.sh then fresh install), a new CA is generated. But:

  1. The node-token file may keep the old CA hash
  2. The agent service files (created by Ansible or the installer) still have the old K10<old_hash>::server:<password> token
  3. The agent’s private keys and client certs (under /var/lib/rancher/k3s/agent/) were signed by the old CA

The result is agents that appear to connect intermittently but fail authentication. The error messages vary depending on timing:

  • x509: certificate signed by unknown authority (TLS handshake)
  • 401 Unauthorized (server rejects old client cert)
  • connection refused on 127.0.0.1:6443 (local proxy can’t authenticate to server)

Simply deleting cached CA certs (server-ca.crt, client-ca.crt) does not fix this because the agent re-bootstraps using the old token and old private keys, getting certs signed by the old CA again.

Compare the CA timestamps:

Terminal window
# Server's current CA:
openssl x509 -noout -issuer \
< /var/lib/rancher/k3s/server/tls/server-ca.crt
# e.g. issuer=CN = k3s-server-ca@1755079458
# Agent's cached CA:
openssl x509 -noout -issuer \
< /var/lib/rancher/k3s/agent/server-ca.crt
# e.g. issuer=CN = k3s-server-ca@1709313234 <-- MISMATCH
# Agent's client cert (who signed it?):
openssl x509 -noout -issuer \
< /var/lib/rancher/k3s/agent/client-kubelet.crt
# Should match the server's client-ca.crt

If the @timestamp values differ, the CA was rotated and agents need a full reinstall.

Fix: strip the CA pin and reinstall agents


Step 1: Update the token in your Ansible inventory to use only the password (no K10 prefix):

inventory.yml
vars:
  # Do NOT use the K10<hash> form. It pins to a specific CA and
  # breaks when the server CA rotates.
  token: "<your_server_password>"

You can find the password portion from the server’s token file:

Terminal window
# On the server:
cat /var/lib/rancher/k3s/server/token
# K10<hash>::server:<password> <-- use just the <password> part

Step 2: Uninstall and reinstall each agent using Ansible (one at a time):

Terminal window
# Uninstall:
ansible-playbook -i inventory.yml reset.yml \
--limit <agent_ip> -e "ansible_become_pass=<pass>"
# Reinstall:
ansible-playbook -i inventory.yml site.yml \
--limit <agent_ip> -e "ansible_become_pass=<pass>"

Step 3: Verify the agent has the correct CA:

Terminal window
ansible <agent_ip> -i inventory.yml -m shell \
-a "openssl x509 -noout -issuer \
< /var/lib/rancher/k3s/agent/server-ca.crt" \
-b -e "ansible_become_pass=<pass>"

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Agent: x509: certificate signed by unknown authority | Stale cached CA after server reboot | Stop agent, rm CA certs, start agent (Part 3) |
| Agent: x509 persists after clearing certs | CA was rotated (server reinstall); token has old CA hash | Full agent reinstall with stripped token (Part 11) |
| Agent: Unauthorized on API calls | Stale client certs signed by old CA | Clear certs and restart; if it persists, full reinstall (Part 11) |
| Watchdog creating a restart death loop | No cooldown + narrow TLS detection | Deploy updated watchdog with cooldown and server gate (Part 4) |
| Missing kube-proxy iptables rules | Power loss wiped in-memory iptables | Delete KUBE-PROXY-CANARY chains, or restart agent |
| InvalidDiskCapacity warning | Containerd v2.1 + k3s 1.34 cosmetic bug | Ignore (resolves with k3s upgrade) |
| tmpfs-noswap warning | Swap is enabled on cgroup v2 | swapoff -a, remove from fstab |
| CgroupV1 warning on kubelet start | Node still on cgroup v1 | Run the cgroup v2 migration (Part 2) |
| Node won't boot after config change | Corrupted ubuntuEnv.txt | BMC serial console recovery (Part 7) |
| Disk pressure on agent nodes | Journal/containerd logs filling SD card | Vacuum journals, clean rotated logs (Part 5) |
| CrowdSec banning your IP for http-probing | Dashboard UI 404s exceeding scenario threshold | Parser-level whitelist (see Traefik guide) |
| Need to check/manage CrowdSec without cscli | N/A | Use the Security Dashboard at security-k3s.example.com (see Traefik guide) |
| Prometheus pods crash-looping | KEDA vs Prometheus Operator replica conflict | Delete the KEDA ScaledObject for Prometheus |
ansible-playbooks/my-playbooks/
├── inventory.yml # Cluster inventory (server + 3 agents)
├── k3s-agent-health.yml # Health watchdog deployment
├── cgroup-v2-swap-off.yml # Cgroup v2 migration + swap removal
└── site.yml # Wrapper that imports playbooks
/boot/firmware/ # Per-node boot config
├── ubuntuEnv.txt # Kernel cmdline, DTB, overlays
├── boot.cmd # U-Boot script source
└── boot.scr # Compiled U-Boot script
/var/lib/rancher/k3s/
├── server/tls/ # Server-side TLS (only on rock1)
│ ├── server-ca.crt # Server CA — signs serving certs
│ ├── client-ca.crt # Client CA — signs kubelet certs
│ └── dynamic-cert.json # Dynamic listener cert (file cache)
└── agent/ # Agent-side (rock2-rock4)
├── server-ca.crt # Cached server CA (delete to refresh)
├── client-ca.crt # Cached client CA (delete to refresh)
└── kubelet.kubeconfig # Kubelet credentials
/usr/local/bin/k3s-health-check # Watchdog script (deployed by Ansible)
/etc/systemd/system/
├── k3s-health.timer # 5-minute watchdog timer
└── k3s-health.service # Oneshot service for watchdog