Preparation Steps

Before starting the upgrade, create an etcd backup:

talosctl -n 10.121.0.10 etcd snapshot etcd.backup
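
A minimal sketch of wrapping that command so each snapshot gets a dated file name and the script stops if the file comes back missing or empty. The function name backup_etcd and the naming scheme are illustrative choices, not part of Talos:

```shell
# Hypothetical helper: take an etcd snapshot from a control plane node
# and fail loudly if the resulting file is missing or empty.
backup_etcd() {
  node="$1"
  dir="${2:-.}"
  file="${dir}/etcd-$(date +%Y%m%d-%H%M%S).backup"

  # Send talosctl's own messages to stderr so stdout carries only the path.
  talosctl -n "$node" etcd snapshot "$file" >&2 || return 1

  # -s is true only when the file exists and is larger than zero bytes.
  if [ ! -s "$file" ]; then
    echo "snapshot $file is missing or empty" >&2
    return 1
  fi
  echo "$file"
}
```

Call it with the control plane IP and a target directory, for example backup_etcd <control plane IP> ./backups; it prints the path of the snapshot it wrote.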

Cluster Configuration

My starting point is a Talos cluster with one control plane node and four worker nodes:

  • Control Plane: control-01 10.121.0.10
  • Workers:
    • node01 10.121.0.11
    • node02 10.121.0.12
    • node03 10.121.0.13
    • node04 10.121.0.14

Initial Environment

  • Starting Talos version: 1.6.1
  • Target Talos version: 1.9.2
  • Initial Kubernetes version: 1.29.0
  • Target Kubernetes version: 1.31.5
  • Hardware: Bare-metal Machine
  • Architecture: amd64

Factory Image Creation

Visit https://factory.talos.dev/ and select:

  • Hardware Type: Bare-metal Machine
  • Talos Linux Version (one image per upgrade step):
    • 1.6.8
    • 1.7.7
    • 1.8.4
    • 1.9.2
  • Machine Architecture: amd64
  • System Extensions:
    • siderolabs/i915 (20241210-v1.9.2)
    • siderolabs/intel-ucode (20241112)
    • siderolabs/iscsi-tools (v0.1.6)
  • Copy the image to upgrade Talos Linux on the machine:
    factory.talos.dev/installer/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba:v1.9.2
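
The installer reference has the form factory.talos.dev/installer/<schematic ID>:v<version>, so it can be composed from its two parts. A small sketch; the function name installer_image is hypothetical, and the ID in the example call is the one shown above:

```shell
# Hypothetical helper: compose an Image Factory installer reference
# from a schematic ID and a Talos version (without the leading "v").
installer_image() {
  schematic="$1"
  version="$2"
  echo "factory.talos.dev/installer/${schematic}:v${version}"
}

# Example call using the schematic ID from this section.
installer_image 376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba 1.9.2
```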
    

Environment Setup

# Define node IPs
export CONTROL_NODE="10.121.0.10"
export WORKER1="10.121.0.11"
export WORKER2="10.121.0.12"
export WORKER3="10.121.0.13"
export WORKER4="10.121.0.14"
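
With the IPs exported, it can help to record which Talos version every node reports before touching anything. A rough sketch, assuming the variables above are set; all_nodes and show_versions are hypothetical helper names:

```shell
# Assumes CONTROL_NODE and WORKER1..WORKER4 are exported as above.
# Hypothetical helper: print every node IP, control plane first.
all_nodes() {
  echo "$CONTROL_NODE $WORKER1 $WORKER2 $WORKER3 $WORKER4"
}

# Hypothetical helper: show the Talos version each node reports.
show_versions() {
  # Word splitting on the space-separated list is intentional here.
  for node in $(all_nodes); do
    echo "== $node"
    talosctl -n "$node" version
  done
}
```

Running show_versions again after each upgrade step gives a quick confirmation that every node landed on the expected release.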

Talos Upgrade Process

Since a direct upgrade from 1.6.1 to 1.9.2 is not supported, we upgraded incrementally: first to the latest patch release of 1.6, then one minor release at a time, using installer images created at factory.talos.dev. The upgrade path was:

1.6.1 → 1.6.8 → 1.7.7 → 1.8.4 → 1.9.2

Control plane node upgrades:

talosctl -n ${CONTROL_NODE} upgrade --image factory.talos.dev/installer/2dcd442954d67662d41c61bdb92165aaf7189aff9997bd011b6968c12ce8d9c0:v1.6.8
talosctl -n ${CONTROL_NODE} upgrade --image factory.talos.dev/installer/2dcd442954d67662d41c61bdb92165aaf7189aff9997bd011b6968c12ce8d9c0:v1.7.7
talosctl -n ${CONTROL_NODE} upgrade --image factory.talos.dev/installer/2dcd442954d67662d41c61bdb92165aaf7189aff9997bd011b6968c12ce8d9c0:v1.8.4
talosctl -n ${CONTROL_NODE} upgrade --image factory.talos.dev/installer/9ffda6da42b0e45bb9f486a3579c3c672c6971c1acaba0cc8ed8e9a0a5bb9e09:v1.9.2
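
The four commands have to run strictly in order, and a later step should not start if an earlier one failed. One possible way to enforce that is sketched below. The helper name upgrade_in_order is hypothetical, and passing --wait explicitly (so talosctl blocks until the upgrade finishes) is an assumption about your client version; recent clients wait by default:

```shell
# Hypothetical helper: apply installer images to one node in order,
# stopping at the first failure so a later step never runs on a
# node that did not finish the previous one.
upgrade_in_order() {
  node="$1"
  shift
  for image in "$@"; do
    echo "upgrading $node to $image"
    talosctl -n "$node" upgrade --image "$image" --wait || return 1
  done
}

# Usage: pass the node IP, then the installer images in upgrade order.
# upgrade_in_order "$CONTROL_NODE" <image for 1.6.8> <image for 1.7.7> ...
```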

Worker node upgrades (shown for WORKER1; repeat for each remaining worker):

talosctl -n ${WORKER1} upgrade --image factory.talos.dev/installer/2dcd442954d67662d41c61bdb92165aaf7189aff9997bd011b6968c12ce8d9c0:v1.6.8
talosctl -n ${WORKER1} upgrade --image factory.talos.dev/installer/2dcd442954d67662d41c61bdb92165aaf7189aff9997bd011b6968c12ce8d9c0:v1.7.7
talosctl -n ${WORKER1} upgrade --image factory.talos.dev/installer/2dcd442954d67662d41c61bdb92165aaf7189aff9997bd011b6968c12ce8d9c0:v1.8.4
talosctl -n ${WORKER1} upgrade --image factory.talos.dev/installer/9ffda6da42b0e45bb9f486a3579c3c672c6971c1acaba0cc8ed8e9a0a5bb9e09:v1.9.2
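
Repeating the same sequence by hand for four workers is easy to get wrong, so a loop may be worth it. The sketch below walks each worker through the whole path before moving to the next one, so only one worker is rebooting at a time. The helper name upgrade_workers is hypothetical, and --wait is passed on the assumption that your talosctl client supports it:

```shell
# Hypothetical helper: take each worker through every installer image
# in order, finishing one worker before starting the next.
upgrade_workers() {
  images="$1"   # space-separated installer images, in upgrade order
  shift
  for node in "$@"; do
    # Word splitting on $images is intentional.
    for image in $images; do
      talosctl -n "$node" upgrade --image "$image" --wait || {
        echo "upgrade of $node to $image failed, stopping" >&2
        return 1
      }
    done
  done
}

# Usage:
# upgrade_workers "<image 1> <image 2> ..." "$WORKER1" "$WORKER2" "$WORKER3" "$WORKER4"
```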

Kubernetes Upgrade Process

After completing the Talos upgrades, we proceeded with upgrading Kubernetes through the following versions:

1.29.0 → 1.29.13 → 1.30.9 → 1.31.5

talosctl -n ${CONTROL_NODE} upgrade-k8s --to 1.29.13
talosctl -n ${CONTROL_NODE} upgrade-k8s --to 1.30.9
talosctl -n ${CONTROL_NODE} upgrade-k8s --to 1.31.5
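
upgrade-k8s also accepts --dry-run, which prints the upgrade plan without changing anything. A sketch that previews each step and only proceeds when the preview succeeds; the helper name upgrade_k8s_in_order is hypothetical:

```shell
# Hypothetical helper: for each target version, preview the upgrade
# with --dry-run and only run it for real if the preview succeeds.
upgrade_k8s_in_order() {
  node="$1"
  shift
  for version in "$@"; do
    talosctl -n "$node" upgrade-k8s --to "$version" --dry-run || return 1
    talosctl -n "$node" upgrade-k8s --to "$version" || return 1
  done
}

# Usage:
# upgrade_k8s_in_order "$CONTROL_NODE" 1.29.13 1.30.9 1.31.5
```

An unsupported upgrade path should then surface at the dry-run stage, before any component has been touched.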

Troubleshooting

If you encounter the following error during the Kubernetes upgrade:

automatically detected the lowest Kubernetes version 1.29.9
unsupported upgrade path 1.29->1.30 (from "1.29.9" to "1.30.9")

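This typically indicates that your talosctl client needs to be updated. If the issue persists, you can upgrade manually by patching the image references in the machine config. The commands below use a K8S_INTERIM2 variable that is not set in Environment Setup; export it first with the Kubernetes version you are moving to (1.30.9 for the failing step above):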

Control plane node:

talosctl -n ${CONTROL_NODE} patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "registry.k8s.io/kube-apiserver:v'${K8S_INTERIM2}'"}]'
talosctl -n ${CONTROL_NODE} patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/controllerManager/image", "value": "registry.k8s.io/kube-controller-manager:v'${K8S_INTERIM2}'"}]'
talosctl -n ${CONTROL_NODE} patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/scheduler/image", "value": "registry.k8s.io/kube-scheduler:v'${K8S_INTERIM2}'"}]'
talosctl -n ${CONTROL_NODE} patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v'${K8S_INTERIM2}'"}]'

Worker nodes:

talosctl -n ${WORKER1} patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v'${K8S_INTERIM2}'"}]'
talosctl -n ${WORKER2} patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v'${K8S_INTERIM2}'"}]'
talosctl -n ${WORKER3} patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v'${K8S_INTERIM2}'"}]'
talosctl -n ${WORKER4} patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v'${K8S_INTERIM2}'"}]'
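
The kubelet patch is identical on every node apart from the version, so it can be generated once and applied in a loop. A sketch under that assumption; kubelet_patch and patch_kubelets are hypothetical helper names:

```shell
# Hypothetical helper: build the JSON patch that points the kubelet
# at a given Kubernetes version (without the leading "v").
kubelet_patch() {
  printf '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v%s"}]\n' "$1"
}

# Hypothetical helper: apply that patch to each node passed in,
# stopping at the first node that rejects it.
patch_kubelets() {
  version="$1"
  shift
  for node in "$@"; do
    talosctl -n "$node" patch mc --mode=no-reboot -p "$(kubelet_patch "$version")" || return 1
  done
}

# Usage:
# patch_kubelets "$K8S_INTERIM2" "$WORKER1" "$WORKER2" "$WORKER3" "$WORKER4"
```

Building the JSON with printf keeps the version out of the hand-stitched quoting used in the commands above.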

Conclusion

The upgrade process was successful, resulting in a fully updated cluster running Talos 1.9.2 and Kubernetes 1.31.5. Remember to always update your talosctl client to match the cluster version to avoid compatibility issues during upgrades.
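
As a small aid for that last point, the sketch below compares two version strings by major.minor. The helper name same_minor is hypothetical; the client version itself can be read from talosctl version --client and passed in by hand:

```shell
# Hypothetical helper: succeed when two version strings share the
# same major.minor. An optional leading "v" is ignored, and both
# inputs are assumed to have three parts (major.minor.patch).
same_minor() {
  a="${1#v}"
  b="${2#v}"
  # ${a%.*} strips the final ".patch" component.
  [ "${a%.*}" = "${b%.*}" ]
}

# Usage:
# same_minor v1.9.2 1.9.2 && echo "client and cluster are on the same minor release"
```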