Kubernetes with K3S: How I Upgraded a Production Cluster from v1.17 to v1.25
Since its setup in 2020, my Kubernetes stack has happily served this blog and my lighthouse service. While I kept the application code base up to date, I stayed with the Kubernetes version installed back then: v1.17. It was time to change that and upgrade stepwise to a recent version. The upgrade seemed challenging, so I took notes along the way, which ultimately led to this blog post.
This blog post is a concise summary of upgrading a Rancher K3S cluster from v1.17 to v1.25. Read it to get a thorough, practical explanation of potential problems and their solutions.
K3S Upgrade Preparation
The preparation for a Kubernetes cluster upgrade depends very much on the Kubernetes platform or distribution, the cluster size, and the workloads that you host. In my case, my private cluster consists of 3 nodes and 2 applications and is based on K3S, a lightweight distribution. I have written about K3S in previous articles, such as K3S introduction or K3S installation tutorial. K3S should make upgrading nodes easy because all Kubernetes components are bundled into one binary, which, when changed, upgrades all K8S components in one step. However, you need to be aware of version changes in the Kubernetes components themselves, and consider whether they are still compatible with your application configuration: the YAML manifest files that represent deployments, endpoints, and ingress definitions. This is especially true when you work with complex Helm charts - they are very likely to break between Kubernetes upgrades, which means you should upgrade them first.
With this in mind, here are the things to look for:
- Kubernetes Distribution: Read the upgrade process and requirements of your distribution carefully, and consider which of them might impact you. For the K3S upgrade plan, I saw no obstacles.
- K8S manifest files: Check whether upgrading to newer versions changes the apiVersion field in the YAML resource manifests, and whether structural changes happen. There is a concise API deprecation guide that lists the changes. Read it to understand the changes, and update the manifests accordingly. Upgrading manifests can be automated to a certain degree with kubectl convert (see docs).
- Helm Charts: Read your Helm charts' documentation to identify potential version upgrade problems. The typical solution is to upgrade the Helm chart to a version compatible with the target Kubernetes version before upgrading Kubernetes itself. In some cases, it might be the other way around.
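To get a quick overview of which manifests are affected, a simple grep over a manifest backup or directory is enough. The following is a sketch of my own, not an official tool; the list of deprecated apiVersions is an illustrative subset taken from the deprecation guide:

```shell
#!/bin/sh
# Scan a directory of YAML manifests for apiVersions that are removed
# in Kubernetes v1.22 (illustrative subset; consult the official API
# deprecation guide for the full list relevant to your upgrade).
scan_deprecated_api_versions() {
  dir="$1"
  grep -rn -E 'apiVersion: *(extensions/v1beta1|networking\.k8s\.io/v1beta1|rbac\.authorization\.k8s\.io/v1beta1|certificates\.k8s\.io/v1beta1)' "$dir" || true
}
```

Running it over a manifest backup points directly at the resources that need attention.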
Ok, it’s time to start. For all of the following, keep in mind that I could afford to have the cluster unavailable for some time, which might not be the case for your upgrade journey.
Step 1: Upgrading Kubernetes from 1.17 => 1.19
The first step is only a small upgrade of two minor versions.
Backup All Manifest Files
Let’s start by creating a complete YAML manifest backup of all resources with the following command (kb and k are shell aliases for kubectl):
kb get all -A -o yaml > all.yaml
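Note that kubectl get all does not actually include every resource type: Ingresses, ConfigMaps, Secrets, and CRD-based resources are missing from its output. A more complete backup loop might look like this sketch (the resource list is an assumption; extend it for your own cluster):

```shell
#!/bin/sh
# Back up selected resource types into one YAML file per type.
# KUBECTL can be overridden, e.g. for a dry run.
KUBECTL="${KUBECTL:-kubectl}"
backup_resources() {
  outdir="$1"
  mkdir -p "$outdir"
  for res in deployments services ingresses configmaps secrets; do
    $KUBECTL get "$res" -A -o yaml > "$outdir/$res.yaml"
  done
}
```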
And then see the currently installed versions:
kb get nodes
NAME STATUS ROLES AGE VERSION
k3s-server Ready master 2y122d v1.17.2+k3s1
k3s-node1 Ready <none> 2y122d v1.17.2+k3s1
k3s-node2 Ready <none> 2y122d v1.17.2+k3s1
Upgrade to Kubernetes v1.18
To identify the very next available minor version, the K3S GitHub releases page is the best source. Search for the next version tag, which in my case is v1.18.8+k3s1, and apply it as shown:
# server
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.18.8+k3s1 sh -
# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.18.8+k3s1 K3S_TOKEN=$SECRET sh -
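Since this install pattern repeats for every minor version, it can be wrapped in a tiny helper. This is a dry-run sketch of my own that only prints the commands; actually running them on each host (e.g. via ssh) is left out:

```shell
#!/bin/sh
# Print the K3S upgrade commands for a given release (dry run only).
# First argument: version tag; remaining arguments: worker node names.
print_upgrade_commands() {
  version="$1"; shift
  echo "# server:"
  echo "curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=$version sh -"
  for node in "$@"; do
    echo "# on $node:"
    echo "curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=$version K3S_TOKEN=\$SECRET sh -"
  done
}
```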
The installation logs show no errors at all:
$> curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.18.8+k3s1 sh -
[INFO] Finding release for channel v1.18.8+k3s1
[INFO] Using v1.18.8+k3s1 as release
[INFO] Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.18.8+k3s1/sha256sum-amd64.txt
[INFO] Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.18.8+k3s1/k3s
[INFO] Verifying binary download
[INFO] Installing k3s to /usr/local/bin/k3s
[INFO] Skipping installation of SELinux RPM
[INFO] Skipping /usr/local/bin/kubectl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/crictl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/ctr symlink to k3s, already exists
[INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[INFO] Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO] env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s.service
[INFO] systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO] systemd: Starting k3s
And with kubectl get nodes, everything looks fine as well:
NAME STATUS ROLES AGE VERSION
k3s-server Ready master 2y122d v1.18.8+k3s1
k3s-node1 Ready <none> 2y122d v1.18.8+k3s1
k3s-node2 Ready <none> 2y122d v1.18.8+k3s1
Upgrade to Kubernetes v1.19
Continuing to version v1.19.4+k3s2:
# server
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.19.4+k3s2 sh -
# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.19.4+k3s2 K3S_TOKEN=$SECRET sh -
This time, I ran into an error: node1 and node2 were marked as NotReady:
k3s-server Ready master 2y122d v1.19.4+k3s2
k3s-node2 NotReady <none> 2y122d v1.19.4+k3s2
k3s-node1 NotReady <none> 2y122d v1.19.4+k3s2
On node2, I executed these commands:
systemctl stop k3s
systemctl stop k3s-agent
systemctl start k3s-agent
On node1, this did not work. Checking the Kubernetes logfile, I saw this message:
kube-proxy failed to start proxier healthz on 127.0.0.1:10256: listen tcp 127.0.0.1:10256: bind: address already in use
However, this particular error opened a rabbit hole in which I spent too much time trying different things. Finally, I simply restarted node1, and a short time after:
k3s-server Ready master 2y122d v1.19.4+k3s2
k3s-node2 Ready <none> 2y122d v1.19.4+k3s2
k3s-node1 Ready <none> 2y122d v1.19.4+k3s2
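Instead of polling kubectl get nodes by hand after each upgrade step, kubectl wait can block until every node reports Ready. A small wrapper sketch (the 300s timeout is an arbitrary choice):

```shell
#!/bin/sh
# Block until all nodes report the Ready condition, or fail after a timeout.
# KUBECTL can be overridden, e.g. for testing.
KUBECTL="${KUBECTL:-kubectl}"
wait_for_nodes() {
  $KUBECTL wait --for=condition=Ready node --all --timeout=300s
}
```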
Fix the Docker Registry
The private Docker registry hosted inside my Kubernetes cluster was wiped during the update. Therefore, I needed to upload all images anew.
Finally, all services were back online:
NAME READY UP-TO-DATE AVAILABLE AGE
docker-registry 1/1 1 1 2y121d
nginx-ingress-controller 1/1 1 1 2y122d
lighthouse-redis 1/1 1 1 481d
lighthouse-scanner 3/3 3 3 2y65d
lighthouse-api 1/1 1 1 2y65d
lighthouse-web 1/1 1 1 2y65d
admantium-blog 1/1 1 1 2y90d
Step 2: Upgrading Kubernetes from 1.19 => 1.21
For the next upgrade, I decided to use the very same approach: grab the latest patch release of the next minor version, and apply it.
Upgrade to Kubernetes v1.20
Applying version v1.20.15+k3s1:
# server
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.20.15+k3s1 sh -
# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.20.15+k3s1 K3S_TOKEN=$SECRET sh -
The update was smooth, but node1 was reported as NotReady, although I could connect via SSH normally. I saw several processes consuming an excessive amount of CPU. After killing them, the node became available:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 2m43s kube-proxy Starting kube-proxy.
Normal Starting 2m43s kubelet Starting kubelet.
Warning InvalidDiskCapacity 2m43s kubelet invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 2m43s kubelet Node k3s-node1 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 2m43s kubelet Node k3s-node1 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 2m43s kubelet Node k3s-node1 status is now: NodeHasSufficientPID
Normal NodeNotReady 2m43s kubelet Node k3s-node1 status is now: NodeNotReady
Normal NodeAllocatableEnforced 2m42s kubelet Updated Node Allocatable limit across pods
Normal NodeReady 51s kubelet Node k3s-node1 status is now: NodeReady
Upgrade to Kubernetes v1.21
Let’s continue with v1.21.8+k3s2:
# server
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.21.8+k3s2 sh -
# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.21.8+k3s2 K3S_TOKEN=$SECRET sh -
All nodes were ready, but one of the application deployments did not work anymore.
NAME READY UP-TO-DATE AVAILABLE AGE
docker-registry 1/1 1 1 2y121d
nginx-ingress-controller 1/1 1 1 2y123d
lighthouse-redis 1/1 1 1 482d
lighthouse-web 1/1 1 1 2y66d
admantium-blog 1/1 1 1 2y90d
lighthouse-scanner 3/3 3 3 2y66d
lighthouse-api 0/1 1 0 2y66d
Fix AppArmor Error
One container, the lighthouse-api, did not start because of this error:
Error: failed to create containerd container: get apparmor_parser version: exec: "apparmor_parser": executable file not found in $PATH
This AppArmor-related error occurs in K3S v1.21, and this bug ticket provided the solution: on each node, run the following command, then reboot the node:
apt install apparmor apparmor-utils
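A small pre-flight check on each node would have caught this earlier. The sketch below only reports the state; it does not install anything:

```shell
#!/bin/sh
# Report whether apparmor_parser (needed by k3s >= v1.21 on some Debian
# setups) is available on this node's PATH.
check_apparmor() {
  if command -v apparmor_parser >/dev/null 2>&1; then
    echo "apparmor_parser: ok"
  else
    echo "apparmor_parser: missing (apt install apparmor apparmor-utils)"
  fi
}
```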
After that, all applications were running again.
NAME READY UP-TO-DATE AVAILABLE AGE
lighthouse-redis 1/1 1 1 482d
docker-registry 1/1 1 1 2y122d
admantium-blog 1/1 1 1 2y91d
lighthouse-api 1/1 1 1 2y66d
lighthouse-web 1/1 1 1 2y66d
lighthouse-scanner 3/3 3 3 2y66d
Step 3: Updating Manifests for v1.22
When moving towards v1.22, several API manifests are expected to change:
- Ingress: Use the new API version networking.k8s.io/v1, change the structure of backend (serviceName and servicePort become service.name and service.port.number), and add a pathType field
- Certificate Signing Requests: Use the new API version certificates.k8s.io/v1, and change the signing clients
- RBAC: Use API version rbac.authorization.k8s.io/v1
I definitely needed to update the Ingress resources, otherwise my blog and lighthouse service would stop working. Let’s update the Ingress definitions first using kubectl convert. As an example, let’s take the ingress definition for the blog.
The pre-v1.22 spec is this:
kind: Ingress
metadata:
annotations:
cert-manager.io/cluster-issuer: admantium-blog-cert
kubernetes.io/ingress.class: nginx
spec:
rules:
- host: admantium.com
http:
paths:
- backend:
serviceName: admantium-blog
servicePort: 8080
path: /
pathType: ImplementationSpecific
tls:
- hosts:
- admantium.com
secretName: admantium-blog-cert
status:
loadBalancer:
ingress:
- ip: 49.12.45.6
And after:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
cert-manager.io/cluster-issuer: admantium-blog-cert
kubernetes.io/ingress.class: nginx
spec:
rules:
- host: admantium.com
http:
paths:
- backend:
service:
name: admantium-blog
port:
number: 8080
path: /
pathType: ImplementationSpecific
tls:
- hosts:
- admantium.com
secretName: admantium-blog-cert
status:
loadBalancer:
ingress:
- ip: 49.12.45.6
By running kb convert -f blog.yaml --output-version networking.k8s.io/v1 > blog_new.yaml && kb apply -f blog_new.yaml, the ingress resource was updated successfully. I ran similar commands for the other ingresses, and then continued.
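With several Ingress definitions to migrate, the convert-and-apply step can be looped over a directory. A sketch, assuming the kubectl convert plugin is installed and the manifests live in a local directory:

```shell
#!/bin/sh
# Convert every ingress manifest in a directory to networking.k8s.io/v1,
# writing the result next to the original with a _v1 suffix.
# KUBECTL can be overridden, e.g. for a dry run.
KUBECTL="${KUBECTL:-kubectl}"
convert_ingresses() {
  dir="$1"
  for f in "$dir"/*.yaml; do
    [ -e "$f" ] || continue
    $KUBECTL convert -f "$f" --output-version networking.k8s.io/v1 \
      > "${f%.yaml}_v1.yaml"
  done
}
```

Review the generated files before applying them; kubectl convert is a best-effort translation.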
Step 4: Upgrading Helm Charts for v1.22
During the initial setup of my cluster, I used the tool arkade, see also my 2020-08-03 blog post. This tool packages essential Kubernetes Helm charts behind a simple installer. Using the helm binary directly, I see these installed releases:
cert-manager cert-manager 5 2022-09-17 12:38:54.128091774 +0200 CEST deployed cert-manager-v0.14.3 v0.14.3
docker-registry default 2 2020-04-26 19:29:33.850171 +0200 CEST deployed docker-registry-1.9.2 2.7.1
ingress-nginx default 1 2022-08-27 12:32:50.667457391 +0200 CEST deployed ingress-nginx-4.2.3 1.3.0
And these repositories:
helm repo list
NAME URL
ingress-nginx https://kubernetes.github.io/ingress-nginx
jetstack https://charts.jetstack.io
All of these need to be updated.
Ingress Nginx
During an earlier update, I installed ingress-nginx, assuming it would replace the nginx-ingress release. But because of the naming difference, this resulted in two separate installations:
helm list --all-namespaces
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
cert-manager cert-manager 3 2020-04-27 19:58:05.144256 +0200 CEST deployed cert-manager-v0.12.0 v0.12.0
docker-registry default 2 2020-04-26 19:29:33.850171 +0200 CEST deployed docker-registry-1.9.2 2.7.1
ingress-nginx default 1 2022-08-27 10:27:22.303325075 +0200 CEST failed ingress-nginx-4.2.3 1.3.0
nginx-ingress default 2 2020-05-08 14:11:09.757913 +0200 CEST deployed nginx-ingress-1.36.3 0.30.0
traefik kube-system 3 2022-08-26 16:01:19.368774236 +0000 UTC deployed traefik-1.81.001 1.7.19
The solution was to cleanly uninstall and reinstall the components:
helm delete nginx-ingress
helm delete ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx
Then the deployment was running again:
helm search repo ingress-nginx -l
NAME CHART VERSION APP VERSION DESCRIPTION
ingress-nginx/ingress-nginx 4.2.3 1.3.0 Ingress controller for Kubernetes using NGINX a...
Cert Manager
For the certificates, I’m using cert-manager. On its release information page, I could see that my installed version supports Kubernetes only up to v1.21. The upgrade notes for cert-manager prescribe the same practice as for upgrading Kubernetes: one minor version at a time, using the highest available patch version.
Upgrade to v0.14 and Fix CRD Error
The first upgrade resulted in an error:
kubectl delete -n cert-manager deployment cert-manager cert-manager-cainjector cert-manager-webhook
helm upgrade --set installCRDs=true --version 0.14 cert-manager jetstack/cert-manager --namespace=cert-manager
Error: UPGRADE FAILED: cannot patch "cert-manager-cainjector" with kind Deployment: Deployment.apps "cert-manager-cainjector" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"cainjector", "app.kubernetes.io/instance":"cert-manager", "app.kubernetes.io/name":"cainjector"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && cannot patch "cert-manager" with kind Deployment: Deployment.apps "cert-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"controller", "app.kubernetes.io/instance":"cert-manager", "app.kubernetes.io/name":"cert-manager"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && cannot patch "cert-manager-webhook" with kind Deployment: Deployment.apps "cert-manager-webhook" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"webhook", "app.kubernetes.io/instance":"cert-manager", "app.kubernetes.io/name":"webhook"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
I found the solution in this cert-manager issue: Manually uninstall outdated CRDs, then perform the upgrade.
for i in certificates.cert-manager.io challenges.acme.cert-manager.io clusterissuers.cert-manager.io issuers.cert-manager.io orders.acme.cert-manager.io
do
k delete crd $i
done
helm upgrade --set installCRDs=true --version 0.14 cert-manager jetstack/cert-manager --namespace=cert-manager
...
NAME: cert-manager
LAST DEPLOYED: Sat Sep 17 12:38:54 2022
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 5
TEST SUITE: None
NOTES:
cert-manager has been deployed successfully!
Upgrade to v1.9
Continuing all the way from 0.14 to 1.9 was unproblematic. After each upgrade, I checked the Kubernetes event messages. A typical printout was this:
kube-system 0s Normal LeaderElection lease/cert-manager-controller cert-manager-7b8d75c477-rmtgw-external-cert-manager-controller became leader
kube-system 0s Normal LeaderElection lease/cert-manager-cainjector-leader-election cert-manager-cainjector-6cd8d7f84b-tc2vn_649414c2-b9cb-4ace-af6f-8feaa5a0f06b became leader
I was especially delighted about these messages:
default 0s Normal CreateCertificate ingress/lighthouse Successfully created Certificate "lighthouse-cert"
default 0s Normal CreateCertificate ingress/blog Successfully created Certificate "admantium-blog-cert"
default 0s Normal CreateCertificate ingress/docker-registry Successfully created Certificate "docker-registry"
Finally, the most recent version is in use:
helm list -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
cert-manager cert-manager 21 2022-09-18 10:35:16.958630764 +0200 CEST deployed cert-manager-v1.9.1 v1.9.1
Docker-Registry
The docker-registry chart that was installed with the tool arkade stems originally from the GitHub repo helm/charts. This repo was deprecated, as announced, and the chart moved to a new repository.
The question now is: Can I update the Helm chart from this new repo, or do I need to install from scratch? The release page lists the oldest version as 1.9.7. Let’s try an upgrade with the --dry-run option.
helm upgrade --version 1.9.7 docker-registry docker-registry/docker-registry --namespace=default --dry-run
...
NAME: docker-registry
LAST DEPLOYED: Sun Sep 18 11:19:47 2022
NAMESPACE: default
STATUS: pending-upgrade
REVISION: 3
TEST SUITE: None
...
This looks good! All printed manifest files are similar to the current ones. Let’s run the actual upgrade, and then try a docker push command:
helm upgrade --version 1.9.7 docker-registry docker-registry/docker-registry --namespace=default
docker push docker.admantium.com/lighthouse-web:0.4.2
The push refers to repository [docker.admantium.com/lighthouse-web]
f1a5039ecf29: Pushed
221ee9f09112: Pushed
70d0aad4ac8b: Pushed
b539cf60d7bb: Pushed
bdc7a32279cc: Pushed
This went well! In the same manner as before, I upgraded one minor version at a time and tried the docker push command after each step. The only notable release note concerns the upgrade from 1.16.0 to 2.0.0, which added the ingress.spec.ingressClassName field; setting it keeps the ingress resource working as before.
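For reference, the underlying change on the Ingress resource is the ingressClassName field, which replaces the older kubernetes.io/ingress.class annotation (fragment for illustration only; the resource name is hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: docker-registry
spec:
  ingressClassName: nginx  # replaces the kubernetes.io/ingress.class annotation
```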
Helm Chart Upgrades Completed
Yes!
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
cert-manager cert-manager 21 2022-09-18 10:35:16.958630764 +0200 CEST deployed cert-manager-v1.9.1 v1.9.1
docker-registry default 14 2022-09-18 11:35:49.363746406 +0200 CEST deployed docker-registry-2.2.2 2.8.1
ingress-nginx default 1 2022-08-27 12:32:50.667457391 +0200 CEST deployed ingress-nginx-4.2.3 1.3.0
Step 5: Upgrading Kubernetes from v1.21 => v1.25
For the final Kubernetes version upgrades, I used the more comprehensive changelog from the Kubernetes GitHub repository. As before, I determined the most recent patch version of the next minor version, and continued.
Upgrading to Kubernetes v1.21.14
The upgrade command:
# server
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.21.14+k3s1 sh -
# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="agent" INSTALL_K3S_CHANNEL=v1.21.14+k3s1 K3S_TOKEN=$SECRET sh -
Again, the worker nodes were not immediately available. After manually stopping and starting k3s on the nodes, the update was successful:
systemctl stop k3s
systemctl stop k3s-agent
systemctl start k3s-agent
Upgrading to Kubernetes v1.22
The upgrade to v1.22 was not as smooth. From this version on, K3S installs Traefik as the default ingress controller via a bundled Helm chart. This disrupted the pod communication. I uninstalled Traefik, and then added specific flags to the upgrade command, as shown:
# server
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--no-deploy traefik --disable-network-policy" INSTALL_K3S_CHANNEL=v1.22.13+k3s1 sh -
# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22.13+k3s1 K3S_URL=https://49.12.45.6:6443 K3S_TOKEN=$SECRET sh -
All applications worked.
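The Traefik uninstall itself can be sketched as follows. In K3S, the bundled Traefik is managed through a HelmChart custom resource in the kube-system namespace; note that exact resource names may differ between K3S versions:

```shell
#!/bin/sh
# Remove the Traefik bundled with K3S (sketch).
# KUBECTL can be overridden, e.g. for a dry run.
KUBECTL="${KUBECTL:-kubectl}"
remove_traefik() {
  $KUBECTL delete helmchart traefik -n kube-system --ignore-not-found
  $KUBECTL delete deployment traefik -n kube-system --ignore-not-found
}
```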
Upgrading to Kubernetes v1.23
The most recent patch version of the next minor release is v1.23.10.
# server
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--no-deploy traefik --disable-network-policy" INSTALL_K3S_CHANNEL=v1.23.10+k3s1 sh -
# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.23.10+k3s1 K3S_URL=https://49.12.45.6:6443 K3S_TOKEN=$SECRET sh -
All applications worked.
NAME STATUS ROLES AGE VERSION
k3s-node2 Ready <none> 2y151d v1.23.10+k3s1
k3s-server Ready control-plane,master 2y151d v1.23.10+k3s1
k3s-node1 Ready <none> 2y151d v1.23.10+k3s1
Upgrading to Kubernetes v1.24
Next, upgrade to v1.24.4+k3s1:
# server
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--no-deploy traefik --disable-network-policy" INSTALL_K3S_CHANNEL=v1.24.4+k3s1 sh -
# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.24.4+k3s1 K3S_URL=https://49.12.45.6:6443 K3S_TOKEN=$SECRET sh -
This one was also very smooth; not even a node restart was required.
k3s-server Ready control-plane,master 2y151d v1.24.4+k3s1
k3s-node1 Ready <none> 2y151d v1.24.4+k3s1
k3s-node2 Ready <none> 2y151d v1.24.4+k3s1
Upgrading to Kubernetes v1.25
The final upgrade is at hand:
# server
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--no-deploy traefik --disable-network-policy" INSTALL_K3S_CHANNEL=v1.25.5+k3s1 sh -
# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.25.5+k3s1 K3S_URL=https://49.12.45.6:6443 K3S_TOKEN=$SECRET sh -
All services work. The upgrade is finished.
Conclusion
Upgrading a Kubernetes cluster to its most recent version can be an intimidating task. In this article, you saw a practical example of upgrading a K3S cluster from v1.17 to v1.25. During the update process, I encountered some errors and solved them as follows: a) if a node is not available, restart the k3s binary or the complete node, b) if pod communication or incoming Ingress traffic is disrupted, check the ingress configuration and which ingress solution is installed via Helm, c) on Debian systems, make sure that apparmor and apparmor-utils are installed. In general, you update Kubernetes one minor version at a time, and you need to check three different things: a) updates to the Kubernetes API, b) updates to the Kubernetes manifests, c) updates of the Helm charts. I'm happy that the updates were successful, and that at the time of writing I had a completely up-to-date cluster.