Kubernetes: Defining a CronJob for Collecting Log Data

By Sebastian Günther

Posted in Kubernetes, Node, Tutorial

This is the final article in my Kube Log Exporter series. The first article introduced the basic design and showed how to execute the log exporter from the machine that runs your kubectl commands. The second article explained the ServiceAccount, ClusterRole, and ClusterRoleBinding resources that we need to run the log exporter inside the cluster. In this article, I explain how to define a CronJob that runs the Kube Log Exporter automatically on a regular schedule.

Kubernetes CronJobs

A CronJob is, in its basic specification, similar to a Deployment or ReplicaSet: it defines a set of containers to run. What makes a CronJob a unique Kubernetes resource are the conditions under which it runs. Let's briefly discuss the most important ones:

  • schedule - Uses the Linux crontab syntax to define precisely the minute, hour, day of month, month, and day of week when the job should run.
  • completions - The number of successful completions that need to be reached before this cron job's run is considered successful. Potential use cases are jobs that optimize storage in a database or clean up files and that have their own success criteria.
  • parallelism - Controls whether the job can run in multiple parallel pods or only sequentially.
  • activeDeadlineSeconds - The maximum timespan in which the batch job needs to finish. If it reaches this limit, Kubernetes terminates the job and considers it failed.

There are many more options available, so take a look at the official Kubernetes documentation as well.
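To make these options concrete, here is a sketch of how they could appear together in a manifest; the name, image, and values are illustrative assumptions, not taken from the exporter:

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: example-cleanup-cron          # hypothetical job name
spec:
  schedule: "30 2 * * *"              # every day at 02:30
  jobTemplate:
    spec:
      completions: 1                  # one successful completion required
      parallelism: 1                  # at most one pod at a time
      activeDeadlineSeconds: 300      # consider the job failed after 5 minutes
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cleanup
              image: busybox
              args: ['sh', '-c', 'echo cleanup done']
```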

KubeLogExporter CronJob

The KubeLogExporter cron job that I'm using needs to fulfil the following requirements:

  • It needs to run every hour
  • It needs to complete successfully, i.e. no error occurs when reading or storing a log
  • It needs to use the service account that we discussed in the last article so that it has the proper access rights to namespaces, pods, and logs
  • It needs to use a persistent volume to store the log files, so that the same log files are amended independent of the node where the job runs

Let's develop the CronJob resource definition bit by bit. The first part fulfils the scheduling requirements. The schedule is defined at spec.schedule to run each hour. The job needs to complete exactly once (spec.jobTemplate.spec.completions), and its pod will be restarted in case of an error (spec.jobTemplate.spec.template.spec.restartPolicy).

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: kube-log-exporter-cron
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      completions: 1
      template:
        spec:
          restartPolicy: OnFailure
          ...

The declaration of the ServiceAccount is very simple: we add spec.jobTemplate.spec.template.spec.serviceAccountName.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: kube-log-exporter-cron
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      completions: 1
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: log-exporter-sa
          ...

Now we need to add the persistent volume declaration. The volume needs to be mountable by only one node at a time, so I use the ReadWriteOnce access mode, and I request 1Gi of storage.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: log-exporter-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

In K3s, the Kubernetes distribution of my choice, persistent volumes are created automatically when the PersistentVolumeClaim is defined. In other distributions you may also need to set up the PersistentVolume yourself, but this is not the focus of this article.
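For distributions without dynamic provisioning, a matching PersistentVolume could look like the following sketch; the hostPath location is an assumption for illustration, not taken from the exporter setup:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: log-exporter-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /var/lib/kube-log-exporter   # assumed directory on the node
```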

Now we use the PersistentVolumeClaim to define a volume in spec.jobTemplate.spec.template.spec.volumes, and then reference this volume as a mounted volume in the container at spec.jobTemplate.spec.template.spec.containers[].volumeMounts.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: kube-log-exporter-cron
spec:
  schedule: '0 * * * *'
  jobTemplate:
    spec:
      completions: 1
      template:
        spec:
          serviceAccountName: log-exporter-sa
          containers:
            - name: kube-log-exporter
              image: docker.admantium.com/kube-log-exporter:0.1.9.12
              volumeMounts:
                - name: log-exporter-volume
                  mountPath: /etc/kube-log-exporter/logs
          restartPolicy: OnFailure
          volumes:
            - name: log-exporter-volume
              persistentVolumeClaim:
                claimName: log-exporter-pvc

Executing the CronJob

Now we create the cron job with kubectl create -f kube-log-exporter-cron-job.yaml. Once the job runs (for testing purposes you can also run the job every minute with schedule: "*/1 * * * *"), we can see the job history.
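The hourly schedule "0 * * * *" simply fires at the top of every hour. As a sanity check of that semantics, here is a minimal Python sketch (standard library only, not part of the exporter) that computes the next fire time:

```python
from datetime import datetime, timedelta

def next_hourly_run(now: datetime) -> datetime:
    # Next fire time for the cron schedule "0 * * * *":
    # the first top-of-the-hour strictly after `now`.
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)

print(next_hourly_run(datetime(2020, 8, 25, 18, 12)))  # 2020-08-25 19:00:00
```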

> kubectl describe cronjob kube-log-exporter-cron
Name:                          kube-log-exporter-cron
Namespace:                     default
Labels:                        <none>
Annotations:                   <none>
Schedule:                      0 * * * *

...

Last Schedule Time:  Mon, 25 Aug 2020 19:00:00 +0200
Active Jobs:         <none>
Events:
  Type    Reason            Age   From                Message
  ----    ------            ----  ----                -------
  Normal  SuccessfulCreate  12m   cronjob-controller  Created job kube-log-exporter-cron-1590426000
  Normal  SawCompletedJob   12m   cronjob-controller  Saw completed job: kube-log-exporter-cron-1590426000, status: Complete
  Normal  SuccessfulDelete  12m   cronjob-controller  Deleted job kube-log-exporter-cron-1590415200

And here is an example of the created log files.

```sh
> ls -la /etc/kube-log-exporter/logs
-rw-r--r-- 1 root root  4515 Aug 25 19:00 lighthouse-78cc7475c7-74ctt_lighthouse.log
-rw-r--r-- 1 root root  6012 Aug 25 19:00 lighthouse-78cc7475c7-gcl94_lighthouse.log
-rw-r--r-- 1 root root  6873 Aug 25 19:00 lighthouse-78cc7475c7-k2cv7_lighthouse.log
-rw-r--r-- 1 root root  7634 Aug 25 19:00 lighthouse-78cc7475c7-l7zpv_lighthouse.log
-rw-r--r-- 1 root root  4636 Aug 25 19:00 lighthouse-78cc7475c7-wh2gk_lighthouse.log
-rw-r--r-- 1 root root 25741 Aug 25 19:00 redis-6b746f4d9b-8tjds_redis.log
...
```

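The listing suggests that the exporter names each file `<pod-name>_<container-name>.log`. Assuming that convention (inferred from the listing, not confirmed by the exporter's code), a small Python helper can split a file name back into its parts:

```python
def parse_log_filename(filename: str) -> tuple[str, str]:
    # Split '<pod>_<container>.log' into (pod, container).
    # Pod and container names are DNS labels and cannot contain '_',
    # so a single underscore separates the two parts.
    stem = filename.removesuffix('.log')
    pod, container = stem.split('_', 1)
    return pod, container

print(parse_log_filename('redis-6b746f4d9b-8tjds_redis.log'))
# ('redis-6b746f4d9b-8tjds', 'redis')
```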
```sh
> cat /etc/kube-log-exporter/logs/redis-6b746f4d9b-8tjds_redis.log

1:C 25 Aug 2020 16:21:04.675 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Aug 2020 16:21:04.675 # Redis version=6.0.1, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Aug 2020 16:21:04.675 # Configuration loaded
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 6.0.1 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 7139
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |    `-._`-._        _.-'_.-'    |
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

1:M 25 Aug 2020 16:21:04.678 # Server initialized
```

CronJob: Complete Resource Definition

Here is the complete version again.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: kube-log-exporter-cron
spec:
  schedule: '0 * * * *'
  jobTemplate:
    spec:
      completions: 1
      template:
        spec:
          serviceAccountName: log-exporter-sa
          containers:
            - name: kube-log-exporter
              image: docker.admantium.com/kube-log-exporter:0.1.9.12
              args: ['node', 'cluster.js']
              volumeMounts:
                - name: log-exporter-volume
                  mountPath: /etc/kube-log-exporter/logs
          restartPolicy: OnFailure
          volumes:
            - name: log-exporter-volume
              persistentVolumeClaim:
                claimName: log-exporter-pvc
          imagePullSecrets:
            - name: registry-secret

Conclusion

Kubernetes CronJobs define periodically scheduled tasks in your cluster. Typical use cases are maintenance tasks such as cleaning up files, updating indexes, or collecting data. If you want to store log data in plain files, a cron job is a straightforward solution. This article showed how to define a cron job that uses the KubeLogExporter to persist Pod log data in files on a persistent volume.