Kubernetes Audits Introduction

Monitoring the security aspects of a system as complex as Kubernetes can get frustrating, especially when you want simple answers to simple questions (e.g., what happened? when did it happen?). That is exactly where Kubernetes audits come into play.

In this blog post I will go over Kubernetes audits in detail: What are they exactly? How are they constructed? Where do they originate? How, and to where, are they delivered?

What are Kubernetes audits?

Think of them as security-context logs for Kubernetes. These audits can give you significant insight into what goes on behind the scenes of your cluster(s).

At the end of the day, a Kubernetes audit is a simple structured JSON object. It contains entries that help you answer simple (or complex) questions, similar to the ones mentioned above, as well as others such as: who caused an event?

The following is an example of a Kubernetes audit event. The full event structure, fields, and descriptions are available in the Kubernetes GitHub repository, under the audit v1 API.

{
   "kind":"Event",
   "apiVersion":"audit.k8s.io/v1",
   "level":"Metadata",
   "auditID":"5eae1f38-eb8b-4054-8191-5ecfce66bde3",
   "stage":"ResponseComplete",
   "requestURI":"/api/v1/namespaces/default/pods/busybox-sleep/status",
   "verb":"patch",
   "user":{
      "username":"system:node:name",
      "groups":["system:nodes", "system:authenticated"]
   },
   "sourceIPs":["10.0.2.15"],
   "userAgent":"kubelet/v1.18.8 (linux/amd64) kubernetes/9f2892a",
   "objectRef":{
      "resource":"pods",
      "namespace":"default",
      "name":"busybox-sleep",
      "apiVersion":"v1",
      "subresource":"status"
   },
   "responseStatus":{
      "metadata":{},
      "code":200
   },
   "requestReceivedTimestamp":"2020-08-25T05:42:22.572271Z",
   "stageTimestamp":"2020-08-25T05:42:22.574711Z",
   "annotations":{
      "authorization.k8s.io/decision":"allow",
      "authorization.k8s.io/reason":""
   }
}

  • What happened? – Start with the verb entry (line #8). In our case it's a “patch” action.
  • What object is being patched? – Take a look at objectRef (line #15); it contains a “pods” resource.
  • When did it happen? – requestReceivedTimestamp (line #26) is at your disposal.
  • From where was it initiated? – sourceIPs (line #13).
  • And so on…
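
If your cluster writes audits to a file (via the log backend – more on backends later), each audit is a single JSON line, so you can answer these questions straight from the shell. A quick sketch using jq; the log path below is only a placeholder for wherever your audit log actually lives:

# Who patched pods, which pod, and when? (path is an example)
cat /var/log/kubernetes/audit.log \
  | jq 'select(.verb == "patch" and .objectRef.resource == "pods")
        | {user: .user.username, pod: .objectRef.name, at: .requestReceivedTimestamp}'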

The API Server

In case you are not familiar with it: the central component of a Kubernetes cluster is the API server. The API server is the part of the Kubernetes control plane that exposes the Kubernetes API, and it acts as the front end for the control plane.

Now you can understand why Kubernetes cluster audits originate from the API server.

High-level look at the Kubernetes control plane

Credit: Kubernetes documentation

The API server decides what to audit by evaluating a policy. This policy tells the API server what the administrator wants to audit and at what level of detail (more on that later).

That is one reason why Kubernetes audits are disabled by default: the API server cannot decide on its own what data you want to audit. Another reason is that the auditing feature increases the memory consumption of the API server, and that consumption depends on the audit policy.

The audit policy

When defining a policy, the first thing to understand is when exactly an event is triggered. As established above, the API server is the central component of the Kubernetes cluster, so every request that a person or a component makes to the API server is considered an event.

Each request can generate an audit event at one or more stages of its execution:

  1. RequestReceived – The audit handler has received the request.
  2. ResponseStarted – The response headers have been sent, but the response body has not been sent.
  3. ResponseComplete – The response body has been completed, and no more bytes will be sent.
  4. Panic – There was an internal server error, and the request did not complete.

Audit = an event that was recorded

The policy specifies whether an event should be recorded as an audit and, if so, what data it should include (that is the level).

The audit policy consists of a list of rules. When the policy is evaluated, the first rule that matches an event generates the audit and determines the level of detail it should include. Because evaluation stops at the first match, rule order matters (see the sketch right after the list of levels below).

A rule can specify one of these audit levels:

  • None – Do not create a log entry for the event.
  • Metadata – Create a log entry. Include metadata, but do not include the request body or the response body.
  • Request – Create a log entry. Include metadata and the request body, but do not include the response body.
  • RequestResponse – Create a log entry. Include metadata, the request body and the response body.
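
To illustrate the first-match behavior, here is a minimal sketch. If the two rules below were swapped, the catch-all would match first and secret bodies would end up in your audits:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Matched first: requests touching secrets are logged without bodies.
- level: Metadata
  resources:
  - group: "" # core API group
    resources: ["secrets"]
# Everything else falls through to this catch-all rule.
- level: RequestResponse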

All of the possible policy rule fields and definitions are available in the audit v1 API reference mentioned above.

Policy examples

The simplest policy would look like this:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata

This policy generates audits on every request at the Metadata level.
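
To have the API server actually evaluate your policy, point it at the file using the --audit-policy-file flag (the path below is just an example):

--audit-policy-file /etc/kubernetes/audit-policy.yaml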

A slightly more complex policy:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Don't log these read-only URLs.
- level: None
  nonResourceURLs:
  - /healthz*
  - /version
  - /swagger*

# Secrets, ConfigMaps, and TokenReviews can contain sensitive & binary data,
# so only log at the Metadata level.
- level: Metadata
  omitStages:
  - "RequestReceived"
  resources:
  - group: "" # core API group
    resources: ["secrets", "configmaps"]
  - group: authentication.k8s.io
    resources: ["tokenreviews"]

# Log changes to pods at RequestResponse level
- level: RequestResponse
  omitStages:
  - "RequestReceived"
  resources:
  - group: "" # core API group
    resources: ["pods"]
    verbs: ["create", "patch", "update", "delete"]

# Fallback - log everything else at Metadata level
- level: Metadata
  omitStages:
  - "RequestReceived"

The full policy fields and descriptions are available in the Kubernetes GitHub repository, under the audit v1 API.

Another great example of a well-defined policy is the Google Kubernetes Engine (GKE) policy generation script.

Please do keep in mind that, as mentioned above, your policy directly determines how much memory and CPU the API server consumes for auditing.

Now that we understand how Kubernetes decides what to audit, the missing piece is where the audits are sent.

Introducing audit backends

Currently Kubernetes supports two kinds of backends:

  • Log backend – writes the audit events as JSON lines to a file on the API server's host.
  • Webhook backend – sends the audit events to an external HTTP API.

Defining a backend requires you to set flags on the API server.

For example, to specify the log file path that the log backend uses to write audit events, use the following flag:

--audit-log-path /path/to/log/file

Or, to specify the path to a file with a webhook configuration for a webhook backend:

--audit-webhook-config-file /path/to/config
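
The file referenced by this flag uses the kubeconfig format to describe the remote service and the credentials used to reach it. A minimal sketch – the endpoint and certificate path are placeholders for your own webhook service:

apiVersion: v1
kind: Config
clusters:
- name: audit-webhook
  cluster:
    certificate-authority: /path/to/ca.crt    # CA that signed the webhook's TLS cert
    server: https://audit.example.com/events  # endpoint that receives audit batches
contexts:
- name: default
  context:
    cluster: audit-webhook
    user: ""
current-context: default
users: []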

There are more configuration options that allow fine-tuning of these backends; you can see all of them in the official Kubernetes documentation.
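
To give a taste, here are a few of those flags (all real kube-apiserver flags; the values are only illustrative):

--audit-log-maxage 30               # days to retain old audit log files
--audit-log-maxbackup 10            # maximum number of retained log files
--audit-log-maxsize 100             # megabytes before a log file gets rotated
--audit-webhook-batch-max-size 400  # maximum number of events per webhook batch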

UPDATE: If you want to know how to configure and implement your own webhook backend, see my follow up post.

Limitations

Kubernetes audits have several limitations that in my opinion are worth noting:

Managed clusters

You cannot use these features with managed clusters (e.g., GKE, EKS, AKS, and so on). Since you don't have control over the master node, and as a result over the API server, you cannot set the configuration flags needed to define the policy or backend you want.

Instead, these are predefined by the cloud providers. They usually ship a good-enough policy, and the backend is routed to their own logging services. For example, GKE uses the policy generated by the script mentioned above, and its backend is routed to Operations (formerly Stackdriver).

Managed clusters audits structure

Another point to note regarding managed clusters is that they change the way the audits are constructed, so you should expect a different JSON structure (with more, and possibly different, fields). For example, take a look at the Operations audit structure and its payload structure.
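
For example, on GKE you could pull recent audit entries with the gcloud CLI – a sketch, with PROJECT_ID as a placeholder for your own project:

gcloud logging read \
  'logName="projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity"' \
  --limit 5 --format json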

Restarts

The API server must be restarted every time you change the audit configuration via flags.
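
On a kubeadm-based cluster, for instance, the API server runs as a static pod, so editing its manifest is enough – the kubelet notices the change and restarts the API server for you:

sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml   # add or adjust the --audit-* flags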

Configuration

In my opinion, crafting a policy that generates only the audits you really care about is not easy. The go-to solution for this issue is to use a third-party application to aggregate, visualize, and alert on specific audits.
