Kubernetes Pod Security and Probes

How to debug network policies on GKE clusters.

I was doing some training and was asked whether a network policy affects the efficacy of liveness probes. My kneejerk reaction was no, but we had some time, so we broke the pieces down and tested it. There is a bit of Kubernetes here that is not fully explained, so I want to write about the process, not because my kneejerk answer was right, but because sometimes the best way to understand things is to take them apart (to Mom: sorry about the clock radio). I'd also like to thank Hiten Pandya for asking the question and working through it with me in class. This was GCP training, so it assumes you have a project and billing account if you'd like to try it out yourself.

Kubernetes can firewall traffic between pods, which is really useful for isolating traffic in a multi-tier architecture. On GKE, network policy is implemented with Calico, which programs iptables on the nodes, and it allows pod, namespace and IP block selectors: namespace 1 can't communicate with namespace 2, and the database pod can't receive traffic from the internet.
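As a sketch of what those selectors look like together, a policy combining all three might read as follows (the names, labels and CIDR here are made up for illustration):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-db        # hypothetical policy name
  namespace: db                # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: database            # applies to pods labelled app=database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:       # allow traffic from the "web" namespace...
        matchLabels:
          name: web
    - podSelector:             # ...or from app=backend pods in this namespace...
        matchLabels:
          app: backend
    - ipBlock:                 # ...or from a specific internal CIDR
        cidr: 10.0.0.0/8
```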

Now, you can also set default policies, so good security practice is to block everything and then allow only specific traffic. The question was: how do you specifically allow the liveness probe (the "is the pod running?" check) through to the pod if there is a default deny inbound policy?
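To make "allow specific traffic only" concrete: after a default deny, you punch narrow holes with allow rules like the one below. This is a hypothetical example (the role=frontend label is made up); it allows only labelled frontend pods to reach our app on port 80:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-ingress   # hypothetical policy name
spec:
  podSelector:
    matchLabels:
      app: pyserver
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend          # only pods labelled role=frontend
    ports:
    - protocol: TCP
      port: 80
```

Note that nothing in this rule mentions the liveness probe, which is exactly why the question is interesting.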

So let's start with a cluster with network policy enabled:

export my_zone=europe-west1-d
export my_cluster=demo-cluster
export project_id=$DEVSHELL_PROJECT_ID
source <(kubectl completion bash)

# Create a cluster with network policy enabled
gcloud container clusters create $my_cluster \
  --machine-type e2-medium --num-nodes 3 --enable-ip-alias \
  --enable-network-policy --zone $my_zone

gcloud container clusters get-credentials $my_cluster --zone $my_zone

# Rename context; yes, I could use kubectx, but bash scripting ftw
kubectl config rename-context $(kubectl config get-contexts | awk '{ print $2 }' | grep demo-cluster) demo-cluster

We need a simple app to test this, and a simple Flask app that prints something when a request comes in is good enough. Create a sample application that listens on port 80 using the code below and save it as temp/app/app.py .

import socket
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    hostname = socket.gethostname()
    return "Built Locally and Pushed\n from: {}".format(hostname)

@app.route("/version")
def version():
    return "Helloworld 1.0\n"

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80)

Add some requirements and save as temp/app/requirements.txt

Flask==0.12
uwsgi==2.0.15

Create a simple Dockerfile and save as temp/app/Dockerfile . We need python3 and the ping utility.

FROM ubuntu:18.04
RUN apt-get update -y && \
apt-get install -y python3-pip python3-dev iputils-ping
COPY requirements.txt /app/requirements.txt
WORKDIR /app
RUN pip3 install -r requirements.txt
COPY app.py /app
ENTRYPOINT ["python3", "app.py"]

Build the sample code and store the container in the container registry so we can easily deploy it to our GKE cluster.

gcloud builds submit --tag gcr.io/$project_id/pyserver:v1 temp/app/.

Deploy the sample application onto the cluster using the manifest below; save it as app.yaml and deploy it to your cluster. Make sure you adjust my-project to your project ID.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pyserver
    version: v1
  name: pyserver
spec:
  strategy:
    rollingUpdate:
      maxSurge: 5
      maxUnavailable: 0
    type: RollingUpdate
  replicas: 2
  selector:
    matchLabels:
      app: pyserver
      version: v1
  template:
    metadata:
      labels:
        app: pyserver
        version: v1
    spec:
      containers:
      - image: gcr.io/my-project/pyserver:v1
        imagePullPolicy: Always
        name: pyserver
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
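The apply step itself isn't shown above; assuming the manifest was saved as app.yaml, deploying it would look something like:

```shell
# Apply the deployment manifest (assumes it was saved as app.yaml)
kubectl apply -f app.yaml

# Wait for the rollout to finish, then confirm both replicas are up
kubectl rollout status deployment/pyserver
kubectl get pods -l app=pyserver
```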

Get some info on the cluster networking. As we can see, we have our 2 pods in the deployment and the three nodes. The nodes run Container-Optimized OS, which is a very bare-bones system.

kubectl get pods -o=wide

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pyserver-b58df9f76-7khqc 1/1 Running 0 26m 10.112.1.8 gke-demo-cluster-default-pool-5501bfce-hr9w <none> <none>
pyserver-b58df9f76-jvqfg 1/1 Running 0 26m 10.112.1.9 gke-demo-cluster-default-pool-5501bfce-hr9w <none> <none>

kubectl get nodes -o=wide

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
gke-demo-cluster-default-pool-5501bfce-hr9w Ready <none> 87m v1.16.13-gke.401 10.132.0.25 35.233.66.17 Container-Optimized OS from Google 4.19.112+ docker://19.3.1
gke-demo-cluster-default-pool-5501bfce-nqmh Ready <none> 94m v1.16.13-gke.401 10.132.0.23 34.76.235.52 Container-Optimized OS from Google 4.19.112+ docker://19.3.1
gke-demo-cluster-default-pool-5501bfce-vwvd Ready <none> 90m v1.16.13-gke.401 10.132.0.24 104.155.45.33 Container-Optimized OS from Google 4.19.112+ docker://19.3.1

We then need to check the pod logs to confirm the liveness probes are reaching the pod. The Flask dev server logs to the console (STDERR, in fact, which is why the entries carry ERROR severity), and GKE automatically forwards container output to Cloud Operations Logging, so it's easy to track.
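If you prefer the CLI to the console, a query along these lines should pull those entries (the exact filter is an assumption; adjust the container name and limit to taste):

```shell
# Read recent log entries for the pyserver containers from Cloud Logging
gcloud logging read \
  'resource.type="k8s_container" AND resource.labels.container_name="pyserver"' \
  --limit 10 --format json
```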

Cloud Operations should have the container log entries showing the liveness probe GETs. One entry is shown below; we are looking for the HTTP 200 response from Flask:

{
  "textPayload": "10.132.0.25 - - [20/Nov/2020 19:00:43] \"\u001b[37mGET / HTTP/1.1\u001b[0m\" 200 -\n",
  "insertId": "kdkyd4x3qbhpevh5x",
  "resource": {
    "type": "k8s_container",
    "labels": {
      "cluster_name": "demo-cluster",
      "namespace_name": "default",
      "container_name": "pyserver",
      "pod_name": "pyserver-b58df9f76-jvqfg",
      "location": "europe-west1-d",
      "project_id": "my-project"
    }
  },
  "timestamp": "2020-11-20T19:00:43.786834983Z",
  "severity": "ERROR",
  "labels": {
    "k8s-pod/app": "pyserver",
    "k8s-pod/version": "v1",
    "k8s-pod/pod-template-hash": "b58df9f76"
  },
  "logName": "projects/my-project/logs/stderr",
  "receiveTimestamp": "2020-11-20T19:00:49.512538744Z"
}

The first cool thing to notice is that the liveness probe originates from the node IP, which you can verify against the previous get nodes output. Now we can log into the container and check network connectivity. Adjust the pod name to one of your pods, as the unique suffix will be different on your side.

kubectl exec -i -t -n default pyserver-b58df9f76-7khqc -c pyserver "--" sh -c "clear; (bash || ash || sh)"

Inside the container ping 8.8.8.8 and CTRL-C when satisfied that it can.

ping 8.8.8.8

We then enforce a default deny-all policy using the following manifest. Since no namespace is set in the metadata, it applies to the default namespace, where our pod lives. For policies like these you may want to set the namespace explicitly, to avoid applying the policy to a service you could break by enforcing it:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress
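The enforcement step isn't shown above; assuming the manifest was saved as deny-all.yaml (a hypothetical filename), applying and verifying it would look like:

```shell
# Apply both deny policies to the default namespace
kubectl apply -f deny-all.yaml

# Verify the policies are in place
kubectl get networkpolicy
```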

Inside the container ping 8.8.8.8 and CTRL-C when satisfied that it can't.

ping 8.8.8.8

Let's check from the node that hosts the pod. We'll install the toolbox onto the node so we can debug the connection; this lets us emulate the probe traffic to the pod from the same IP address the probe uses. Using the toolbox requires the ability to SSH into the node, so if you want to prevent that, make sure you remove the permission from the GKE service account (the default compute service account, by default).

paul@gke-demo-cluster-default-pool-5501bfce-hr9w ~ $ sudo toolbox
20200603-00: Pulling from google-containers/toolbox
1c6172af85ee: Pull complete
a4b5cec33934: Pull complete
b7417d4f55be: Pull complete
fed60196983f: Pull complete
8e1533dfae69: Pull complete
112bf8e3d384: Pull complete
1df10c12cc15: Pull complete
b33e020bb38a: Pull complete
938e6be48196: Pull complete
Digest: sha256:36e2f6b8aa40328453aed7917860a8dee746c101dfde4464ce173ed402c1ec57
Status: Downloaded newer image for gcr.io/google-containers/toolbox:20200603-00
gcr.io/google-containers/toolbox:20200603-00
ff823c92b3d85269afd157d536a7847897b8fd1e9f4d74c1f6f6ee3d95568b0e
root-gcr.io_google-containers_toolbox-20200603-00
Please do not use --share-system anymore, use $SYSTEMD_NSPAWN_SHARE_* instead.
Spawning container root-gcr.io_google-containers_toolbox-20200603-00 on /var/lib/toolbox/root-gcr.io_google-containers_toolbox-20200603-00.
Press ^] three times within 1s to kill container.

Let's check what IP addresses we have on the node:

root@gke-demo-cluster-default-pool-5501bfce-hr9w:~# ifconfig | grep "inet "
inet 169.254.123.1 netmask 255.255.255.0 broadcast 169.254.123.255
inet 10.132.0.25 netmask 255.255.255.255 broadcast 0.0.0.0
inet 127.0.0.1 netmask 255.0.0.0

From inside the node we get the correct response from the pod: the liveness probe is still working with the network policy in place.

root@gke-demo-cluster-default-pool-5501bfce-hr9w:~# curl 10.112.1.8
Built Locally and Pushed

So, as you can see, the liveness probe is not affected by the network policy. This makes sense once you know where the probe comes from: it is the kubelet on the pod's own node performing the HTTP GET, and Calico allows traffic from a node to the pods running on it so that health checks keep working, regardless of any deny policy. And along the way we met some cool tools for debugging applications on the GKE platform.

Senior Google Cloud Trainer