Kubernetes the hard way on Docker

Bright Zheng
11 min read · Jul 5, 2020

The Motivation

Being a Certified Kubernetes Administrator (CKA) is a great milestone for any Kubernetes professional.

The CKA exam focuses on both theory and practical skills. Some Kubernetes practitioners may feel it's easy and straightforward, just like their daily tasks. But many still feel unconfident, because fusing theory with hands-on skills is never as easy as it seems.

So keeping up the practice is always the key to success.

A lot of people, including me, will share their experiences and remind you to follow some typical learning paths, like:

  • Read through Kubernetes' official docs and practice the tasks mentioned in the getting started guides, concepts, tasks, and tutorials at least twice;
  • Read, understand, and practice Kelsey Hightower's tutorial Kubernetes The Hard Way a couple of times;
  • Read some great books, like Kubernetes in Action by Marko Luksa;
  • And again, practice, practice, practice!

As you can imagine, we need a hands-on environment to build Kubernetes, prove the concepts, and practice the tasks; keep at it and you will eventually get there.

During the preparation last year, I even compiled Kubernetes the Kubeadm Way, inspired by Kelsey Hightower’s Kubernetes The Hard Way, using kubeadm to walk you through how to set up a highly available Kubernetes cluster, on Google Cloud Platform.

But it's way too expensive to spin up such a Kubernetes cluster on any of the cloud providers, especially when you may have to practice more than once!

That’s why I’m quite interested in those lightweight Kubernetes solutions, like:

  • Kubernetes in Docker, a.k.a. kind, which has become the Kubernetes’ official testing infra
  • Rancher's K3s, run through k3d
  • Vagrant + VirtualBox

That was my motivation to compile a repo brightzheng100/kube4dev — Kubernetes for Development.

All these are great and can give us a hands-on environment in minutes or even seconds, running on our laptop with bare minimum resource consumption. Just one single command, like kind create cluster or k3d create --workers 3, and boom, the cluster is already up and running!
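
For reference, those one-liners look roughly like this (the exact flags depend on the kind/k3d version installed; newer k3d releases renamed the command to k3d cluster create):

# kind: a single-node cluster in one command
kind create cluster

# k3d (v1.x syntax, as quoted above): a cluster with three workers
k3d create --workers 3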

But, what’s the problem then? Obviously, they’re fully automated and sometimes we need the hard way!

That's the motivation of this post: to walk through the hard way of building up Kubernetes from scratch, step by step.

The Goal

The goal is to build a highly available Kubernetes cluster on Docker, within our laptop.

Such a cluster should have:

  • 1 * Load Balancer node (ideally there would be 2 LB nodes for true HA, but as that's not our focus, let's go with just one)
  • 3 * Control Plane nodes
  • 2+ * Worker nodes

The OS, Tools, Components, and Versions:

  • Working Laptop: MacBook Pro with macOS Catalina, but I believe it should work wherever Docker runs
  • Docker Desktop with Engine v19.03.8
  • kubeadm v1.18.5, as of writing
  • kubernetes: kubelet and kubectl v1.18.5, as of writing
  • containerd v1.3.4
  • runc v1.0.0-rc90
  • Any of the CNIs, like: weavenet v2.6.5, or cilium v1.8

There are a series of steps to make it happen:

  • Building Docker Image
  • Spinning Up Containers as Kubernetes Nodes
  • Preparing All Kubernetes Nodes
  • Bootstrapping Control Plane
  • Joining Nodes to Cluster
  • Installing CNI Plugin
  • Accessing Kubernetes from the Laptop / Host
  • Cleaning Up

Let’s Build It Up, Step By Step

Building Docker Image

There are two major considerations when building such a Docker image:

  1. It should support the init system like systemd so we can easily run multiple processes within the container;
  2. A series of tweaks, like proper mounts, must be performed to make sure Kubernetes can run smoothly on a Docker environment.

Luckily, the kind team has done a great job building such infrastructure, so we can learn from it.

So here is the simplified Dockerfile:

FROM ubuntu:20.04

# refer to https://github.com/brightzheng100/kubernetes-the-hard-way-on-docker/tree/master/images/k8s-ready/files/usr/
COPY files/usr /usr
RUN echo "Ensuring scripts are executable ..." \
&& chmod +x /usr/local/bin/clean-install /usr/local/bin/entrypoint \
&& echo "Installing Packages ..." \
&& DEBIAN_FRONTEND=noninteractive clean-install \
systemd \
conntrack iptables iproute2 ethtool socat util-linux mount ebtables udev kmod \
libseccomp2 pigz \
bash ca-certificates curl \
nfs-common \
\
vim-tiny \
&& find /lib/systemd/system/sysinit.target.wants/ -name "systemd-tmpfiles-setup.service" -delete \
&& rm -f /lib/systemd/system/multi-user.target.wants/* \
&& rm -f /etc/systemd/system/*.wants/* \
&& rm -f /lib/systemd/system/local-fs.target.wants/* \
&& rm -f /lib/systemd/system/sockets.target.wants/*udev* \
&& rm -f /lib/systemd/system/sockets.target.wants/*initctl* \
&& rm -f /lib/systemd/system/basic.target.wants/* \
&& echo "ReadKMsg=no" >> /etc/systemd/journald.conf \
&& ln -s "$(which systemd)" /sbin/init \
&& echo "Adjusting systemd-tmpfiles timer" \
&& sed -i /usr/lib/systemd/system/systemd-tmpfiles-clean.timer -e 's#OnBootSec=.*#OnBootSec=1min#' \
&& echo "Modifying /etc/nsswitch.conf to prefer hosts" \
&& sed -i /etc/nsswitch.conf -re 's#^(hosts:\s*).*#\1dns files#' \
&& mkdir /kind
ENV container docker
STOPSIGNAL SIGRTMIN+3
ENTRYPOINT [ "/usr/local/bin/entrypoint", "/sbin/init" ]

To practice by yourself, I'd recommend checking out my repo first and then building the images:

# Clone git repo
git clone https://github.com/brightzheng100/kubernetes-the-hard-way-on-docker.git
cd kubernetes-the-hard-way-on-docker
# Change to your user if you want
image_user='quay.io/brightzheng100'
# The image for building and running Kubernetes
( cd images/k8s-ready && docker build -t ${image_user}/k8s-ready:ubuntu.20.04 . )
# We also prepare a simple image for running HAProxy, as the LB
( cd images/haproxy && docker build -t ${image_user}/k8s-haproxy:2.1.7-alpine . )
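
For context, the HAProxy image simply load-balances TCP 6443 across the three control plane nodes we're about to create. The sketch below is a hypothetical illustration of what its config needs to express; the actual file lives under images/haproxy in the repo:

# haproxy.cfg (hypothetical sketch)
defaults
  mode tcp
  timeout connect 5s
  timeout client  30s
  timeout server  30s
frontend kubernetes-api
  bind *:6443
  default_backend control-plane
backend control-plane
  # don't fail at startup if the master containers aren't resolvable yet
  default-server init-addr last,libc,none
  server master0 k8s-master0:6443 check
  server master1 k8s-master1:6443 check
  server master2 k8s-master2:6443 check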

Spinning Up Containers as Kubernetes Nodes

A custom Docker network will be used for node communication:

# Let's create a custom Docker network
docker network create lab
# Spin up the LB
docker run \
--name "k8s-lb0" \
--hostname "lb0" \
--network lab \
--detach \
--restart=on-failure:1 \
--tty \
--publish=6443/TCP \
${image_user}/k8s-haproxy:2.1.7-alpine
# Spin up the Kubernetes nodes
for node in "master0" "master1" "master2" "worker0" "worker1"; do
docker run \
--name "k8s-${node}" \
--hostname "${node}" \
--network lab \
--privileged \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--detach \
--restart=on-failure:1 \
--tty \
--tmpfs /tmp \
--tmpfs /run \
--tmpfs /run/lock \
--volume /var \
--volume /lib/modules:/lib/modules:ro \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
${image_user}/k8s-ready:ubuntu.20.04
done
# Review the containers we spun up
docker ps --format '--> {{.Image}} -- {{.Names}}'

OUTPUT: we should see all required “nodes” available

-->  quay.io/brightzheng100/k8s-haproxy:2.1.7-alpine -- k8s-lb0
--> quay.io/brightzheng100/k8s-ready:ubuntu.20.04 -- k8s-worker1
--> quay.io/brightzheng100/k8s-ready:ubuntu.20.04 -- k8s-worker0
--> quay.io/brightzheng100/k8s-ready:ubuntu.20.04 -- k8s-master2
--> quay.io/brightzheng100/k8s-ready:ubuntu.20.04 -- k8s-master1
--> quay.io/brightzheng100/k8s-ready:ubuntu.20.04 -- k8s-master0

Preparing All Kubernetes Nodes

All Kubernetes nodes need some tuning and a few required tools set up.

To log into containers, you may try the native Docker way: docker exec -it <node_name> bash, e.g. docker exec -it k8s-master0 bash.

After logging into all the Kubernetes nodes (except the load balancer node, since it's not a Kubernetes node), it's highly recommended to enable iTerm2's Toggle Broadcasting Input feature, if you're using iTerm2 like me, so that commands typed in one console are automatically broadcast to all the others in parallel, since all the commands in this section are identical across nodes.

But it's also okay to execute the commands in the following nodes one by one (or to script it, as sketched after the list), if you don't mind the repetitive work.

  • k8s-master0
  • k8s-master1
  • k8s-master2
  • k8s-worker0
  • k8s-worker1

All the following steps will be executed within the node containers.
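
If you'd rather script it than broadcast, one option (a sketch, assuming you save this section's commands into a hypothetical prepare-node.sh) is to replay that script on every node from the host:

# From the host: copy the shared prep script into each node container and run it there
for node in "k8s-master0" "k8s-master1" "k8s-master2" "k8s-worker0" "k8s-worker1"; do
  docker cp prepare-node.sh ${node}:/tmp/prepare-node.sh
  docker exec ${node} bash /tmp/prepare-node.sh
done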

1. Tuning sysctl in all nodes

# Letting iptables see bridged traffic
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# Turning on source address verification
cat <<EOF | tee /etc/sysctl.d/10-network-security.conf
# Turn on Source Address Verification in all interfaces to
# prevent some spoofing attacks.
net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.all.rp_filter=1
EOF
# Apply
sysctl --system
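
To double-check that the settings took effect, you can read them back (a quick sanity check; the bridge keys only exist once the host's br_netfilter module is loaded):

sysctl net.ipv4.conf.all.rp_filter
sysctl net.bridge.bridge-nf-call-iptables || echo "br_netfilter is not loaded on the host yet"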

2. Installing container runtime

We're going to use containerd & runc here. Other container runtimes may work as well.

# Define desired components version
containerd_version=1.3.4
runc_version=1.0.0-rc90
# Install containerd
curl -sSLO https://github.com/containerd/containerd/releases/download/v${containerd_version}/containerd-${containerd_version}.linux-amd64.tar.gz
tar -C /usr/local -xvf containerd-${containerd_version}.linux-amd64.tar.gz
chmod +x /usr/local/bin/*
rm -f /usr/local/bin/containerd-stress /usr/local/bin/containerd-shim-runc-v1 containerd-${containerd_version}.linux-amd64.tar.gz
# Install runc
curl -sSLO https://github.com/opencontainers/runc/releases/download/v${runc_version}/runc.amd64
chmod +x runc.amd64
mv runc.amd64 /usr/local/bin/runc
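
Before moving on, a quick check that both binaries landed on the PATH and report the expected versions:

containerd --version   # should mention v1.3.4
runc --version         # should mention 1.0.0-rc90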

3. Configuring container runtime

# setup systemd service for containerd
cat > /etc/systemd/system/containerd.service <<EOF
# derived containerd systemd service file from the official:
# https://github.com/containerd/containerd/blob/master/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target
# disable rate limiting
StartLimitIntervalSec=0
[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Restart=always
RestartSec=1
Delegate=yes
KillMode=process
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=1048576
# Comment out TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
[Install]
WantedBy=multi-user.target
EOF
# setup containerd config
mkdir -p /etc/containerd
cat > /etc/containerd/config.toml <<EOF
# ref: https://github.com/containerd/cri/blob/master/docs/config.md
version = 2
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.test-handler]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "k8s.gcr.io/pause:3.2"
EOF
systemctl daemon-reload
systemctl start containerd
systemctl enable containerd
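
Optionally, confirm that containerd is actually up before we point kubelet at it:

systemctl is-active containerd   # should print "active"
ctr version                      # should show both client and server versions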

4. Installing Kubernetes Components

apt-get update && apt-get install -y apt-transport-https gnupg
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF | tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
# Configure the cgroup driver used by kubelet (we do this on every node)
mkdir -p /var/lib/kubelet
cat > /var/lib/kubelet/config.yaml <<EOF
# Ref: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kubelet/config/v1beta1/types.go
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd # cgroupfs | systemd
failSwapOn: false
EOF
systemctl daemon-reload
systemctl restart kubelet
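
A quick sanity check of what got installed (all three should report v1.18.x as of writing):

kubeadm version -o short
kubelet --version
kubectl version --client --short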

If you’re in iTerm2's broadcast mode, quit this mode now as we're done executing commands in all nodes.

Bootstrapping Control Plane

We can bootstrap Kubernetes in any of the control plane nodes.

Let’s pick one control plane node, say k8s-master0, to bootstrap.

# In `k8s-master0`
# Define a kubeadm config file
mkdir -p /etc/kubernetes/kubeadm
cat > /etc/kubernetes/kubeadm/kubeadm-config.yaml <<EOF
# Ref: https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "k8s-lb0:6443" # pointing to lb
networking:
  podSubnet: "10.32.0.0/12" # the CIDR the CNI prefers; here it's weavenet's default
apiServer:
  certSANs: # extra SANs, as we will eventually access the cluster from the laptop
  - "localhost"
  - "127.0.0.1"
---
# Ref: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kubelet/config/v1beta1/types.go
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: cgroupfs # cgroupfs | systemd
failSwapOn: false
EOF
# Bootstrap kubernetes
kubeadm init \
--config=/etc/kubernetes/kubeadm/kubeadm-config.yaml \
--ignore-preflight-errors=all \
--upload-certs \
--v=6

SAMPLE OUTPUT:

...
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join k8s-lb0:6443 --token 0wml54.fdhlnaqpy9wt03m3 \
    --discovery-token-ca-cert-hash sha256:4366a9dcfb81955e6b18b04afc0bf581671a26cfbddb2bc322a0946c35f6119f \
    --control-plane --certificate-key 60ca55a7ffc6ce962927509122a9580a032b2252a8b67e1e24e39767e522a367
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:

  kubeadm join k8s-lb0:6443 --token 0wml54.fdhlnaqpy9wt03m3 \
    --discovery-token-ca-cert-hash sha256:4366a9dcfb81955e6b18b04afc0bf581671a26cfbddb2bc322a0946c35f6119f
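
One practical note: the bootstrap token above expires after 24 hours by default. If you lose the join commands later, you can regenerate them from any control plane node; the certificate key for control-plane joins comes from the upload-certs phase that the output already mentions:

# e.g. in k8s-master0
kubeadm token create --print-join-command      # prints a fresh worker join command
kubeadm init phase upload-certs --upload-certs # prints a new --certificate-key for control-plane joins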

Joining Nodes to Cluster

1. Joining control plane nodes

Run the following command in both k8s-master1 and k8s-master2:

# This command is copied from the output of `kubeadm init`, with --ignore-preflight-errors=all
kubeadm join k8s-lb0:6443 --token 0wml54.fdhlnaqpy9wt03m3 \
--discovery-token-ca-cert-hash sha256:4366a9dcfb81955e6b18b04afc0bf581671a26cfbddb2bc322a0946c35f6119f \
--control-plane --certificate-key 60ca55a7ffc6ce962927509122a9580a032b2252a8b67e1e24e39767e522a367 \
--ignore-preflight-errors=all

Let’s check the nodes in any of the control plane nodes:

export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes

OUTPUT:

NAME      STATUS     ROLES    AGE     VERSION
master0   NotReady   master   8m54s   v1.18.5
master1   NotReady   master   4m17s   v1.18.5
master2   NotReady   master   74s     v1.18.5

Note: Why are they NotReady? It's because the CNI plugin hasn't been installed yet! We'll get there soon.

2. Joining worker nodes

The kubeadm init output also includes a command for joining worker nodes.

Let's run the following command in both the k8s-worker0 and k8s-worker1 nodes:

# This command is copied from the output of `kubeadm init`, with --ignore-preflight-errors=all
kubeadm join k8s-lb0:6443 --token 0wml54.fdhlnaqpy9wt03m3 \
--discovery-token-ca-cert-hash sha256:4366a9dcfb81955e6b18b04afc0bf581671a26cfbddb2bc322a0946c35f6119f \
--ignore-preflight-errors=all

If you check the nodes again, you will see more of them:

export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes

OUTPUT:

NAME      STATUS     ROLES    AGE     VERSION
master0   NotReady   master   14m     v1.18.5
master1   NotReady   master   9m55s   v1.18.5
master2   NotReady   master   6m52s   v1.18.5
worker0   NotReady   <none>   7s      v1.18.5
worker1   NotReady   <none>   2m      v1.18.5

Installing CNI Plugin

There are many Container Networking Interface (CNI) compliant plugins within the Kubernetes community.

Any of the compliant CNI plugins should work for basic use cases and it’s your job to figure out what is right for you.

We can log into any of the control plane nodes to perform this task. For example:

docker exec -it k8s-master0 bash

# In k8s-master0
export KUBECONFIG=/etc/kubernetes/admin.conf
# If you prefer weave net
# See potential issues if you `kubeadm reset -f` then `kubeadm init` many rounds
# https://github.com/weaveworks/weave/issues/3634#issuecomment-652837244
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
# Or cilium, choice is yours
kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.8/install/kubernetes/quick-install.yaml

Note: please install only ONE CNI for this case to avoid potential issues.

Wait for a while; the nodes should then be in the Ready state and the pods in the Running state:

# Check the nodes
kubectl get nodes

OUTPUT:

NAME          STATUS   ROLES    AGE     VERSION
k8s-master0   Ready    master   12m     v1.18.5
k8s-master1   Ready    master   8m24s   v1.18.5
k8s-master2   Ready    master   7m8s    v1.18.5
k8s-worker0   Ready    <none>   4m39s   v1.18.5
k8s-worker1   Ready    <none>   4m39s   v1.18.5
# Check the pods
kubectl get pod -n kube-system

OUTPUT:

NAME                                  READY   STATUS    RESTARTS   AGE
coredns-66bff467f8-2hw7h              1/1     Running   0          12m
coredns-66bff467f8-hjdv5              1/1     Running   0          12m
etcd-k8s-master0                      1/1     Running   0          12m
etcd-k8s-master1                      1/1     Running   0          7m22s
etcd-k8s-master2                      1/1     Running   0          5m45s
kube-apiserver-k8s-master0            1/1     Running   0          12m
kube-apiserver-k8s-master1            1/1     Running   1          7m13s
kube-apiserver-k8s-master2            1/1     Running   3          5m55s
kube-controller-manager-k8s-master0   1/1     Running   1          12m
kube-controller-manager-k8s-master1   1/1     Running   0          7m44s
kube-controller-manager-k8s-master2   1/1     Running   0          6m18s
kube-proxy-5t6s5                      1/1     Running   0          7m23s
kube-proxy-6m4b5                      1/1     Running   0          4m54s
kube-proxy-786sh                      1/1     Running   0          4m54s
kube-proxy-lgrfj                      1/1     Running   0          12m
kube-proxy-vzt7d                      1/1     Running   0          8m39s
kube-scheduler-k8s-master0            1/1     Running   1          12m
kube-scheduler-k8s-master1            1/1     Running   0          7m31s
kube-scheduler-k8s-master2            1/1     Running   0          6m26s
weave-net-2sp7b                       2/2     Running   1          92s
weave-net-5q9nx                       2/2     Running   1          92s
weave-net-mqvmz                       2/2     Running   1          92s
weave-net-tqf4z                       2/2     Running   0          92s
weave-net-w6rcn                       2/2     Running   0          92s

Accessing Kubernetes from the Laptop / Host

There is already a kubeconfig file, /etc/kubernetes/admin.conf, generated on each of the control plane nodes.

We can copy it back to our laptop, the host, so we can access the cluster in a much more convenient way.

Within our laptop, or the host:

# Copy it out, change the config name if you want
docker cp k8s-master0:/etc/kubernetes/admin.conf ~/.kube/config-lab
# We need to update the server as it mentions `server: https://k8s-lb0:6443`
# so we need to get the port mapping, e.g. 6443/tcp -> 0.0.0.0:32775
port="$( docker port k8s-lb0 6443 | cut -d":" -f2 )"
# In Mac, do this
sed -i '' "s|server: https://k8s-lb0:6443|server: https://localhost:${port}|g" ~/.kube/config-lab
# In Linux
sed -i "s|server: https://k8s-lb0:6443|server: https://localhost:${port}|g" ~/.kube/config-lab
export KUBECONFIG=~/.kube/config-lab
kubectl get nodes
kubectl get pods -n kube-system

You should see exactly the same results as you did from within any of the Kubernetes nodes.
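
As an optional smoke test (not part of the original steps), you could schedule something small from the laptop and watch it land on a worker node:

kubectl create deployment hello --image=nginx
kubectl get pods -o wide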

Cleaning Up

To clean up the env:

# delete the containers
for node in "k8s-lb0" "k8s-master0" "k8s-master1" "k8s-master2" "k8s-worker0" "k8s-worker1"; do
docker rm -f ${node}
done
# delete the custom network
docker network rm lab
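
And, if you want to reclaim everything else too (assuming ${image_user} is still set from the build step):

# remove the copied kubeconfig and the locally built images
rm -f ~/.kube/config-lab
docker rmi ${image_user}/k8s-ready:ubuntu.20.04 ${image_user}/k8s-haproxy:2.1.7-alpine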

Congrats! You've done Kubernetes the hard way, with kubeadm, on Docker, right on your laptop!

Enjoy!
