k8sapi / explainer 02
Node bootstrap flow

GET /bootstrap/control-plane/:vm_id

This endpoint turns durable cluster truth into local executable state for one exact VM. From first principles, the machine image cannot know its future cluster role, current peers, or final addresses. So the VM proves identity at runtime, and the API renders the precise files that exact machine needs to become a control-plane node.

What it returns

PKI, kubeconfigs, host config, systemd units, static pod manifests, addon assets, and helper scripts for one control-plane VM.

Why on demand

Because peer topology, node addresses, and runtime client identities are facts discovered after provisioning.

What it protects

The bundle is only returned after the VM proves origin, principal identity, org alignment, and exact-machine identity.

Section 01

Why bootstrap exists as a separate endpoint

The image cannot know enough at build time

A base image can contain binaries and generic scripts, but it cannot know which cluster it will join, which VM ID it will have, what its final IPs are, or which peers are in the control plane. Those are runtime facts.

The API should render truth, not remote-control the host

The API has the cluster state in Postgres, but it does not SSH into the VM or mutate host state directly. It returns a pure bundle, and the VM executes that bundle locally.

Durable cluster truth

cluster row, node rows, stored serving certs

rendered into

Executable host truth

files, configs, units, manifests, scripts, specific to one exact VM and the current peer topology

consumed by

VM-local bootstrap

write files, install prerequisites, start etcd and kubelet

Section 02

How trust and identity are established

Proof 1: VM session origin

The handler reads X-exc-imds-token, hashes it, and looks it up in imds_tokens. That ties the request to an active VM-side IMDS session known to platform state.

Proof 2: principal bearer token

The bearer token is hashed and resolved through the shared token system. It must be principal-scoped, not just any bearer string.

Proof 3: same-org and same-VM match

The bearer token org must equal the IMDS token org, and the path VM ID must equal the IMDS-derived VM ID. That narrows bundle access to one exact machine.

Why it is layered this way

Bootstrap carries sensitive material. The code therefore asks four questions, not one: did this request come from a real VM session, does it also hold a principal identity, does that identity belong to the same org, and is it asking for its own bundle rather than another node’s?
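
To make the layering concrete, here is a minimal Go sketch of those four questions in order. The helper signatures and struct fields are illustrative assumptions, not the real API of internal/handlers/bootstrapauth.go:

package handlers

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"net/http"
)

var errUnauthorized = errors.New("unauthorized")

// Illustrative shapes; the real rows live in the repository layer.
type imdsSession struct{ OrgID, VMID int64 }
type principal struct{ OrgID int64 }

// requireBootstrapVM sketches the layered checks. lookupIMDS and
// resolvePrincipal stand in for the real hashed-token lookups.
func requireBootstrapVM(
	r *http.Request, requestedVMID int64,
	lookupIMDS func(hash string) (*imdsSession, error),
	resolvePrincipal func(bearer string) (*principal, error),
) (*imdsSession, error) {
	// Q1: does the request come from a live VM-side IMDS session?
	sum := sha256.Sum256([]byte(r.Header.Get("X-exc-imds-token")))
	sess, err := lookupIMDS(hex.EncodeToString(sum[:]))
	if err != nil {
		return nil, errUnauthorized
	}
	// Q2: does the caller also hold a principal-scoped bearer token?
	prin, err := resolvePrincipal(r.Header.Get("Authorization"))
	if err != nil {
		return nil, errUnauthorized
	}
	// Q3: do both identities belong to the same org?
	if prin.OrgID != sess.OrgID {
		return nil, errUnauthorized
	}
	// Q4: is the VM asking for its own bundle, not another node's?
	if requestedVMID != sess.VMID {
		return nil, errUnauthorized
	}
	return sess, nil
}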

Section 03

The execution sequence, step by step

01

Parse the requested VM ID

The path parameter establishes the claimed identity. It must be a positive integer before anything more expensive happens.

02

Run requireBootstrapVM

The helper validates the IMDS token, validates the principal bearer token, and checks org alignment. Any mismatch fails immediately with unauthorized or the relevant token error.

03

Refuse cross-node bundle access

The handler compares the requested VM ID with imdsToken.VMID. Even a valid caller cannot ask for another VM’s bundle.

04

Load the cluster and node rows

The handler loads the current VM’s apiserver row, the current VM’s etcd row, the cluster row, and the full list of API server rows for the cluster. It then resolves current VM identities for all control-plane peers from the database.

05

Rehydrate the CA and mint fresh client identities

The stored CA PEM and CA private key are parsed back into x509 objects. From that signer, the handler mints fresh runtime client certs for admin, apiserver-etcd-client, apiserver-kubelet-client, kube-proxy, and kubelet.
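
A minimal sketch of this rehydrate-and-mint step using Go's standard crypto/x509. The key algorithm, certificate fields, and helper name are assumptions, not the handler's exact code:

package bootstrap

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"math/big"
	"time"
)

// mintClientCert parses the stored CA PEM back into x509 objects and signs
// a fresh client certificate, e.g. cn = "cluster-admin".
func mintClientCert(caCertPEM, caKeyPEM []byte, cn string) ([]byte, *ecdsa.PrivateKey, error) {
	certBlock, _ := pem.Decode(caCertPEM)
	keyBlock, _ := pem.Decode(caKeyPEM)
	if certBlock == nil || keyBlock == nil {
		return nil, nil, errors.New("invalid CA PEM")
	}
	caCert, err := x509.ParseCertificate(certBlock.Bytes)
	if err != nil {
		return nil, nil, err
	}
	caKey, err := x509.ParsePKCS8PrivateKey(keyBlock.Bytes)
	if err != nil {
		return nil, nil, err
	}
	// Fresh runtime identity, minted at bootstrap time rather than stored.
	clientKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().AddDate(1, 0, 0),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, caCert, &clientKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), clientKey, nil
}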

06

Build the topology-aware bundle

BuildControlPlaneNodeBundle computes the service IP, DNS service IP, API server endpoint, etcd initial cluster string, suggested DNS records, and whether this is the initial control-plane node. Then it renders the file map.
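
Two of those computations are easy to show in isolation. The Go sketch below is a simplification with assumed shapes: it derives the conventional service IPs from the service CIDR and renders the etcd initial-cluster string from the peer set. Compare SERVICE_CLUSTER_IP, DNS_SERVICE_IP, and ETCD_INITIAL_CLUSTER in the rendered cluster.env later on this page:

package bootstrap

import (
	"fmt"
	"net/netip"
	"strings"
)

// peer is an assumed shape; the real peer rows come from Postgres.
type peer struct{ Name, IPv4 string }

// serviceIPs returns the first and tenth addresses of the service CIDR,
// the conventional kubernetes service VIP and cluster DNS IP.
func serviceIPs(serviceCIDR string) (apiIP, dnsIP netip.Addr, err error) {
	pfx, err := netip.ParsePrefix(serviceCIDR)
	if err != nil {
		return netip.Addr{}, netip.Addr{}, err
	}
	ip := pfx.Addr() // e.g. 10.96.0.0
	for i := 1; i <= 10; i++ {
		ip = ip.Next()
		if i == 1 {
			apiIP = ip // 10.96.0.1
		}
	}
	return apiIP, ip, nil // dnsIP = 10.96.0.10
}

// etcdInitialCluster renders the --initial-cluster value for etcd.service.
func etcdInitialCluster(peers []peer) string {
	parts := make([]string, 0, len(peers))
	for _, p := range peers {
		parts = append(parts, fmt.Sprintf("%s=https://%s:2380", p.Name, p.IPv4))
	}
	return strings.Join(parts, ",")
}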

07

Return JSON, not side effects

The endpoint returns metadata and the rendered files. The VM-side bootstrap path writes those files to disk and starts local services. That keeps the API responsible for truth rendering, not for in-host mutation.
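
The response contract can be pictured as a metadata envelope plus a path-to-contents file map. The Go shape below is an assumption about field names, not the handler's exact schema:

// bundleResponse is an illustrative shape for the endpoint's JSON body.
type bundleResponse struct {
	ClusterName           string `json:"cluster_name"`
	NodeName              string `json:"node_name"`
	IsInitialControlPlane bool   `json:"is_initial_control_plane"`
	// Files maps absolute host paths to rendered contents; the VM-side
	// shim writes each entry to disk verbatim.
	Files map[string]string `json:"files"`
}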

Section 04

Why the bundle contains these exact file classes

PKI files

Both cluster-wide signers and node-scoped identities are needed because trust in this system is certificate-driven all the way down.

Kubeconfigs

Each local component needs a ready-made client identity so the node can come up without an extra enrollment protocol.

cluster.env

Core bootstrap facts such as node IPs, API endpoint, service CIDR, and initial-control-plane status need one machine-readable source.

Systemd units

etcd and kubelet belong to the host init system because they must survive reboot and start in a predictable order.

Static pod manifests

The kubelet manages the apiserver, controller-manager, and scheduler from the local filesystem before the cluster is fully normal.

Addon assets

Cilium, kube-proxy, and CoreDNS are included because networking and service discovery are required for a usable cluster, not optional polish.

Why some certs are stored and others are minted on demand

The per-node etcd and apiserver serving certs are part of the durable cluster model created during POST /clusters. The admin and component client certs are execution-time identities that the bootstrap flow can mint fresh from the stored CA.

Section 05

How the template system fits together

The templates are split across two rendering phases on purpose. One launch-time template is injected into the VM create request so a brand-new machine can discover itself. The rest are rendered later, on demand, from current cluster state and returned inside the bootstrap bundle. A third phase, local execution, then consumes the rendered files on the host. That separation keeps VM creation small and generic while still letting bootstrap be topology-aware.

Phase 1: launch shim

bootstrap-userdata.sh is rendered during POST /clusters and passed to computeapi as VM userdata. Its only job is to fetch IMDS identity, fetch the bootstrap bearer token, download the bundle, and hand off to local execution.

Phase 2: bundle rendering

BuildControlPlaneNodeBundle plus rendered_files.go turn durable cluster rows, current peer topology, and freshly minted client certs into concrete files for one VM.

Phase 3: local execution

The returned scripts and configs are not documentation artifacts. They are the runnable host contract that brings up containerd, etcd, kubelet, static pods, and first-node-only addons.

The grand scheme

internal/bootstrap/templates/renderer.go embeds the template set, internal/bootstrap/rendered_files.go chooses the data for each template, internal/bootstrap/bundle.go maps rendered output onto host paths, and the VM-side scripts consume those files to make the node real. The templates are therefore the seam between durable API truth and executable host truth.
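
A minimal sketch of that seam, assuming a flat *.tmpl layout inside the embedded directory; the real renderer.go may organize its template set differently:

package templates

import (
	"bytes"
	"embed"
	"text/template"
)

//go:embed *.tmpl
var files embed.FS

// Render executes one embedded template against the data chosen by
// rendered_files.go and returns the file contents as a string.
func Render(name string, data any) (string, error) {
	t, err := template.ParseFS(files, name)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}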

Template | Rendered to / used by | Role in the system
bootstrap-userdata.sh | VM userdata during POST /clusters | Bootstraps the bootstrap: gets IMDS identity and token material, downloads the node bundle, writes files, then invokes local scripts.
cluster.env | /etc/exk8s/config/cluster.env | Single machine-readable source for cluster name, service and pod CIDRs, control-plane endpoint, node addresses, and first-node status.
kubeconfig | Admin, kubelet, kube-proxy, controller-manager, and scheduler kubeconfigs | One generic template that gives each local component a ready-to-use client identity without a second enrollment protocol.
kubelet-config.yaml | /var/lib/kubelet/config.yaml | Declares kubelet runtime behavior such as cluster DNS, static pod path, containerd socket, and node bind address.
kubelet.service | /etc/systemd/system/kubelet.service | Puts kubelet under host init control so it survives reboot and sees the rendered static pod manifests.
etcd.service | /etc/systemd/system/etcd.service | Defines the stacked-etcd member with the current node name, bind address, cluster peer set, and PKI paths.
kube-apiserver.yaml | /etc/kubernetes/manifests/kube-apiserver.yaml | Static pod manifest for the API server, wired to etcd, service-account signing, kubelet client auth, and the cluster service CIDR.
kube-controller-manager.yaml | /etc/kubernetes/manifests/kube-controller-manager.yaml | Static pod manifest for the controller-manager, pointed at cluster name, pod CIDR, shared CA, and service-account signing key.
kube-scheduler.yaml | /etc/kubernetes/manifests/kube-scheduler.yaml | Static pod manifest for the scheduler so placement decisions are local to the control-plane node from first boot.
single-node-bridge-cni.conflist | /etc/cni/net.d/10-exk8s.conflist | Provides a simple host-local bridge CNI baseline so pod networking has a local substrate before higher-level cluster addons settle in.
cilium-values.yaml | /etc/exk8s/config/cilium-values.yaml | Feeds cluster-specific values into the first-node Cilium Helm install, especially the chosen pod CIDR and Kubernetes service host.
kube-proxy.yaml | /etc/exk8s/manifests/kube-proxy.yaml | DaemonSet manifest applied by the addon installer so service VIP routing becomes available across nodes.
coredns.yaml | /etc/exk8s/manifests/coredns.yaml | Cluster DNS deployment, service, and RBAC, parameterized with the current cluster domain and DNS service IP.
exk8s-install-control-plane-prereqs.sh | /usr/local/bin/exk8s-install-control-plane-prereqs.sh | Host preparation layer: installs packages, configures containerd, downloads Kubernetes binaries, downloads etcd, and creates base directories.
exk8s-bootstrap-control-plane.sh | /usr/local/bin/exk8s-bootstrap-control-plane.sh | Execution coordinator: loads cluster.env, verifies binaries exist, reloads systemd, starts etcd and kubelet, then gates addon installation on the initial node.
exk8s-install-cluster-addons.sh | /usr/local/bin/exk8s-install-cluster-addons.sh | First-node-only installer of cluster-shared addons: waits for API readiness, installs Cilium via Helm, then applies kube-proxy and CoreDNS.

Rendered example bundle

The blocks below show what these templates look like after rendering for one concrete sample bundle: org 7, cluster 42, cluster name exk8s-7-2a, shared endpoint exk8s-7-2a.k8s.excloud.co.in, current node cp-1.cluster-42.internal at 10.0.0.11, peer node cp-2.cluster-42.internal at 10.0.0.12. Long PEM and base64 blobs are trimmed where the shape matters more than the bytes.

bootstrap-userdata.sh rendered excerpt
#!/usr/bin/env bash
set -euo pipefail

IMDS_BASE_URL="${EXC_IMDS_BASE_URL:-http://imdsapi.excloud.in}"
BOOTSTRAP_BASE_URL="https://k8sapi.excloud.in"
# IMDS_TOKEN is established earlier in the full script; that part is trimmed here.
TMP_JSON="$(mktemp)"

NODE_ID="$(curl -fsS -H "X-exc-imds-token: ${IMDS_TOKEN}" "${IMDS_BASE_URL}/latest/identity/node-identity" | jq -r '.node_id')"
BOOTSTRAP_ACCESS_TOKEN="$(curl -fsS -H "X-exc-imds-token: ${IMDS_TOKEN}" "${IMDS_BASE_URL}/latest/identity/access-token" | jq -r '.access_token')"

curl -fsS \
  -H "X-exc-imds-token: ${IMDS_TOKEN}" \
  -H "Authorization: Bearer ${BOOTSTRAP_ACCESS_TOKEN}" \
  "${BOOTSTRAP_BASE_URL}/bootstrap/control-plane/${NODE_ID}" > "$TMP_JSON"

jq -r '.files | to_entries[] | @base64' "$TMP_JSON" | while IFS= read -r entry; do
  key="$(printf '%s' "$entry" | base64 --decode | jq -r '.key')"
  value="$(printf '%s' "$entry" | base64 --decode | jq -r '.value')"
  mkdir -p "$(dirname "$key")"
  printf '%s' "$value" > "$key"
done

/usr/local/bin/exk8s-bootstrap-control-plane.sh
cluster.env rendered output
CLUSTER_NAME=exk8s-7-2a
KUBERNETES_VERSION=v1.35.3
CLUSTER_DOMAIN=cluster.local
SERVICE_CIDR=10.96.0.0/12
SERVICE_CLUSTER_IP=10.96.0.1
DNS_SERVICE_IP=10.96.0.10
POD_CIDR=172.16.0.0/12
APISERVER_ENDPOINT=https://10.0.0.11:6443
CONTROL_PLANE_DNS_NAME=exk8s-7-2a.k8s.excloud.co.in
NODE_NAME=cp-1.cluster-42.internal
NODE_IPV4=10.0.0.11
NODE_IPV6=
ETCD_INITIAL_CLUSTER=cp-1.cluster-42.internal=https://10.0.0.11:2380,cp-2.cluster-42.internal=https://10.0.0.12:2380
IS_INITIAL_CONTROL_PLANE=true
kubeconfig rendered as admin.kubeconfig
apiVersion: v1
kind: Config
clusters:
- name: exk8s-7-2a
  cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tLi4u
    server: https://10.0.0.11:6443
users:
- name: cluster-admin
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tLi4u
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLS4uLg==
contexts:
- name: cluster-admin
  context:
    cluster: exk8s-7-2a
    user: cluster-admin
current-context: cluster-admin
kubelet-config.yaml rendered output
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false
  x509:
    clientCAFile: /etc/exk8s/pki/ca.pem
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: unix:///run/containerd/containerd.sock
staticPodPath: /etc/kubernetes/manifests
tlsCertFile: /etc/exk8s/pki/kubelet-client.pem
tlsPrivateKeyFile: /etc/exk8s/pki/kubelet-client-key.pem
address: 10.0.0.11
kubelet.service rendered output
[Unit]
Description=Kubernetes Kubelet
After=network-online.target containerd.service

[Service]
EnvironmentFile=-/etc/exk8s/config/cluster.env
ExecStart=/usr/local/bin/kubelet \
  --config=/var/lib/kubelet/config.yaml \
  --kubeconfig=/etc/exk8s/kubeconfig/kubelet.kubeconfig \
  --hostname-override=cp-1.cluster-42.internal \
  --node-ip=10.0.0.11

[Install]
WantedBy=multi-user.target
etcd.service rendered output
[Unit]
Description=etcd
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/local/bin/etcd \
  --name=cp-1.cluster-42.internal \
  --listen-client-urls=https://10.0.0.11:2379,https://127.0.0.1:2379 \
  --advertise-client-urls=https://10.0.0.11:2379 \
  --listen-peer-urls=https://10.0.0.11:2380 \
  --initial-advertise-peer-urls=https://10.0.0.11:2380 \
  --initial-cluster=cp-1.cluster-42.internal=https://10.0.0.11:2380,cp-2.cluster-42.internal=https://10.0.0.12:2380 \
  --trusted-ca-file=/etc/exk8s/pki/ca.pem \
  --cert-file=/etc/exk8s/pki/etcd-peer.pem \
  --key-file=/etc/exk8s/pki/etcd-peer-key.pem

[Install]
WantedBy=multi-user.target
kube-apiserver.yaml rendered excerpt
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.35.3
    command:
    - kube-apiserver
    - --advertise-address=10.0.0.11
    - --etcd-servers=https://10.0.0.11:2379,https://10.0.0.12:2379
    - --kubelet-preferred-address-types=InternalIP,Hostname
    - --service-account-issuer=https://exk8s-7-2a.k8s.excloud.co.in:6443
    - --service-cluster-ip-range=10.96.0.0/12
    - --tls-cert-file=/etc/exk8s/pki/apiserver.pem
    - --tls-private-key-file=/etc/exk8s/pki/apiserver-key.pem
kube-controller-manager.yaml rendered excerpt
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-controller-manager
    image: registry.k8s.io/kube-controller-manager:v1.35.3
    command:
    - kube-controller-manager
    - --cluster-name=exk8s-7-2a
    - --cluster-cidr=172.16.0.0/12
    - --kubeconfig=/etc/exk8s/kubeconfig/controller-manager.kubeconfig
    - --root-ca-file=/etc/exk8s/pki/ca.pem
    - --service-account-private-key-file=/etc/exk8s/pki/service-account-key.pem
kube-scheduler.yaml rendered excerpt
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: registry.k8s.io/kube-scheduler:v1.35.3
    command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/exk8s/kubeconfig/scheduler.kubeconfig
    - --authorization-kubeconfig=/etc/exk8s/kubeconfig/scheduler.kubeconfig
    - --kubeconfig=/etc/exk8s/kubeconfig/scheduler.kubeconfig
    - --leader-elect=true
single-node-bridge-cni.conflist rendered output
{
  "cniVersion": "0.4.0",
  "name": "exk8s",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isDefaultGateway": true,
      "ipMasq": true,
      "hairpinMode": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[{ "subnet": "172.16.0.0/12" }]],
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    }
  ]
}
cilium-values.yaml rendered output
k8sServiceHost: 10.0.0.11
k8sServicePort: 6443
ipv4:
  enabled: true
ipv6:
  enabled: false
kubeProxyReplacement: false
routingMode: tunnel
tunnelProtocol: vxlan
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:
      - 172.16.0.0/12
kube-proxy.yaml rendered excerpt
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  template:
    spec:
      hostNetwork: true
      containers:
      - name: kube-proxy
        image: registry.k8s.io/kube-proxy:v1.35.3
        command:
        - kube-proxy
        - --kubeconfig=/etc/exk8s/kubeconfig/kube-proxy.kubeconfig
        - --cluster-cidr=172.16.0.0/12
        - --hostname-override=$(NODE_NAME)
        - --proxy-mode=iptables
coredns.yaml rendered excerpt
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |-
    .:53 {
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . /etc/resolv.conf
    }
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
spec:
  clusterIP: 10.96.0.10
exk8s-install-control-plane-prereqs.sh rendered excerpt
#!/usr/bin/env bash
set -euo pipefail

apt-get update -y
apt-get install -y ca-certificates conntrack containerd containernetworking-plugins curl ebtables ethtool iproute2 iptables socat wget

KUBE_VERSION=v1.35.3
ETCD_VERSION=v3.6.10

download_binary "https://dl.k8s.io/${KUBE_VERSION}/bin/linux/${ARCH}/kube-apiserver" /usr/local/bin/kube-apiserver
download_binary "https://dl.k8s.io/${KUBE_VERSION}/bin/linux/${ARCH}/kube-controller-manager" /usr/local/bin/kube-controller-manager
download_binary "https://dl.k8s.io/${KUBE_VERSION}/bin/linux/${ARCH}/kube-scheduler" /usr/local/bin/kube-scheduler
download_binary "https://dl.k8s.io/${KUBE_VERSION}/bin/linux/${ARCH}/kubelet" /usr/local/bin/kubelet
download_binary "https://dl.k8s.io/${KUBE_VERSION}/bin/linux/${ARCH}/kubectl" /usr/local/bin/kubectl
exk8s-bootstrap-control-plane.sh rendered output
#!/usr/bin/env bash
set -euo pipefail

if [[ -f /etc/exk8s/config/cluster.env ]]; then
  set -a
  . /etc/exk8s/config/cluster.env
  set +a
fi

for bin in containerd etcd kube-apiserver kube-controller-manager kube-scheduler kubelet; do
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "missing required binary: $bin"
    exit 1
  fi
done

systemctl daemon-reload
systemctl enable containerd || true
systemctl restart containerd || true
systemctl enable etcd
systemctl enable kubelet
systemctl restart etcd
systemctl restart kubelet

# Gate cluster-shared addon installation on the initial control-plane node.
if [[ "${IS_INITIAL_CONTROL_PLANE:-false}" == "true" ]]; then
  /usr/local/bin/exk8s-install-cluster-addons.sh
fi
exk8s-install-cluster-addons.sh rendered excerpt
#!/usr/bin/env bash
set -euo pipefail

if [[ "${IS_INITIAL_CONTROL_PLANE:-false}" != "true" ]]; then
  exit 0
fi

export KUBECONFIG=/etc/exk8s/kubeconfig/admin.kubeconfig

helm upgrade --install cilium oci://quay.io/cilium/charts/cilium \
  --version 1.19.2 \
  --namespace kube-system \
  --create-namespace \
  --values /etc/exk8s/config/cilium-values.yaml \
  --wait \
  --timeout 10m

kubectl apply -f /etc/exk8s/manifests/kube-proxy.yaml
kubectl apply -f /etc/exk8s/manifests/coredns.yaml

Section 06

What the VM does with the bundle, and why

Phase 1: fetch and write

The userdata shim fetches the bundle using the VM’s IMDS token and bootstrap access token, then writes the returned file map to the host filesystem.

Phase 2: install prerequisites

If the prerequisite installer exists, the VM runs it to ensure containerd, Kubernetes binaries, etcd, and helper packages are present.

Phase 3: bring up the local substrate

The bootstrap runner reloads systemd, enables and restarts containerd, enables etcd and kubelet, and starts them. The kubelet then sees the static pod manifests and launches the control-plane pods.

Phase 4: first-node-only addon install

Only the initial control-plane node waits for API readiness and installs Cilium, kube-proxy, and CoreDNS. Cluster-shared addon state should be applied once, not raced by every control-plane VM.

Section 07

Security properties and current limits

What this protects well

It stops unauthenticated bundle access, prevents one VM from asking for another VM’s bundle, and binds bootstrap to both VM session origin and principal identity.

What remains intentionally rough

The cluster CA private key and service-account private key still live in Postgres. That is a deliberate MVP tradeoff, not a finished long-term security posture.

Why VM ID still matters

VM ID is the stable join key across IMDS, VM inventory, and the kube tables. It is the simplest durable lookup key for this first bootstrap slice.

What the endpoint does not do

It does not reconcile failed host-side execution after the bundle is delivered. It renders truth; the VM executes that truth locally.

Section 08

First-principles recap

A node must be told who it is
The machine proves identity; it does not invent cluster membership or trust roots for itself.

Durable truth and executable truth are different layers
Postgres stores the durable cluster and node facts; the bundle turns those facts into runnable host state.

Per-node state and cluster-shared state are different responsibilities
Every node gets its own local files, but shared addon installation is restricted to the initial control-plane node.

Bootstrap is specific on purpose
The whole point of on-demand rendering is that the returned files match the current peer topology and this exact VM.

Key code: internal/handlers/bootstrapcontrolplane.go, internal/handlers/bootstrapauth.go, internal/bootstrap/bundle.go, internal/bootstrap/rendered_files.go, internal/bootstrap/userdata.go, internal/bootstrap/templates/renderer.go, internal/repository/imdstokens.go.