Up and Running With Talos OS
What is Talos OS?
Talos OS (hereafter referred to as Talos) is a Linux-based operating system designed specifically for running containerized workloads such as Kubernetes (K8s). Talos has several features that make administration a breeze, such as immutable upgrades, container-based management, and automatic provisioning.
Prerequisites
Talos has a few prerequisites, most of which apply to K8s in general. Setting them up is beyond the scope of this article, but they are summarized below.
- Working DHCP
- Working DNS, with hostnames configured correctly
- Working NTP
- A hypervisor such as Proxmox or Hyper-V (bare-metal instructions would be similar)
To make life a little easier, we will refer to the nodes by their hostnames throughout this tutorial. The test environment for this guide is set up as follows:
Subnet: 192.168.0.0/24
Gateway: 192.168.0.1
192.168.0.2: controller.example.com
192.168.0.3: worker1.example.com
192.168.0.4: worker2.example.com
192.168.0.5: worker3.example.com
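If you want to sanity-check the DNS prerequisite up front, a quick loop like the following (a sketch, assuming dig from your distribution's bind-utils/dnsutils package; any resolver tool works) should print one address per node:

for host in controller worker1 worker2 worker3; do
  # each lookup should return the matching 192.168.0.x address
  dig +short "${host}.example.com"
done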
NOTE: It's important to have working DNS here because the nodes connect to each other via TLS; any hostname mismatch will cause problems with node communication. This is one of the disadvantages of automatic TLS such as Talos uses.
NOTE: It's also important to give the nodes fairly beefy minimum specs. K8s chews through resources with redundant pods, so make sure each machine has at least 8 GiB of memory and 2 vCPUs allocated. You'll also need to run the host with a recent CPU instruction set enabled: QEMU's x86-64-v2 profile worked for this guide, but passthrough ('host') mode would work as well.
NOTE: I also recommend assigning IPs via static DHCP reservations instead of hardcoding them.
Terminology
Talos uses some terminology that might be a bit confusing, especially if you're used to Docker Swarm or K8s.
In Talos, an 'endpoint' is a control-plane node: it manages the workers and hosts the etcd database. A 'node' is simply a worker where the pods run.
The endpoint server automatically proxies commands to the cluster nodes. For example, the command
talosctl -e 192.168.0.2 -n 192.168.0.3,192.168.0.4 logs etcd
will use 192.168.0.2 to retrieve the etcd logs from 192.168.0.3 and 192.168.0.4. This means that once setup is complete, we only ever need to talk to the endpoint.
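The same proxying applies to other talosctl subcommands. For example, once the cluster is up, you could list the system services running on all three workers through the endpoint:

talosctl -e 192.168.0.2 \
  -n 192.168.0.3,192.168.0.4,192.168.0.5 \
  services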
Getting Set Up
Get the latest bare-metal ISO. You will also need talosctl, the command-line program for interacting with the OS (you won't be able to use kubectl until later). You can install talosctl with the following command:
curl -sL https://talos.dev/install | sh
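To verify the install, you can ask talosctl for its client version (the server-side version will fail until a cluster exists):

talosctl version --client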
Installing
Set up the four Talos VMs by attaching the ISO and booting each machine; there is no installer. The nodes will be bootstrapped once we've finished with the configuration below.
This command will generate the initial configuration for connecting to the cluster:
talosctl gen config talos-proxmox-cluster https://controller.example.com:6443 \
--output-dir _out
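If all went well, the output directory should now contain the three generated files:

ls _out
# controlplane.yaml  talosconfig  worker.yaml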
Next we need a patch file to set the hostname on each node. Create the following file as _out/controller.yaml, then one per worker (_out/worker1.yaml, _out/worker2.yaml, _out/worker3.yaml), substituting the hostname of the node you're working on:
machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: true
    hostname: controller
cluster:
  network:
    dnsDomain: example.com
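The worker patches are identical apart from the hostname; for example, _out/worker1.yaml would look like this:

machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: true
    hostname: worker1
cluster:
  network:
    dnsDomain: example.com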
Next, you need to apply the patch using talosctl:
talosctl patch machineconfig \
  --mode=no-reboot \
  --insecure \
  --nodes 192.168.0.2 \
  --patch @_out/controller.yaml
Applying this patch sets the hostname and domain of the machine, and ensures that the interface is configured to use DHCP.
NOTE: You need to prepend @ to the path of the patch file. Keep this in mind for future patches. Also note the --mode=no-reboot and --insecure flags; these are needed since we haven't bootstrapped the cluster yet.
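The same patch command needs to run once per node. A small loop (a sketch, assuming the per-node patch files created above) saves some typing:

for pair in 192.168.0.3:worker1 192.168.0.4:worker2 192.168.0.5:worker3; do
  # split "ip:name" into its two halves
  ip="${pair%%:*}" name="${pair##*:}"
  talosctl patch machineconfig --mode=no-reboot --insecure \
    --nodes "$ip" --patch "@_out/${name}.yaml"
done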
With hostnames set, we can finalize the installation.
This command will list the disks available on the target node:
talosctl disks --insecure --nodes controller.example.com
Next, review the files _out/controlplane.yaml and _out/worker.yaml: search for the machine -> install -> disk section and ensure that the disk listed above is correct. The configuration should look like this:
machine:
  install:
    disk: /dev/sda
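If your hypervisor exposes the disk under a different name (virtio disks, for instance, typically show up as /dev/vda), a one-liner like this updates both files in place (a sketch, assuming GNU sed):

sed -i 's|disk: /dev/sda|disk: /dev/vda|' _out/controlplane.yaml _out/worker.yaml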
We can now configure the controller using the command:
talosctl apply-config --insecure \
  --nodes controller.example.com \
  --file _out/controlplane.yaml
We then need to configure the workers using the command:
talosctl apply-config --insecure \
--nodes worker1.example.com,worker2.example.com,worker3.example.com \
--file _out/worker.yaml
The nodes will go through their installation process, overwriting the disk. After a quick reboot the cluster should come up, and we can connect to it securely. If something went wrong with setting the hostnames, this is where it will show up.
export TALOSCONFIG="${PWD}/_out/talosconfig"
talosctl config endpoint controller.example.com
talosctl config node controller.example.com
NOTE: If you see the message "no context is set", make sure that your TALOSCONFIG environment variable is set correctly.
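Before bootstrapping, it's worth confirming that each node picked up its hostname. One way is to query the hostname resource across all of the nodes through the endpoint:

talosctl -n controller.example.com,worker1.example.com,worker2.example.com,worker3.example.com \
  get hostname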
Next you need to bootstrap the cluster with:
talosctl bootstrap \
--nodes controller.example.com \
--endpoints controller.example.com
This will initialize the etcd database and prepare the cluster for use.
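talosctl also ships a health check that waits for etcd, the control plane, and the kubelets to come up; running it here is a handy way to watch the cluster converge:

talosctl health --nodes controller.example.com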
Now we should have a cluster up and running. We can generate a kubeconfig with:
talosctl kubeconfig .config/kubeconfig
export KUBECONFIG="${PWD}/.config/kubeconfig"
And we should be able to run basic commands such as kubectl get nodes to see node state.
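As a quick smoke test, you could schedule something on the workers and watch where it lands:

kubectl create deployment nginx --image=nginx --replicas=3
kubectl get pods -o wide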
That’s it! You should have a functional K8s cluster to play around with. Happy hacking!