[0004] Server Operating System Selection

Evaluate operating systems for homelab server infrastructure with focus on Kubernetes cluster setup and maintenance.

Context and Problem Statement

The homelab infrastructure requires a server operating system to run Kubernetes clusters for container workloads. The choice of operating system significantly impacts ease of cluster initialization, ongoing maintenance burden, security posture, and operational complexity.

The question is: Which operating system should be used for homelab Kubernetes servers?

This decision will affect:

Cluster Initialization: Complexity and time required to bootstrap Kubernetes
Maintenance Burden: Frequency and complexity of OS updates, Kubernetes upgrades, and patching
Security Posture: Attack surface, built-in security features, and hardening requirements
Resource Efficiency: RAM, CPU, and disk overhead
Operational Complexity: Day-to-day management, troubleshooting, and debugging
Learning Curve: Time required to become proficient

Decision Drivers

Ease of Kubernetes Setup: Minimize steps and complexity for cluster initialization
Maintenance Simplicity: Reduce ongoing operational burden for updates and upgrades
Security-First Design: Minimal attack surface and strong security defaults
Resource Efficiency: Low RAM/CPU/disk overhead for cost-effective homelab
Learning Curve: Reasonable adoption time for single-person homelab
Community Support: Strong documentation and active community
Immutability: Prefer declarative, version-controlled configuration (GitOps-friendly)
Purpose-Built: OS optimized specifically for Kubernetes vs general-purpose

Considered Options

Option 1: Ubuntu Server with k3s
Option 2: Fedora Server with kubeadm
Option 3: Talos Linux (purpose-built Kubernetes OS)
Option 4: Harvester HCI (hyperconverged platform)

Decision Outcome

Chosen option: “Option 3: Talos Linux”, because:

Minimal Attack Surface: No SSH, shell, or package manager eliminates entire classes of vulnerabilities, providing the strongest security posture
Built-in Kubernetes: No separate installation or configuration complexity - Kubernetes is included and optimized
Declarative Configuration: API-driven, immutable infrastructure aligns with GitOps principles and prevents configuration drift
Lowest Resource Overhead: ~768MB RAM vs 1-2GB+ for traditional distros, maximizing homelab hardware efficiency
Simplified Maintenance: Declarative upgrades for the OS (talosctl upgrade) and for Kubernetes (talosctl upgrade-k8s) reduce operational burden
Security by Default: Immutable filesystem, no shell, kernel hardened per KSPP recommendations - secure without manual hardening

While the learning curve is steeper than traditional Linux distributions, the benefits of purpose-built Kubernetes infrastructure, minimal maintenance, and superior security outweigh the initial learning investment for a dedicated Kubernetes homelab.

Consequences

Good, because minimal attack surface (no SSH/shell) provides strongest security posture
Good, because declarative configuration enables GitOps workflows and prevents drift
Good, because lowest resource overhead (~768MB RAM) maximizes homelab efficiency
Good, because built-in Kubernetes eliminates installation complexity
Good, because immutable infrastructure prevents configuration drift
Good, because simplified upgrades (one talosctl command each for the OS and for Kubernetes) reduce maintenance burden
Good, because smallest disk footprint (~500MB) vs 10GB+ for traditional distros
Good, because secure by default (no manual hardening required)
Good, because purpose-built design optimized specifically for Kubernetes
Good, because API-driven management (talosctl) enables automation
Neutral, because steeper learning curve (paradigm shift from shell-based management)
Neutral, because smaller community than Ubuntu/Fedora (but active and helpful)
Bad, because limited to Kubernetes workloads only (not general-purpose)
Bad, because no shell access requires different troubleshooting approach
Bad, because newer platform (less mature than Ubuntu/Fedora)
Bad, because no escape hatch for manual intervention when needed

Confirmation

The implementation will be confirmed by:

Successfully bootstrapping a Talos cluster using talosctl
Deploying test workloads and validating functionality
Performing declarative OS and Kubernetes upgrades
Measuring actual resource usage (RAM < 1GB per node)
Validating security posture (no SSH/shell, immutable filesystem)
Testing GitOps workflow (machine configs in version control)
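
The RAM criterion in the list above could be checked with a small script. The sketch below is only an illustration: it assumes metrics-server is installed so that kubectl top nodes works, and that memory appears in the fourth column with a Mi suffix. The function name nodes_over_ram_limit and the 1024 Mi threshold are hypothetical choices, not part of this ADR.

```shell
#!/bin/sh
# Print the name of every node whose memory usage is at or above the limit.
# Reads `kubectl top nodes --no-headers` style lines on stdin:
#   NAME  CPU(cores)  CPU%  MEMORY(bytes)  MEMORY%
# Assumption: memory is reported with a Mi suffix (for example 612Mi).
nodes_over_ram_limit() {
  limit_mi="$1"
  awk -v limit="$limit_mi" '
    {
      mem = $4
      sub(/Mi$/, "", mem)          # strip the unit, leaving a number
      if (mem + 0 >= limit) print $1
    }'
}

# Usage (requires metrics-server in the cluster):
#   kubectl top nodes --no-headers | nodes_over_ram_limit 1024
```

An empty result means every node is under the limit. Because kubectl top reports usage for the whole node, workloads included, the measurement is most meaningful on an idle cluster.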

Pros and Cons of the Options

Option 1: Ubuntu Server with k3s

Host Kubernetes using Ubuntu Server 24.04 LTS with k3s lightweight Kubernetes distribution.

Architecture Overview

sequenceDiagram
    participant Admin
    participant Server as Ubuntu Server
    participant K3s as k3s Components
    
    Admin->>Server: Install Ubuntu 24.04 LTS
    Server->>Server: Configure network (static IP)
    Admin->>Server: Update system
    Admin->>Server: curl -sfL https://get.k3s.io | sh -
    Server->>K3s: Download k3s binary
    K3s->>Server: Configure containerd
    K3s->>Server: Start k3s service
    K3s->>Server: Initialize datastore (SQLite by default, embedded etcd with --cluster-init)
    K3s->>Server: Start API server
    K3s->>Server: Deploy built-in CNI (Flannel)
    K3s-->>Admin: Control plane ready
    Admin->>Server: Retrieve node token
    Admin->>Server: Install k3s agent on workers
    K3s->>Server: Join workers to cluster
    K3s-->>Admin: Cluster ready (5-10 minutes)

Implementation Details

Installation:

# Single-command k3s install
curl -sfL https://get.k3s.io | sh -

# Get token for workers
sudo cat /var/lib/rancher/k3s/server/node-token

# Install on workers
curl -sfL https://get.k3s.io | K3S_URL=https://control-plane:6443 K3S_TOKEN=<token> sh -

Resource Requirements:

RAM: 1GB total (512MB OS + 512MB k3s)
CPU: 1-2 cores
Disk: 20GB (10GB OS + 10GB containers)

Maintenance:

# OS updates
sudo apt update && sudo apt upgrade

# k3s upgrade
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.32.0+k3s1 sh -

# Or automatic via system-upgrade-controller

Pros and Cons

Good, because most familiar Linux distribution (easy adoption)
Good, because 5-year LTS support (10 years with Ubuntu Pro)
Good, because k3s provides single-command setup
Good, because extensive documentation and community support
Good, because compatible with all Kubernetes tooling
Good, because automatic security updates available
Good, because general-purpose (can run non-K8s workloads)
Good, because low learning curve
Neutral, because moderate resource overhead (1GB RAM)
Bad, because general-purpose OS has larger attack surface
Bad, because requires manual OS updates and reboots
Bad, because managing OS + Kubernetes lifecycle separately
Bad, because imperative configuration (not GitOps-native)
Bad, because mutable filesystem (configuration drift possible)

Option 2: Fedora Server with kubeadm

Host Kubernetes using Fedora Server with kubeadm (official Kubernetes tool) and CRI-O container runtime.

Architecture Overview

sequenceDiagram
    participant Admin
    participant Server as Fedora Server
    participant K8s as Kubernetes Components
    
    Admin->>Server: Install Fedora 41
    Server->>Server: Configure network
    Admin->>Server: Update system (dnf update)
    Admin->>Server: Install CRI-O
    Server->>Server: Configure CRI-O runtime
    Admin->>Server: Install kubeadm/kubelet/kubectl
    Server->>Server: Disable swap, load kernel modules
    Server->>Server: Configure SELinux
    Admin->>K8s: kubeadm init --cri-socket=unix:///var/run/crio/crio.sock
    K8s->>Server: Generate certificates
    K8s->>Server: Start etcd
    K8s->>Server: Start API server
    K8s-->>Admin: Control plane ready
    Admin->>K8s: kubectl apply CNI
    K8s->>Server: Deploy CNI pods
    Admin->>K8s: kubeadm join (workers)
    K8s-->>Admin: Cluster ready (15-20 minutes)

Implementation Details

Installation:

# Install CRI-O
sudo dnf install -y cri-o
sudo systemctl enable --now crio

# Install kubeadm components (assumes the upstream Kubernetes package repository, pkgs.k8s.io, is already configured)
sudo dnf install -y kubelet kubeadm kubectl

# Initialize cluster
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket=unix:///var/run/crio/crio.sock

# Install CNI
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml

Resource Requirements:

RAM: 2.2GB total (700MB OS + 1.5GB Kubernetes)
CPU: 2+ cores
Disk: 35GB (15GB OS + 20GB containers)

Maintenance:

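# OS updates (each Fedora release is supported for about 13 months, after which a release upgrade is needed)
sudo dnf update -y

# Kubernetes upgrade (control plane node)
sudo dnf update -y kubeadm
sudo kubeadm upgrade apply v1.32.0
sudo dnf update -y kubelet kubectl
# Restart the kubelet so the new version takes effect
sudo systemctl daemon-reload
sudo systemctl restart kubelet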

Pros and Cons

Good, because SELinux enabled by default (stronger than AppArmor)
Good, because latest kernel and packages (bleeding edge)
Good, because native CRI-O support (OpenShift compatibility)
Good, because upstream for RHEL (enterprise patterns)
Good, because kubeadm provides full control over cluster
Neutral, because faster release cycle (latest features, but more upgrades)
Bad, because short support cycle (13 months per release)
Bad, because bleeding-edge can introduce instability
Bad, because complex kubeadm setup (many manual steps)
Bad, because higher resource overhead (2.2GB RAM)
Bad, because SELinux configuration for Kubernetes is complex
Bad, because frequent OS upgrades required (every 13 months)
Bad, because managing OS + Kubernetes separately
Bad, because imperative configuration (not GitOps-native)

Option 3: Talos Linux (purpose-built Kubernetes OS)

Use Talos Linux, an immutable, API-driven operating system designed specifically for Kubernetes with built-in cluster management.

Architecture Overview

sequenceDiagram
    participant Admin
    participant Server as Bare Metal Server
    participant Talos as Talos Linux
    participant K8s as Kubernetes Components
    
    Admin->>Server: Boot Talos ISO (PXE or USB)
    Server->>Talos: Start in maintenance mode
    Talos-->>Admin: API endpoint ready
    Admin->>Admin: Generate configs (talosctl gen config)
    Admin->>Talos: talosctl apply-config (controlplane.yaml)
    Talos->>Server: Install Talos to disk
    Server->>Server: Reboot from disk
    Talos->>K8s: Start kubelet
    Admin->>Talos: talosctl bootstrap
    Talos->>K8s: Initialize etcd (cluster bootstrap)
    Talos->>K8s: Start API server
    Talos->>K8s: Start controller-manager and scheduler
    K8s-->>Admin: Control plane ready
    Admin->>K8s: Apply CNI
    Admin->>Talos: Apply worker configs
    Talos->>K8s: Join workers
    K8s-->>Admin: Cluster ready (10-15 minutes)

Implementation Details

Installation:

# Generate machine configs
talosctl gen config homelab https://192.168.1.10:6443

# Apply config to control plane (booted from ISO)
talosctl apply-config --insecure --nodes 192.168.1.10 --file controlplane.yaml

# Bootstrap Kubernetes
talosctl bootstrap --nodes 192.168.1.10 --endpoints 192.168.1.10

# Get kubeconfig
talosctl kubeconfig --nodes 192.168.1.10

# Add workers
talosctl apply-config --insecure --nodes 192.168.1.11 --file worker.yaml

Machine Configuration (declarative YAML):

version: v1alpha1
machine:
  type: controlplane
  install:
    disk: /dev/sda
  network:
    hostname: control-plane-1
    interfaces:
      - interface: eth0
        addresses:
          - 192.168.1.10/24
cluster:
  clusterName: homelab
  controlPlane:
    endpoint: https://192.168.1.10:6443
  network:
    cni:
      name: custom
      urls:
        - https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml
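
Because the machine config is plain YAML, it can be checked before it is committed or applied. A minimal sketch, assuming the files produced by talosctl gen config sit in the current directory: talosctl validate with --config and --mode is an existing subcommand, while the validate_configs wrapper and the TALOSCTL override variable are hypothetical helpers added for illustration.

```shell
#!/bin/sh
# TALOSCTL can be overridden (for example with `echo`) for a dry run.
TALOSCTL="${TALOSCTL:-talosctl}"

# Validate each machine config file for bare-metal installs.
# Stops at the first file that fails validation.
validate_configs() {
  for cfg in "$@"; do
    "$TALOSCTL" validate --config "$cfg" --mode metal || return 1
  done
}

# Usage:
#   validate_configs controlplane.yaml worker.yaml
```

Running a check like this in a pre-commit hook or CI job is one way to keep invalid configs out of version control.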

Resource Requirements:

RAM: 768MB total (256MB OS + 512MB Kubernetes)
CPU: 1-2 cores
Disk: 10-15GB (500MB OS + 10GB containers)

Maintenance:

# Upgrade Talos (OS only; the Kubernetes version is upgraded separately below)
talosctl upgrade --nodes 192.168.1.10 --image ghcr.io/siderolabs/installer:v1.9.0

# Upgrade Kubernetes version
talosctl upgrade-k8s --nodes 192.168.1.10 --to 1.32.0

# Apply config changes
talosctl apply-config --nodes 192.168.1.10 --file controlplane.yaml
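
With more than one node, a cautious approach is to apply the OS upgrade to one node at a time so the cluster stays available. The following is a hedged sketch rather than a recommended procedure: talosctl upgrade with --nodes and --image is the command shown above, while the rolling_upgrade function, the TALOSCTL override, and the addresses in the usage line are illustrative assumptions.

```shell
#!/bin/sh
# TALOSCTL can be overridden (for example with `echo`) for a dry run.
TALOSCTL="${TALOSCTL:-talosctl}"

# Upgrade Talos on each node in turn, stopping at the first failure
# so that a bad image does not reach the remaining nodes.
rolling_upgrade() {
  image="$1"
  shift
  for node in "$@"; do
    echo "Upgrading $node to $image"
    "$TALOSCTL" upgrade --nodes "$node" --image "$image" || return 1
  done
}

# Usage:
#   rolling_upgrade ghcr.io/siderolabs/installer:v1.9.0 192.168.1.10 192.168.1.11
```

Whether each call waits for the node to come back depends on the talosctl version and flags in use, so a health check between nodes may be needed.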

Pros and Cons

Good, because Kubernetes built-in (no separate installation)
Good, because minimal attack surface (no SSH, shell, package manager)
Good, because immutable infrastructure (config drift impossible)
Good, because API-driven management (GitOps-friendly)
Good, because lowest resource overhead (~768MB RAM)
Good, because declarative configuration (YAML in version control)
Good, because secure by default (no manual hardening)
Good, because smallest disk footprint (~500MB OS)
Good, because designed specifically for Kubernetes
Good, because simple declarative upgrades (OS + K8s)
Good, because UEFI Secure Boot support
Neutral, because smaller community (but active and helpful)
Bad, because steep learning curve (paradigm shift)
Bad, because limited to Kubernetes workloads only
Bad, because troubleshooting without shell requires different approach
Bad, because relatively new (less mature than Ubuntu/Fedora)
Bad, because no escape hatch for manual intervention

Option 4: Harvester HCI (hyperconverged platform)

Use Harvester, a hyperconverged infrastructure platform built on K3s and KubeVirt for unified VM + container management.

Architecture Overview

sequenceDiagram
    participant Admin
    participant Server as Bare Metal Server
    participant Harvester as Harvester HCI
    participant K3s as K3s / KubeVirt
    participant Storage as Longhorn Storage
    
    Admin->>Server: Boot Harvester ISO
    Server->>Harvester: Installation wizard
    Admin->>Harvester: Configure cluster (VIP, storage)
    Harvester->>Server: Install Harvester OS (immutable, SLE Micro based)
    Harvester->>Server: Install K3s
    Server->>Server: Reboot
    Harvester->>K3s: Start K3s server
    K3s->>Storage: Deploy Longhorn
    K3s->>Server: Deploy KubeVirt
    K3s->>Server: Deploy multus CNI
    Harvester-->>Admin: Web UI ready
    Admin->>Harvester: Add nodes
    Harvester->>K3s: Join cluster
    K3s-->>Admin: Cluster ready (20-30 minutes)

Implementation Details

Installation: Interactive ISO wizard or cloud-init config

Resource Requirements:

RAM: 8GB minimum per node (16GB+ recommended)
CPU: 4+ cores per node
Disk: 250GB+ per node (100GB OS + 150GB storage)
Nodes: 3+ for production HA

Features:

Web UI management
Built-in storage (Longhorn)
VM support (KubeVirt)
Live migration
Rancher integration

Pros and Cons

Good, because unified VM + container platform
Good, because built-in K3s (Kubernetes included)
Good, because web UI simplifies management
Good, because built-in persistent storage (Longhorn)
Good, because VM live migration
Good, because Rancher integration
Neutral, because immutable OS layer
Bad, because very heavy resource requirements (8GB+ RAM)
Bad, because complex architecture (KubeVirt, Longhorn, multus)
Bad, because overkill for container-only workloads
Bad, because larger attack surface (web UI, VM layer)
Bad, because requires 3+ nodes for HA (not single-node friendly)
Bad, because steep learning curve for full feature set

More Information

Detailed Analysis

For in-depth analysis of each operating system:

Ubuntu Server Analysis
- Installation methods (kubeadm, k3s, MicroK8s)
- Cluster initialization sequences
- Maintenance requirements and upgrade procedures
- Resource overhead and security posture
Fedora Server Analysis
- kubeadm with CRI-O installation
- SELinux configuration for Kubernetes
- Rapid release cycle implications
- RHEL ecosystem compatibility
Talos Linux Analysis
- API-driven, immutable architecture
- Declarative configuration model
- Security-first design principles
- Production readiness and advanced features
Harvester HCI Analysis
- Hyperconverged infrastructure capabilities
- VM + container unified platform
- KubeVirt and Longhorn integration
- Multi-node cluster requirements

Key Findings Summary

Resource efficiency comparison:

✅ Talos: 768MB RAM, 10-15GB disk including containers (about 500MB for the OS itself; most efficient)
✅ Ubuntu + k3s: 1GB RAM, 20GB disk (efficient)
⚠️ Fedora + kubeadm: 2.2GB RAM, 35GB disk (moderate)
❌ Harvester: 8GB+ RAM, 250GB+ disk (heavy)

Security posture comparison:

✅ Talos: Minimal attack surface (no SSH/shell, immutable)
✅ Fedora: SELinux by default (strong MAC)
⚠️ Ubuntu: AppArmor (moderate security)
⚠️ Harvester: Larger attack surface (web UI, VM layer)

Operational complexity comparison:

✅ Ubuntu + k3s: Single command install, familiar management
✅ Talos: Declarative, automated (after learning curve)
⚠️ Fedora + kubeadm: Manual kubeadm steps, frequent OS upgrades
❌ Harvester: Complex HCI architecture, heavy requirements

Decision Matrix

Criterion	Ubuntu + k3s	Fedora + kubeadm	Talos Linux	Harvester
Setup Simplicity	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐
Maintenance Burden	⭐⭐⭐⭐	⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
Security Posture	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐
Resource Efficiency	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐⭐	⭐
Learning Curve	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐	⭐⭐⭐
Community Support	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
Immutability	⭐	⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
GitOps-Friendly	⭐⭐	⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐
Purpose-Built	⭐⭐	⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
Overall Score	31/45	25/45	39/45	29/45

Talos Linux scores highest for Kubernetes-dedicated homelab infrastructure prioritizing security, efficiency, and GitOps workflows.
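
The Overall Score row is meant to be the sum of the star ratings in each column. A small sketch for recomputing it when a rating changes: the sum_scores helper is a hypothetical name, and the numbers are the star counts transcribed from the matrix above, in row order.

```shell
#!/bin/sh
# Add up the ratings passed as arguments and print the total.
sum_scores() {
  total=0
  for rating in "$@"; do
    total=$((total + rating))
  done
  echo "$total"
}

# Ratings in row order: setup, maintenance, security, resources,
# learning curve, community, immutability, GitOps, purpose-built.
ubuntu="5 4 3 4 5 5 1 2 2"
fedora="3 2 4 3 4 4 1 2 2"
talos="4 5 5 5 2 3 5 5 5"
harvester="3 4 3 1 3 4 4 3 4"

# Word splitting of the unquoted variables is intentional here.
echo "Ubuntu + k3s:     $(sum_scores $ubuntu)/45"
echo "Fedora + kubeadm: $(sum_scores $fedora)/45"
echo "Talos Linux:      $(sum_scores $talos)/45"
echo "Harvester:        $(sum_scores $harvester)/45"
```

All nine criteria carry equal weight in this sum. Weighting the decision drivers differently would be a separate design choice and could change the gaps between options.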

Trade-offs Analysis

Choosing Talos Linux:

Wins: Best security, lowest overhead, declarative configuration, minimal maintenance
Loses: Steeper learning curve, no shell access, smaller community

Choosing Ubuntu + k3s:

Wins: Easiest adoption, largest community, general-purpose flexibility
Loses: Higher attack surface, manual OS management, imperative config

Choosing Fedora + kubeadm:

Wins: Latest features, SELinux, enterprise compatibility
Loses: Frequent OS upgrades, complex setup, higher overhead

Choosing Harvester:

Wins: VM + container unified platform, web UI
Loses: Heavy resources, complex architecture, overkill for K8s-only

For a Kubernetes-dedicated homelab prioritizing security and efficiency, Talos Linux’s benefits outweigh the learning curve investment.

Related ADRs

ADR-0001: Use MADR for Architecture Decision Records - MADR format used for this ADR
ADR-0002: Network Boot Architecture - Server provisioning architecture
ADR-0003: Cloud Provider Selection - Cloud infrastructure decisions

Future Considerations

Team Growth: If the team grows beyond a single person, reassess Ubuntu for familiarity
VM Requirements: If VM workloads emerge, consider Harvester or KubeVirt on Talos
Enterprise Patterns: If RHEL compatibility is needed, reconsider Fedora/CentOS Stream
Maintenance Burden: If the Talos learning curve proves too steep, fall back to k3s
Talos Maturity: Monitor Talos ecosystem growth and production adoption

Related Issues

Issue #598 - story(docs): create adr for server operating system

Last modified November 20, 2025: chore(deps): update actions/checkout action to v6 (#628) (721910b)
