# Installation
This page describes how to set up this repo on your local machine: getting the repo, installing Ansible, generating secrets, and configuring the Ansible inventory and variables.
## Requirements
- At least one machine to act as a server node.
- All machines running a UNIX/Linux operating system.
- A minimum of one machine or VM (ideally three) to host the cluster nodes.
- You must have `git` installed on at least your local workstation and the master server node.
- You have set up SSH access to all cluster nodes, ideally password-less.
- All cluster machines must have the relevant ports open on their firewalls. See below.
Preferably:
- A domain name to use for the cluster, with DNS and Let's Encrypt set up to issue certificates. See Techno Tim for a guide.
**Info:** HA ArgoCD is installed by default, which requires at least three nodes. If your cluster has fewer than three nodes, the HA installation will still work, but you will have several ArgoCD pods stuck in the `Pending` state. This won't affect Argo, but it will chew up node resources. If using fewer than three nodes, change the ArgoCD `install.yaml` location specification in `bootstrap/base/kustomization.yaml`.
See the K3s requirements docs. When installing with multiple server nodes, the Ansible scripts will install K3s with the `--cluster-init` flag, which will set up an embedded `etcd` datastore. This requires further ports to be opened on the servers. Again, see the K3s docs.
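The Ansible playbooks handle the installation for you; purely for reference, initialising an embedded etcd cluster by hand would look roughly like the following (a sketch based on the K3s docs, not the exact commands the playbooks run):

```bash
# First server node: initialise the cluster with an embedded etcd datastore
curl -sfL https://get.k3s.io | sh -s - server --cluster-init

# Additional server nodes: join using the first server's address and the shared token
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://<first-server-ip>:6443 \
  --token <k3s_token>
```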
**Note:** The following installation instructions will overwrite any existing K3s installation on the hosts, and will overwrite the `.kube` directory on the hosts, if it exists.
## Ports and firewall rules
For this cluster to work, make sure the following ports are open. For more information, see K3s docs.
| Protocol | Port | Source | Destination | Description |
|---|---|---|---|---|
| TCP | 2379-2380 | Servers | Servers | HA K3s with embedded etcd |
| TCP | 6443 | Agents and workstation | Servers | K3s supervisor and Kubernetes API server |
| UDP | 8472 | All nodes | All nodes | Flannel VXLAN |
| TCP | 10250 | All nodes | All nodes | Kubelet metrics |
| TCP & UDP | 7946 | All nodes and clients | Servers | MetalLB L2 mode traffic |
| TCP | 7472 | All nodes | All nodes | MetalLB metrics between nodes |
| All | All | 10.42.0.0/16 | All nodes | K3s pods |
| All | All | 10.43.0.0/16 | All nodes | K3s services |
Note also that if you are using an NFS network store as a backup target, all nodes will need to have access to that target, as will the K3s pods (`10.42.0.0/16`).
Example NFS setup:
| Protocol | Port | Source | Destination | Description |
|---|---|---|---|---|
| TCP | 2049 | 10.42.0.0/16 | NFS target host | K3s pods to NFS |
| TCP & UDP | 111 | 10.43.0.0/16 | NFS target host | K3s pods to NFS |
| TCP | 2049 | Servers | NFS target host | K3s server nodes running Longhorn to NFS |
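How you open these ports depends on your distribution's firewall. As an illustration only, `ufw` rules on a server node might look roughly like this (the CIDRs are placeholders for your site's networks; the K3s docs give equivalent `firewalld` commands):

```bash
# Illustrative ufw rules for a K3s server node; adjust CIDRs to your own network
sudo ufw allow proto tcp from <node-cidr> to any port 6443         # K3s supervisor / Kubernetes API
sudo ufw allow proto tcp from <server-cidr> to any port 2379:2380  # embedded etcd (HA servers only)
sudo ufw allow proto udp from <node-cidr> to any port 8472         # Flannel VXLAN
sudo ufw allow proto tcp from <node-cidr> to any port 10250        # kubelet metrics
sudo ufw allow from 10.42.0.0/16 to any                            # K3s pods
sudo ufw allow from 10.43.0.0/16 to any                            # K3s services
```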
## Definitions
Before going further, some definitions are necessary.
- Site: Refers to your infrastructure - the totality of the local (and possibly remote) machines and connecting network that will constitute your cluster and client machines connecting to your cluster. This term is relevant only within the scope of this repo.
- Local machine: Your local machine can take on any of the roles defined below. In fact, with K3s, it can simultaneously take on all three if you wish. Also called your local host.
- Ansible controller workstation: The machine from which you will be executing any Ansible playbooks or Makefile commands from this repo. It's most likely your local machine and probably not a cluster node. It doesn't have to be kept out of the cluster, but you might find interesting (read: painful) edge cases if your local machine is also a cluster node.
- K3s/Kubernetes server nodes: One or more machines running K3s as server nodes. The server nodes run together to form the cluster control plane. Given how voting amongst server nodes works, it makes sense for the control plane to consist of an odd number of server nodes only. A minimum of one server node is necessary to have a K3s cluster, in which case that node will operate as the server as well as execute workloads. A Highly Available (HA) control plane will exist only once there are three or more (3, 5, 7, ...) server nodes.
- Master server node: The first machine in the `server` list of the Ansible inventory. This term is relevant only within the scope of this repo. While all K3s server nodes are created equal, the Ansible playbooks in this repo will provision and bootstrap the cluster infrastructure workload to the master server node first, before adding other server or agent nodes to the cluster. This is because the virtual IP provided by KubeVIP needs to be available before any other nodes can be added, and this will only happen once KubeVIP is deployed to the cluster.
- K3s agent nodes: One or more machines running K3s as an agent. These are worker nodes that execute workloads. A server node can also be a worker node. Any number of agent nodes can be added to the cluster, to increase the available resources and redundancy.
All machines are assumed to be on the same private network.
## Install kubectl, ArgoCD CLI and Helm
**Warning:** You must use a `kubectl` version that is within one minor version difference of your cluster. For example, a v1.29 client can communicate with v1.28, v1.29, and v1.30 control planes. Using the latest compatible version of `kubectl` helps avoid unforeseen issues.
Install `kubectl` following the docs. `kustomize` comes as part of `kubectl`, albeit sometimes an earlier version.

Install `helm` following the docs.
Install ArgoCD CLI following the docs.
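For example, on a Linux amd64 workstation the installs might look like the following (check the linked docs for the current recommended method for your platform):

```bash
# kubectl (latest stable release)
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# helm (official installer script)
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# ArgoCD CLI (latest release binary)
curl -sSL -o argocd https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
sudo install -m 0755 argocd /usr/local/bin/argocd
```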
## Initialise GitOps repo
This is a public repo. Ultimately, you will want to be using this as a private repo, but perhaps you will want to keep it connected to this public instance to be able to pull upstream changes. To achieve this, mirror this public repo into your own private repo.
First, get a copy of this repo. The following assumes you want to take this template as a starting point to create a new repo (eg. your private GitOps repo), then make your own modifications, while still retaining the ability to pull updates from the template. Reference SO.
Create the private repo on GitHub. In whichever directory you store your projects:
```bash
# clone the template to your private repo
git clone git@github.com:smp4/k3s-homelab-gitops.git k3s-homelab-gitops-private
cd k3s-homelab-gitops-private
# set the remote origin to your private repo on GitHub
git remote set-url origin git@github.com:YOURUSERNAME/k3s-homelab-gitops-private.git
# add the public template repo on GitHub as an upstream source
git remote add upstream git@github.com:smp4/k3s-homelab-gitops.git
git remote -v show # verify
# push new repo to GitHub
git push -u origin main
# do work, commit, push to private repo
git push origin main
# pull updates from the public template
git fetch upstream
git merge upstream/main
```
**Note:** This is not a fork. Create a fork if you want to contribute back to the public template.
### Pulling updates from public repo
In your private repo, pull the updates from the public repo, then pull the latest refs from your private repo's remote, then push to your private repo's remote.
```bash
git pull --no-rebase upstream main
git pull origin main
git push origin main
```
## Install Ansible
Ansible is used for setup operations on the cluster node machines - OS level tasks.
For the most part, the repo does not use Ansible to do any templating of the Kubernetes resource YAML files. The idea is that Ansible is used only at bootstrap and then forgotten. An exception is some of the cluster infrastructure resources that are applied at bootstrap, including KubeVIP, where things like `apiserver_endpoint` are templated into the KubeVIP manifest.
Install Ansible in a Python virtual environment on the Ansible controller workstation (Ansible is a Python application).
Optionally, use `direnv` to automatically load environment variables and activate the Python virtual environment whenever you `cd` into the repo directory.
```bash
curl -sfL https://direnv.net/install.sh | bash
```
Add the following line to `~/.zshrc` (see the `direnv` docs for other shells):
```bash
eval "$(direnv hook zsh)"
```
Create a `.envrc` file in the root directory of this repo and set up the Python virtual environment, specifying `.venv` as the directory in which to store the virtual environment (so that it is easily recognised by IDEs), and explicitly selecting a Python version:
```bash
echo "export VIRTUAL_ENV=.venv" >> .envrc
echo "layout python /usr/local/bin/python3.11" >> .envrc
direnv allow
which python3
which pip
pip install --upgrade pip
```
Otherwise, create the virtual environment using your preferred method.
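For instance, a plain `venv` setup without `direnv` might look like this (a sketch; use whichever Python 3 version is installed on your workstation):

```bash
# create and activate a virtual environment in .venv
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
```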
With the virtual environment created, install requirements:
```bash
pip install -r requirements.txt
```
If needed in development, use a `.env` file to define environment variables within the repo.
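As a purely hypothetical example (nothing in the repo requires these particular variables), such a `.env` could hold values like:

```bash
# .env (hypothetical example) — only read if your tooling loads it,
# e.g. via direnv's `dotenv` directive in .envrc
KUBECONFIG=$HOME/.kube/config
ANSIBLE_STDOUT_CALLBACK=yaml
```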
Install the required Ansible collections:
```bash
ansible-galaxy install -r ./ansible/collections/requirements.yml
```
## Create Ansible inventory and values files
Next, configure your cluster site. If you are going to follow the tutorials and first deploy to your local host, you will need to populate the `ansible/inventory/localhost.yml` file. If you are only following the other tutorials, or deploying directly to a cluster of nodes, then you can ignore this file. All users will eventually need to populate the `ansible/inventory/hosts.yml` file.
Start with the defaults in the `samples/` directory:
```bash
cp ./samples/localhost-sample.yml ./ansible/inventory/localhost.yml
cp ./samples/hosts-sample.yml ./ansible/inventory/hosts.yml
cp ./samples/all-sample.yml ./ansible/inventory/group_vars/all.yml
```
Edit `localhost.yml` and `all.yml` with the relevant values for your local machine. Descriptions for each variable are given in the sample files. In `localhost.yml`, make sure you list connection details for the local machine in the `servers` group only, and no machines in the `agent` group.
Update the host connection details in `hosts.yml` and the cluster configuration variables in `all.yml` to suit your needs. You may need to modify the `all.yml` file when progressing to cluster deployment.
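Optionally, you can sanity-check the inventory from the Ansible controller before going further (use `localhost.yml` instead of `hosts.yml` if that's the file you populated; once vault-encrypted values are present you will also need to add `--vault-id home@prompt`):

```bash
# confirm the inventory parses and lists the expected groups and hosts
ansible-inventory -i ansible/inventory/hosts.yml --list

# confirm Ansible can reach every host over SSH
ansible all -i ansible/inventory/hosts.yml -m ping
```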
At this point, it might be useful to understand what each of the directories in the repo is doing.
## Generate templated manifests
Ansible is mostly used for bootstrapping the cluster nodes; however, in a small number of cases it is mandatory to generate some Kubernetes manifest (YAML) files from templates. This is just a hack that uses Ansible to template the KubeVIP and MetalLB manifests into the working directory of the GitOps repo on your local machine, and it only needs to be performed once, prior to the first cluster deployment.
**Warning:** The following steps, generating the KubeVIP templates and committing them to the repo, must be completed before any deployments are made to the cluster, as KubeVIP creates the API endpoint IP address which both your local machine and all cluster nodes require in order to connect to and join the cluster.
Create the KubeVIP and MetalLB manifests from the Ansible templates with the Ansible playbook `generate-templates`. From the repo root directory:
```bash
make manifest-templates
```
It does not matter which Ansible inventory file is used to execute the `generate-templates` play, as it will execute on the local machine (the Ansible controller) only, and once only. The contents of the values file `all.yml` are the main input.
**Warning:** The `make manifest-templates` command must be run from the root directory of this repo (the directory in which you found the `README.md` that you're currently reading). This directory is used to template and copy the KubeVIP manifests into the respective infrastructure workload directories in `./infrastructure`.
Commit the new files and push to the repository so that ArgoCD can reconcile them into the cluster later.
```bash
git status
git add .
git commit -m "Add KubeVIP manifests."
git push # Assuming the repo is already set up to push to remote origin.
```
## Create local bootstrap secrets
These secrets are used by Ansible to bootstrap the node machines. Encrypted secrets are implemented as variables (rather than files).
Bootstrap secrets (host `ansible_become_pass`, `k3s_token`) are stored in Ansible Vault. Ansible Vault comes as part of the Ansible installation. The user must create these secrets locally, and store them locally. They are never used again once the cluster is initialised. The encrypted secrets are provided to Ansible via `ansible/inventory/group_vars/all.yml` and `ansible/inventory/hosts.yml` (or `localhost.yml`). These files, with their secrets, are listed in `.gitignore`, so they are never committed to version control.
Production-time secrets will be separately encrypted and stored using Sealed Secrets.
The user must create the bootstrap encrypted secrets first, before running any Ansible playbooks. The scripts are currently configured assuming all secrets belong to a `vault-id` called `home`, encrypted with a single password.
Create the secrets at the prompts triggered by each of the following commands. Don't hit `Enter` after entering the password: use `Ctrl-D`, per the instructions that Ansible will print to screen. Use the same vault password for each command (you can use different passwords if you want, but then the `make` commands in future steps won't work out of the box).
To encrypt a password used to elevate `ansible_user` privileges on a host, run the following and paste the output into `hosts.yml` for the respective host:
```bash
ansible-vault encrypt_string --vault-id home@prompt --stdin-name "ansible_become_pass"
```
To encrypt the `vault_k3s_token` variable in `all.yml`:
```bash
ansible-vault encrypt_string --vault-id home@prompt --stdin-name "k3s_token"
```
The above secrets need to be generated for each node.
For convenience, save your vault password in plain text in a file called `vault_pass` in the root directory of this repo. Make sure this filename is in your `.gitignore` so that it doesn't get tracked by Git.
```
# example vault_pass file
your-password-here-in-plain-text
```
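For example (the password value is obviously a placeholder):

```bash
# create the vault password file, lock down its permissions, and keep it out of Git
echo "your-password-here-in-plain-text" > vault_pass
chmod 600 vault_pass
grep -qxF "vault_pass" .gitignore || echo "vault_pass" >> .gitignore
```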
Some of the Ansible-related `make` commands used throughout the tutorials will read this password file automatically to decrypt your secrets. If you don't want to store your password like this, you will need to edit the `Makefile` yourself. Other commands will prompt the user for the password of the `home` vault. Use these as an example for your changes.
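For reference, passing the vault credentials to `ansible-playbook` by hand uses `--vault-id`; the playbook name below is illustrative only, not a file guaranteed to exist in the repo:

```bash
# decrypt secrets in vault-id "home" using the vault_pass file
ansible-playbook -i ansible/inventory/hosts.yml --vault-id home@vault_pass some-playbook.yml

# or be prompted for the password instead
ansible-playbook -i ansible/inventory/hosts.yml --vault-id home@prompt some-playbook.yml
```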
Now `kubectl` can be used without having to use `sudo` or manually specify the cluster config location.
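Once the cluster has been provisioned (next step, or via the Tutorials), a quick sanity check from the workstation might be:

```bash
# assumes ~/.kube/config now points at the new cluster
kubectl get nodes -o wide
```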
From here, if you know what you are doing you can provision and deploy the cluster directly to production. To take an incremental approach, checking that everything works at each step, deploy in stages following the Tutorials.
## Customise the cluster configuration
Manually edit the following `*.yaml` files to suit your preferences.
| File | Key | Value | Comment |
|---|---|---|---|
| `bootstrap/base/ingress-argo.yaml` | `spec.routes.match:` | `Host('argocd.your.domain.com')` | URL for ArgoCD UI |
| `components/envs/prod/patch-appproj-dev1-sourceRepos.yaml` | `spec.sourceRepos:` | Private GitOps repo URL (SSH) | |
| `components/envs/prod/patch-appset-infrastructure-generators.yaml` | `spec.generators.git.repoURL:` | Private GitOps repo URL (SSH) | |
| `components/envs/prod/patch-appset-infrastructure-source.yaml` | `spec.template.spec.source.repoURL:` | Private GitOps repo URL (SSH) | |
| `components/envs/prod/patch-appset-tenants-generators.yaml` | `spec.generators.git.repoURL:` | Private GitOps repo URL (SSH) | |
| `components/envs/prod/patch-appset-tenants-source.yaml` | `spec.template.spec.source.repoURL` | Private GitOps repo URL (SSH) | |
| `infrastructure/cert-manager/base/issuer-letsencrypt-prod.yaml` | `spec.acme.email`, `spec.acme.solvers.dns01.cloudflare.email` | Your DNS service email address (eg. Cloudflare) | |
| `infrastructure/cert-manager/base/issuer-letsencrypt-prod.yaml` | `spec.solvers.selector.dnsZones` | Your DNS zone URL | |
| `infrastructure/cert-manager/base/issuer-letsencrypt-stage.yaml` | per prod | per prod | |
| `infrastructure/longhorn/base/ingressRoute-dashboard.yaml` | `spec.routes.match:` | `Host('longhorn.your.domain.com')` | URL for Longhorn UI |
| `infrastructure/longhorn/base/setting-buTarget.yaml` | `value:` | `nfs://192.168.0.1:/path/to/your/nfs/backup/target` | Same directory as set in the Ansible `all.yml` values file for `longhorn_nfs_backup_target` |
| `infrastructure/traefik/base/ingress-dashboard.yaml` | `spec.routes.match:` | `Host('traefik.your.domain.com')` | URL for Traefik UI |
| `infrastructure/traefik/base/values-traefik.yaml` | `loadBalancerIP:` | Any IP in the MetalLB range | |
| `infrastructure/traefik/envs/dev/cert-selfsigned.yaml` | `spec.commonName:`, `spec.dnsNames:` | `traefik.your.domain.com` | URL for Traefik UI |
| `infrastructure/traefik/envs/prod/cert-wildcard-prod.yaml` | `spec.dnsNames:`, `spec.commonName` | `your.domain.com` | Your root domain URL |
| `infrastructure/traefik/envs/stage/cert-wildcard-prod.yaml` | `spec.dnsNames:`, `spec.commonName` | `your.domain.com` | Your root domain URL |
| `tenants/test-ingress/base/whoami.yaml` | In IngressRoute, `spec.routes.match:` | `Host('test.your.domain.com')` | URL for ingress test tenant workload |
| `tenants/test-lb/base/service.yaml` | `metadata.annotations.metallb.universe.tf/loadBalancerIPs:` | IP address | IP address from your MetalLB pool |
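One way to double-check that no placeholder slipped through is a quick search over the manifest directories (the patterns here are just the example values from the table; adjust them to whatever you replaced):

```bash
# look for leftover placeholder hostnames or example IPs
grep -rn --include="*.yaml" -e "your.domain.com" -e "192.168.0.1" \
  bootstrap/ components/ infrastructure/ tenants/
```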
For instructions on which values to use, see the `README.md` files in each of the workload directories.
## Helm Charts
Several of the infrastructure workloads are installed from Helm charts. Go through each of the infrastructure workload directories, read their `README.md` files and follow their installation instructions.