Create a Proxmox cluster that communicates over Tailscale

‼️ DANGER ‼️

In the interest of complete transparency, if you follow this guide, there's a minuscule but non-zero chance that you may violate the Bekenstein bound, at which point the resulting black hole may swallow the Earth whole. You have been warned!

⚠️ WARNING ⚠️

This guide is for development, testing, and research purposes only. It comes with no guarantee or warranty that these steps will work in your environment. Should you attempt this in a production environment, any negative outcomes are not the fault of this guide or its author.
This guide was tested on Proxmox 8 / Debian 12.

📝 Prologue 📝

This guide uses "host1" and "host2" as example host names.
The example Tailscale MagicDNS domain is "example-test.ts.net".
The Tailscale IP for host1 is 100.64.1.1.
The Tailscale IP for host2 is 100.64.2.2.

📋 Steps 📋

Set up two Proxmox hosts
Install Tailscale on the hosts: curl -fsSL https://tailscale.com/install.sh | sh;
Update /etc/hosts on all hosts with the proper host entries:
- 100.64.1.1 host1.example-test.ts.net host1
- 100.64.2.2 host2.example-test.ts.net host2
Because DNS queries will be served through Tailscale, make sure that the global DNS server configured in Tailscale can resolve host1 to 100.64.1.1 and host2 to 100.64.2.2
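For the /etc/hosts step, a minimal sketch of a helper that adds an entry only if it isn't already there, so it can be re-run safely on every host. The function name add_host_entry is made up for this example; the IPs and names are the example values from the prologue.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox or Tailscale):
# append "IP FQDN SHORTNAME" to a hosts file unless the FQDN is already listed.
# Usage: add_host_entry FILE IP FQDN SHORTNAME
add_host_entry() {
    file=$1 ip=$2 fqdn=$3 short=$4
    # -q: quiet, -w: match whole words only, -F: treat the name as a fixed string
    if grep -qwF -- "$fqdn" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$file"
}

# Example calls (run as root on each host):
#   add_host_entry /etc/hosts 100.64.1.1 host1.example-test.ts.net host1
#   add_host_entry /etc/hosts 100.64.2.2 host2.example-test.ts.net host2
```

Afterwards, something like getent hosts host2 on host1 should print the 100.64.2.2 entry if the file is being picked up.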

If your Tailscale ACL restricts traffic between the hosts, allow TCP 22 (SSH), TCP 8006 (Proxmox web interface), and UDP 5405-5412 (corosync); for example:

{"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host1:22"]},   // SSH
{"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host2:22"]},   // SSH
{"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host1:8006"]}, // Proxmox web
{"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host2:8006"]}, // Proxmox web
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5405"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5406"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5407"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5408"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5409"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5410"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5411"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5412"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5405"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5406"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5407"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5408"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5409"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5410"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5411"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5412"]}, // corosync
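Writing those rules by hand gets tedious once there are more hosts, so here is a rough sketch that prints the same rules from a loop. The function name acl_rules is made up, and the host list is hard-coded to the two example hosts; the output still has to be pasted into the ACL policy by hand.

```shell
#!/bin/sh
# Hypothetical generator: print one ACL rule per destination host and port.
# Usage: acl_rules PROTO COMMENT FIRST_PORT LAST_PORT
acl_rules() {
    proto=$1 comment=$2 port=$3 last=$4
    while [ "$port" -le "$last" ]; do
        # Assumption: host1 and host2 are the only cluster members.
        for dst in host1 host2; do
            printf '{"action": "accept", "proto": "%s", "src": ["host1", "host2"], "dst": ["%s:%s"]}, // %s\n' \
                "$proto" "$dst" "$port" "$comment"
        done
        port=$((port + 1))
    done
}

acl_rules tcp "SSH" 22 22
acl_rules tcp "Proxmox web" 8006 8006
acl_rules udp "corosync" 5405 5412
```

The rules come out grouped by port rather than by host, which makes no difference to the policy.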

Create the cluster on host1 (so that host2 has a cluster to join)
For clustering to succeed initially, every cluster member must have only a link0 in corosync, and that link must use the Tailscale address (if any other links exist in corosync, temporarily remove them until the new member has joined). To have host2 join host1's cluster, run from host2: pvecm add host1 --link0 100.64.2.2
SSH from host1 to host2 and vice versa; until this has been done once in each direction, tasks such as migrations and replications may not work:
- ssh host1
- ssh host2
That should do it! Test, test, test!

To add a third member to the cluster (and so on), repeat the same steps.
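As a recap of the cluster steps, here is a small sketch that only prints the commands to run on each host; it doesn't execute anything. The function name cluster_plan and the cluster name tailscale-test are made up, and passing --link0 to pvecm create is my assumption for pinning host1 to its Tailscale address (the guide itself only shows --link0 on pvecm add).

```shell
#!/bin/sh
# Hypothetical planner: print (not run) the pvecm commands for each host.
# Usage: cluster_plan CLUSTER FIRST_HOST FIRST_IP [HOST IP]...
cluster_plan() {
    cluster=$1 first=$2 first_ip=$3
    shift 3
    # Assumption: the first host creates the cluster on its Tailscale address.
    printf 'on %s: pvecm create %s --link0 %s\n' "$first" "$cluster" "$first_ip"
    # Every further host joins via the first host, using its own Tailscale IP.
    while [ "$#" -ge 2 ]; do
        printf 'on %s: pvecm add %s --link0 %s\n' "$1" "$first" "$2"
        shift 2
    done
    printf 'on %s: pvecm status\n' "$first"
}

cluster_plan tailscale-test host1 100.64.1.1 host2 100.64.2.2
```

A third member would just be one more HOST IP pair at the end of the call.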

🔧 Troubleshooting 🔧

Should clustering not be successful, you'll need to do two things:

Remove the failed member from host1 by running: pvecm delnode host2
Reset clustering on host2 by running: systemctl stop pve-cluster corosync; pmxcfs -l; rm -rf /etc/corosync/*; rm /etc/pve/corosync.conf; killall pmxcfs; systemctl start pve-cluster; pvecm updatecerts;

Then try again.

It seems that an additional Linux bridge is needed. Or how did you create the cluster with a link0 containing an IP address from Tailscale?

You are correct here. The workaround (or at least how I did it) is to first set up two members in a cluster using local addresses for ring0, then edit /etc/pve/corosync.conf so that ring0 uses the Tailscale addresses while ring1 uses the local addresses (or an APIPA address for a cluster member that isn't local):

...
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 100.64.0.1
    ring1_addr: 192.168.1.101
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 100.64.0.2
    ring1_addr: 192.168.1.102
  }
  node {
    name: node3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 100.64.0.3
    ring1_addr: 169.254.169.1
  }
}
...

In the above case, node1 and node2 are local to one another while node3 is remote. With node1 and node2 already in place, node3 can join using its Tailscale address as its ring0. node3 may complain that ring1 isn't available; in that case, I would temporarily remove all ring1 addresses, join node3 to the cluster with only its Tailscale address, then add the ring1 addresses back in, giving the newly added remote node3 an APIPA address for its ring1.
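After hand-editing the nodelist it's easy to leave a local address in ring0 by mistake, so here is a rough sketch of a check that lists each node's ring addresses and flags any ring0 outside 100.64.0.0/10, the range Tailscale addresses come from. The function name check_rings and its output format are made up, and the parsing assumes the simple one-key-per-line layout shown above. Separately, Proxmox's documentation advises incrementing config_version in the totem section whenever corosync.conf is edited; this sketch does not check that.

```shell
#!/bin/sh
# Hypothetical check: print each node's ring addresses from a corosync.conf
# and mark ring0 addresses that are not inside 100.64.0.0/10.
# Usage: check_rings FILE
check_rings() {
    awk '
        $1 == "name:"       { name = $2 }
        $1 == "ring0_addr:" { ring0 = $2 }
        $1 == "ring1_addr:" { ring1 = $2 }
        # A closing brace after a name was seen ends one node block.
        $1 == "}" && name != "" {
            split(ring0, o, ".")
            ok = (o[1] == 100 && o[2] >= 64 && o[2] <= 127) ? "ok" : "NOT-TAILSCALE"
            printf "%s ring0=%s ring1=%s %s\n", name, ring0, (ring1 == "" ? "-" : ring1), ok
            name = ring0 = ring1 = ""
        }
    ' "$1"
}

# Example call:
#   check_rings /etc/pve/corosync.conf
```

A node without a ring1_addr is printed with ring1=- rather than being treated as an error, since the temporary "Tailscale only" state described above is a valid step along the way.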

I'm juggling a bit at the moment, but I'll work these notes into the gist at some point. Let me know if you have any further questions!

willjasen/proxmox-cluster-over-tailscale.md

protonaut commented Jun 22, 2024

willjasen commented Jul 9, 2024
