Skip to content

Instantly share code, notes, and snippets.

@willjasen
Last active August 26, 2024 10:22
Show Gist options
  • Save willjasen/df71ca4ec635211d83cdc18fe7f658ca to your computer and use it in GitHub Desktop.
Save willjasen/df71ca4ec635211d83cdc18fe7f658ca to your computer and use it in GitHub Desktop.
Create a Proxmox cluster that communicates over Tailscale

‼️ DANGER ‼️

In the interest of complete transparency, if you follow this guide, there’s a very minuscule but non-zero chance that you may violate the Bekenstein bound, at which the resulting black hole may swallow the earth whole. You have been warned!


⚠️ WARNING ⚠️

  • This guide is for development, testing, and research purposes only. This guide comes with no guarantee or warranty that these steps will work within your environment. Should you attempt within a production environment, any negative outcomes are not the fault of this guide or its author.
  • This guide was tested on Proxmox 8 / Debian 12.

📝 Prologue 📝

  • This example uses "host1" and "host2" as example names for the hosts
  • This example uses "example-test.ts.net" as a Tailscale MagicDNS domain
  • The Tailscale IP for host1 is 100.64.1.1
  • The Tailscale IP for host2 is 100.64.2.2

📋 Steps 📋

  1. Setup two Proxmox hosts

  2. Install Tailscale on the hosts: curl -fsSL https://tailscale.com/install.sh | sh;

  3. Update /etc/hosts on all hosts with the proper host entries:

    • 100.64.1.1 host1.example-test.ts.net host1
    • 100.64.2.2 host2.example-test.ts.net host2
  4. Since DNS queries will be served via Tailscale, ensure that your global DNS server via Tailscale can resolve host1 as 100.64.1.1 and host2 as 100.64.2.2

  5. If you need to allow for the traffic within your Tailscale ACL, allow TCP 22, TCP 8006, and UDP 5405 - 5412; example as follows:

    {"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host1:22"]},   // SSH
    {"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host2:22"]},   // SSH
    {"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host1:8006"]}, // Proxmox web
    {"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host2:8006"]}, // Proxmox web
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5405"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5406"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5407"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5408"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5409"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5410"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5411"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5412"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5405"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5406"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5407"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5408"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5409"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5410"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5411"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5412"]}, // corosync
    
  6. Create the cluster using host1 (so that host2 has a cluster to join to)

  7. In order for clustering to initially succeed, all cluster members must only have a link0 within corosync associated with Tailscale (if any other links exists within corosync, they must be temporarily removed for this initial cluster member addition to succeed); to have host2 join the cluster of host1, then run from host2: pvecm add host1 --link0 100.64.2.2

  8. You should SSH in from host1 to host2 and vice versa; if this isn't done, then tasks like migrations and replications may not work until performed:

    • ssh host1
    • ssh host2
  9. That should do it! Test, test, test!

To add a third member to the cluster (and so on), repeat these similar steps.


🔧 Troubleshooting 🔧

Should clustering not be successful, you'll need to do two things:

  1. Remove the err'd member from host1 by running: pvecm delnode host2
  2. Reset clustering on host2 by running: systemctl stop pve-cluster corosync; pmxcfs -l; rm -rf /etc/corosync/*; rm /etc/pve/corosync.conf; killall pmxcfs; systemctl start pve-cluster; pvecm updatecerts;

Then try again.

@protonaut
Copy link

protonaut commented Jun 22, 2024

Hi, when I follow your instruction I got stuck within joining host 2 into the cluster because I had to create the cluster with an link0 containing a non tailscale IP.
Seems to be the case, that there is an additional linux bridge needed or how did you create the cluster with a link0 containing an IP address from tailscale?

@willjasen
Copy link
Author

willjasen commented Jul 9, 2024

Seems to be the case, that there is an additional linux bridge needed or how did you create the cluster with a link0 containing an IP address from tailscale?

You are correct here. The way that this would need to be worked around (or at least how I did it) is to setup an initial two members into a cluster using a local address with ring0, then alter /etc/pve/corosync.conf to have ring0 be the Tailscale addresses while ring1 can then be the local addresses (or an APIPA address when adding a cluster member that isn't local):

...
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 100.64.0.1
    ring1_addr: 192.168.1.101
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 100.64.0.2
    ring1_addr: 192.168.1.102
  }
  node {
    name: node3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 100.64.0.3
    ring1_addr: 169.254.169.1
  }
}
...

In the above case, node1 and node2 are local to one another while node3 is remote. With node1 and node2 already in place, when node3 joins, it can join using its ring0 as its Tailscale address. There may be the possibility that node3 will complain that ring1 isn't available, and in that particular case, I would temporarily remove all ring1 addresses, join node3 to the cluster with only its Tailscale address, then add the ring1 addresses back in, supplementing the newly remote node3 with an APIPA address for its ring1 address.

Got a bit I'm juggling at the moment but I'll work in these notes to the gist at some point. Let me know if you have any further questions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment