@kronenpj
Last active September 23, 2024 11:59
OpnSense 24.7 - Disable WAN + OPT2 Interfaces during CARP Failover
#!/usr/local/bin/php
<?php
require_once("config.inc");
require_once("system.inc");
require_once("interfaces.inc");
require_once("interfaces.lib.inc");
require_once("util.inc");

// CARP hook arguments: $argv[1] is the subsystem (must contain '@'),
// $argv[2] is the event type (MASTER, BACKUP or INIT).
$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';

// Add more interfaces that need to be disabled/enabled after a CARP event.
//$iface_aliases = array('wan', 'opt2');
//$iface_names = array('wan' => 'igc0', 'opt2' => 'gif0');
// 'wan' is listed twice on purpose: the WAN has to be brought up twice, and
// removing the extra entry caused failover problems (see the comments below).
$iface_aliases = array('wan', 'wan');
$iface_names = array('wan' => 'igc0');
$dhcp_ifaces = array('lan', 'opt3', 'opt1');

// Optional if you want the default route removed on the backup system
$lan_vip = 'YOUR_LAN_GATEWAY_Virtual_IP';
$remove_backup_route = false;

// Reject unknown event types and events that do not come from a CARP subsystem.
if ($type != 'MASTER' && $type != 'BACKUP' && $type != 'INIT') {
    log_error("Carp '$type' event unknown from source '{$subsystem}'");
    exit(1);
}
if (!strstr($subsystem, '@')) {
    log_error("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit(1);
}

if ($type === "MASTER") {
    // Only act if the WAN is currently disabled; otherwise this is a duplicate event.
    if (empty($config['interfaces']['wan']['enable'])) {
        foreach ($iface_aliases as $ifkey) {
            // $iface_name = $iface_names[$ifkey];
            log_error("enable interface '$ifkey' due CARP event '$type'");
            $config['interfaces'][$ifkey]['enable'] = '1';
            legacy_interface_flags($ifkey, 'up');
            interface_configure(false, $ifkey, true, true);
            write_config("enable interface '$ifkey' due CARP event '$type'", false);
            //usleep(200 * 1000);
            //foreach ($dhcp_ifaces as $dhkey) {
            //    $config['dhcpd'][$dhkey]['enable'] = true;
            //}
        }
    } else {
        log_msg("Carp '$type' duplicate event triggered.");
    }
} else if ($type === "BACKUP") {
    // Only act if the WAN is currently enabled; otherwise this is a duplicate event.
    if (!empty($config['interfaces']['wan']['enable'])) {
        foreach ($iface_aliases as $ifkey) {
            // $iface_name = $iface_names[$ifkey];
            log_error("disable interface '$ifkey' due CARP event '$type'");
            //foreach ($dhcp_ifaces as $dhkey) {
            //    $config['dhcpd'][$dhkey]['enable'] = false;
            //}
            interface_reset($ifkey);
            unset($config['interfaces'][$ifkey]['enable']);
            interface_configure(false, $ifkey, true, false);
            // Note: ifconfig expects a device name (e.g. igc0), not an alias;
            // $iface_names maps aliases to device names.
            exec('/sbin/ifconfig ' . escapeshellarg($ifkey) . ' down 2>&1', $ifc, $ret);
            write_config("disable interface '$ifkey' due CARP event '$type'", false);
            if ($remove_backup_route === true) {
                exec('/sbin/route del default 2>&1', $ifc, $ret);
                exec('/sbin/route add default ' . escapeshellarg($lan_vip) . ' 2>&1', $ifc, $ret);
            }
        }
    } else {
        log_msg("Carp '$type' duplicate event triggered.");
    }
}
// INIT events pass validation but intentionally trigger no action.
?>
@kronenpj

You should be able to add 'pppoe0' to the list on line 23, possibly replacing 'opt2'.

@Blip9575

@kronenpj Thank you for the feedback.

After replacing 'opt2' with 'pppoe0' the PPPoE connection remained connected and administratively down.

Are you able to advise what commands would be required to 'connect' and 'disconnect' a PPPoE connection rather than disable the WAN interface? Thanks in advance.

@kronenpj

Unfortunately no. I'm not entirely sure I have it working on my firewalls either. The available methods and existing actions aren't documented so I'm really just trying different things and seeing if something works. So far I haven't found any combination that satisfactorily solves this situation.

@Blip9575

@kronenpj After removing 'true' from line 36 the script now disconnects the PPPoE connection prior to disabling the WAN interface on 23.7.1_3

interface_bring_down($ifkey);

Thanks again for all your feedback.

@kronenpj

Very interesting. I'm glad you got it to work! I need to get back to looking at mine.

@kronenpj

I've updated the script with @Blip9575's suggested change. It's working as I need it to on recent versions of OPNsense.

@willjasen

Heya, thanks for this script! It helped me get started on managing my multiple WANs via CARP.

I did run into an issue though, and that is that I have multiple CARP subsystems (one per LAN) and sometimes CARP on one LAN would transition from MASTER to BACKUP or vice versa which would initiate toggling the WAN interfaces. I've spent about the last four hours sorting that out in my own version such that toggling the WAN interfaces only happens once all CARP subsystems are MASTER or BACKUP (or if CARP is disabled/enabled). I also throw some more logging in it so that it makes a little more sense what's happening when it does.

Hope this helps someone!
https://gist.github.com/willjasen/6ae0f47bca36ced2bd52b2fefc2bc21e
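A minimal sketch of the "only toggle once every CARP subsystem agrees" idea described above. It is not taken from the linked gist: the function name, the input format (subsystem name mapped to its current state) and the returned action strings are all assumptions made for illustration. How the per-subsystem states are collected is left out.

```php
<?php
// Hypothetical helper: decide what to do with the WAN interfaces given the
// current state of every CARP subsystem. Keys such as '1@igc1' are only
// illustrative; the values are the CARP event types the script receives.
function wan_action_for_carp_states(array $states): string
{
    if (count($states) === 0) {
        return 'none';        // nothing known yet, leave the WAN alone
    }
    $unique = array_unique(array_values($states));
    if (count($unique) !== 1) {
        return 'none';        // mixed states: not every subsystem has moved
    }
    switch (reset($unique)) {
        case 'MASTER':
            return 'enable';  // all subsystems are MASTER
        case 'BACKUP':
            return 'disable'; // all subsystems are BACKUP
        default:
            return 'none';    // e.g. INIT
    }
}
```

With this shape, a single LAN flapping between MASTER and BACKUP yields 'none', so the WAN interfaces would stay untouched until the remaining subsystems follow.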

@skl283

skl283 commented Apr 1, 2024

Hi guys, I've posted a question at https://gist.github.com/spali/2da4f23e488219504b2ada12ac59a7dc?permalink_comment_id=5008023#gistcomment-5008023. I've tried your script, @kronenpj, the other variant from @willjasen, and the one you mentioned here.

@kronenpj

kronenpj commented Jul 4, 2024

I've made another update to the script in an attempt to reduce the switch-over / recovery time. On my setup the unbound daemon took upwards of 30 minutes to begin serving DNS. The current version of this script reduces that to under 5 minutes. The "cause" was the daemon being restarted each time a WAN interface was changed, which was approximately 14 times in my case.

This change reduces that to two restarts, partly by removing opt2 from the list of interfaces to bring down and partly by running the configuration change only twice.

Unfortunately this has identified two problems, which may be bugs:

  1. I need to bring the WAN interface up twice either because it doesn't properly request a DHCP address or it takes "too long" to get one from my ISP.
  2. ~~Failing over from primary->secondary takes longer than secondary->primary. I don't know the cause of the asymmetry, but I'm going to bring it up in the forums and possibly file a bug. Presumably the fail-back does the same work but it takes ~1 second to transition back to the primary firewall.~~ Update - This was possibly due to a 'bad' secondary VM image. I've cloned the primary to the secondary and it appears that the failover is symmetric now.

@kronenpj

Just upgraded to 24.7 and verified this script still works well 🎉

@woodshoes

Well - I literally got to this a few hours too late LOL.
In 24.7.2 they have retired the function "interfaces_bring_up", which breaks this script (https://forum.opnsense.org/index.php?topic=42355.msg209137#msg209137).
Any ideas? PHP is not my strong suit.

@kronenpj

kronenpj commented Aug 22, 2024

Looks like it's been replaced by the ever-inspiring replacement of:
legacy_interface_flags($iface, 'up');

The good news is that the function already exists if interfaces.lib.inc is included.

@woodshoes

Yep that worked for me! Thank you very much

@edward-scroop

The script should also ignore the third state, INIT, as I keep seeing it cause a failover despite it being harmless.

line 11 can be changed to
if ($type != 'MASTER' && $type != 'BACKUP' && $type != 'INIT') {

and line 47 can be changed to
} else if ($type === "BACKUP") {
or ignored.
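A minimal sketch of the resulting event handling, written as a pure function so the three states are easy to see. The function name and the returned action strings are hypothetical; the checks mirror the ones in the script (known event type, subsystem containing '@').

```php
<?php
// Hypothetical helper: map a CARP hook invocation to the action the script
// should take. 'reject' corresponds to the log_error() + exit(1) paths.
function classify_carp_event(string $subsystem, string $type): string
{
    if (!in_array($type, ['MASTER', 'BACKUP', 'INIT'], true)) {
        return 'reject';   // unknown event type
    }
    if (strpos($subsystem, '@') === false) {
        return 'reject';   // event did not come from a CARP subsystem
    }
    if ($type === 'MASTER') {
        return 'enable';   // bring the WAN interfaces up
    }
    if ($type === 'BACKUP') {
        return 'disable';  // take the WAN interfaces down
    }
    return 'ignore';       // INIT: valid, but nothing to do
}
```

The point of the explicit `BACKUP` comparison is that INIT no longer falls into the "disable" branch by default.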

@kronenpj

kronenpj commented Sep 11, 2024

@edward-scroop:

The script should also ignore the third state of INIT as i keep seeing it cause a failover , despite it being harmless.
line 11 can be changed to if ($type != 'MASTER' && $type != 'BACKUP' && $type != 'INIT') {
and line 47 can be changed to } else if ($type === "BACKUP") { or ignored.

Both these changes have been made and the script works as intended.

@skl283

skl283 commented Sep 15, 2024

@kronenpj could you please look at this comment? to be "safe" - could you make these changes to make the relevant routes?

@kronenpj

@skl283 I could, as an optional action to take. However, my setup works properly for me without it. I have system gateways configured:

Name       | Interface | Protocol | Priority
WAN_DHCP   | Internet  | IPv4     | 200 (upstream)
primaryfw  | LAN       | IPv4     | 255 (upstream)

When the backup is made primary, the WAN_DHCP gateway takes over and vice-versa.

@kronenpj

kronenpj commented Sep 15, 2024

Updated to add the requested, optional route removal on backup system.
NOTE: This change is untested by me as it is not needed for my setup to work properly.

@toddgonzo74

I'm trying to get this script working and am failing.

I changed the top section to match my environment.

$iface_aliases = array('wan', 'opt1');
$iface_names = array('wan' => 'igb4', 'opt1' => 'igb5');
$dhcp_ifaces = array('lan', 'opt2', 'opt3', 'opt4', 'opt6');

When I failover (Using forced maintenance mode), my master node immediately goes to backup and the backup picks up the master roles. My WAN interfaces, however, don't get modified.

How can I troubleshoot what is not working properly?

@kronenpj

kronenpj commented Sep 15, 2024

@toddgonzo74 I usually start by looking at /var/log/system/latest.log. There should be entries like this:
<11>1 2024-09-15T12:42:19+00:00 opnsense-backup opnsense 13138 - [meta sequenceId="13"] /usr/local/etc/rc.syshook.d/carp/10-wancarp: disable interface 'wan' due CARP event 'BACKUP'
and
<11>1 2024-09-15T12:42:50+00:00 opnsense opnsense 2762 - [meta sequenceId="182"] /usr/local/etc/rc.syshook.d/carp/10-wancarp: enable interface 'wan' due CARP event 'MASTER'
on the backup and primary firewalls respectively.

Look at the surrounding entries for clues. In my log I see "The command '/sbin/ifconfig 'wan' up' failed to execute", but my WAN connection still came up.

You might also want to try rebooting or powering the primary firewall off instead of using forced maintenance mode on CARP. It should work the same, but it's "only" emulating what might happen - if that makes sense.
Also take a copy of the current script, it's changed a little. E.g. the opt2 interface isn't in the alias list any longer - though I'm not sure why wan is in there twice...
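A minimal sketch for pulling the hook's enable/disable messages out of the system log, assuming the message format shown in the two sample entries above and a hook file named `10-wancarp`. The function name and the shape of the returned array are hypothetical.

```php
<?php
// Hypothetical helper: extract the CARP hook's "enable/disable interface"
// messages from an array of log lines.
function carp_hook_log_entries(array $lines, string $hook = '10-wancarp'): array
{
    $pattern = '/' . preg_quote($hook, '/')
        . ": (enable|disable) interface '([^']+)' due CARP event '([A-Z]+)'/";
    $entries = [];
    foreach ($lines as $line) {
        if (preg_match($pattern, $line, $m)) {
            $entries[] = [
                'action'    => $m[1],
                'interface' => $m[2],
                'event'     => $m[3],
            ];
        }
    }
    return $entries;
}
```

It could be fed with `file('/var/log/system/latest.log', FILE_IGNORE_NEW_LINES)`. An empty result after a failover would suggest the hook never ran or exited early in one of its validation checks.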

@toddgonzo74

I re-imported the script from above and it works. Been fighting with CARP for a while. Changing over to a Proxmox-based setup and I finally have everything working. This was the last piece.

@kronenpj

Excellent. I'm glad it's working for you.
I tried removing the "extra" wan entry and started having problems with failover.

@raegedoc

Updated to add the requested, optional route removal on backup system. NOTE: This change is untested by me as it is not needed for my setup to work properly.

Happy to see that this helped some people get around their situation with CARP, like I had before.

@toddgonzo74

I'm seeing an issue when failing over to my backup and back to the master again.

The WAN interfaces on the Master go down and the Backup comes up when I do a failover. What's not happening is a triggering of an interface reset. The interfaces get an IP address, but gateway monitoring doesn't trigger and the interfaces operate like they are down. When I refresh the interfaces, they renew IP and start passing traffic again.

Any ideas on this? I'm thinking maybe a follow up action after the interface bring up that refreshes the interface to force a DHCP renew.

@edward-scroop

This is the same as using the gui to restart routing.
exec("/usr/local/sbin/pluginctl -s routing restart 2>&1");

@toddgonzo74

toddgonzo74 commented Sep 21, 2024 via email

@edward-scroop

@toddgonzo74 do you have any logs of the errors?
perhaps adding a sleep right before restarting the routing might help?

sleep(2);
exec("/usr/local/sbin/pluginctl -s routing restart 2>&1");

@toddgonzo74

toddgonzo74 commented Sep 21, 2024 via email

@toddgonzo74

I fiddled with it a bit this morning and got it working. Failover is seamless both ways now. I might lose a ping when failing over.

Just added this after "write_config("enable interface '$ifkey' due CARP event '$type'", false);"

      sleep(3);
      exec("/usr/local/sbin/pluginctl -s routing restart 2>&1");

Thanks @kronenpj and @edward-scroop and all others.
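A minimal sketch of the delay-then-restart step with the exit status and output captured, which may help when checking the logs for errors. The function name and the injectable `$runner` parameter are hypothetical and exist only so the step can be exercised without a firewall; the command itself is the one quoted in the thread.

```php
<?php
// Hypothetical wrapper: wait, restart routing, and return [exit status, output]
// so the caller can log a failure instead of silently ignoring it.
function restart_routing_after(int $delay_seconds, ?callable $runner = null): array
{
    // Default runner shells out with exec(); a test can pass its own callable.
    $runner = $runner ?? function (string $cmd): array {
        exec($cmd, $output, $status);
        return [$status, $output];
    };
    if ($delay_seconds > 0) {
        sleep($delay_seconds);   // give the interface time to settle
    }
    return $runner('/usr/local/sbin/pluginctl -s routing restart 2>&1');
}
```

In the script this could be called as `list($status, $output) = restart_routing_after(3);` after `write_config(...)` in the MASTER branch, logging `$output` when `$status` is non-zero.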
