TL;DR: Enable IP Aging fabric-wide.
Welcome one and all to my guide on not hating yourself. Let’s talk about the problem: you’re running ACI (great!) and your team decides they want highly available Microsoft SQL clusters (also great!). Initially, these clusters do exactly what they say on the tin: patching servers is easier and your uptime is… more… up. Then, a little ways down the road, seemingly out of nowhere, disaster. There’s a failover, but after said failover, nothing can communicate with the AAG (Always On Availability Group) VIP. Pings are failing, DBAs are yelling, and everyone is blaming the network (so, a Tuesday). Next thing you know the DB has been unavailable for over an hour, and no one’s sure why. You look at EP Tracker in ACI and, to your dismay, the IP of the VIP is mapped to the wrong node, as if the fabric never saw the failover, and it has stayed that way far longer than you’d expect.
So why is this happening? Well, if your environment is like mine, these DB servers are running within VMware and they’ve been vMotioning a lot due to resource constraints within your compute stack. That in and of itself is fine, but more and more often you’re seeing failovers where the GARP from the DB server announcing that it now owns the virtual IP (VIP) is not making it to the ACI fabric. This is problematic for two reasons, one of them very obvious and the other… not so much.
Problem #1: GARP.
Microsoft Failover Clusters (and AAG by extension) will only send a single GARP when there’s a failover. If that one frame gets dropped, nothing tells the network that the VIP moved. There’s not a great solution for this, sadly.
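
If you’ve never looked at one up close, a GARP is just an ARP where the sender IP and the target IP are the same address. Below is a minimal sketch of one common form (a broadcast ARP request) built with nothing but the Python standard library. The MAC and IP are made-up values for illustration, and I’m not claiming this is byte-for-byte what Windows clustering puts on the wire.

```python
import socket
import struct


def build_garp(mac: str, ip: str) -> bytes:
    """Build a gratuitous ARP request frame: sender IP and target IP are both `ip`."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP)
    eth = broadcast + mac_bytes + struct.pack("!H", 0x0806)

    # ARP header: hw type 1 (Ethernet), proto 0x0800 (IPv4),
    # hw length 6, proto length 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_bytes + ip_bytes    # sender MAC / sender IP (the VIP)
    arp += b"\x00" * 6 + ip_bytes  # target MAC unset / target IP (the same VIP)
    return eth + arp


# Hypothetical values: the new active node's MAC and the cluster VIP.
frame = build_garp("00:50:56:aa:bb:e8", "10.0.0.50")
print(len(frame), frame.hex())
```

One frame like this, sent once, is all the fabric gets to learn that the VIP moved.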
Problem #2: Endpoint Learning.
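Understanding why this is a problem requires a decent understanding of what an Endpoint in ACI is. It’s not just an ARP entry: an endpoint (EP) is a MAC address that can have multiple IPs attached to it. Notice below the Endpoint ending in :E8; that is the active host in an AAG cluster, so the VIP is just one more IP on the same EP as the server’s own address.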
If there’s a DB failover and the GARP gets dropped (maybe due to congestion or a vMotion happening at the same time), the leaf switch where the newly active DB server is attached will never see that VIP as active, and therefore never sends a COOP message to the spines.
Or you could have a scenario where the failover is successful, the leaf sees the newly active DB attached, and sends COOP to the spines, which triggers a bounce message on the leaf where the VIP used to be attached, and we’re good. But there was an active session between some remote client and the old DB, and the old DB sends a TCP reset from the VIP after the failover occurs… which reverses everything we just talked about lol.
In either scenario, ACI essentially black holes traffic to the SQL VIPs. For how long? Until the endpoint ages out. “Well, how f-ing long is that Jon???” Well, the default local endpoint aging interval is 900 seconds (15 minutes), but here’s the kicker… if any IP associated with that endpoint is responding to directed ARPs from the fabric, the fabric never ages out any of that endpoint’s IPs by default. So never, by default lol.
Enter IP Aging!! This fabric-wide setting essentially takes the local endpoint aging interval and applies it per IP, solving this problem (sort of).
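
To make the difference concrete, here’s a toy model of the aging behaviour described above. It’s a deliberately simplified sketch, not how a leaf actually implements aging: the `Endpoint` class, `surviving_ips` function, and all addresses and timestamps are hypothetical names and values I made up for illustration.

```python
from dataclasses import dataclass, field

AGING_INTERVAL = 900  # default local endpoint aging interval (seconds), per the text


@dataclass
class Endpoint:
    mac: str
    # Maps each learned IP to the time (in seconds) it last responded to the fabric.
    ip_last_seen: dict = field(default_factory=dict)


def surviving_ips(ep: Endpoint, now: float, ip_aging: bool) -> set:
    """Return the IPs still attached to the endpoint at time `now`."""
    fresh = {ip for ip, seen in ep.ip_last_seen.items() if now - seen < AGING_INTERVAL}
    if ip_aging:
        # IP aging on: every IP has to earn its keep individually.
        return fresh
    # IP aging off: one responsive IP keeps every IP on the endpoint alive.
    return set(ep.ip_last_seen) if fresh else set()


# The old active node: its own IP keeps answering, the VIP left at t=0.
old_node = Endpoint(
    mac="00:50:56:aa:bb:01",
    ip_last_seen={"10.0.0.11": 3590, "10.0.0.50": 0},
)

print(surviving_ips(old_node, now=3600, ip_aging=False))
print(surviving_ips(old_node, now=3600, ip_aging=True))
```

In this model, an hour after the failover the stale VIP is still pinned to the old node when IP aging is off, because the server’s own IP keeps the whole endpoint fresh. With IP aging on, only the IP that is actually responding survives.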
Takeaways? Network folks typically don’t like Microsoft Clustering solutions because they all end in sadness lol. Here in ACI land, it can be really sad if you haven’t enabled IP aging, because a missed GARP or a TCP reset to a remote client after failover can blackhole traffic until someone manually intervenes. With IP aging we can mitigate this, and at least have some control.
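
If you’d rather script the change than click through the GUI, here’s a minimal sketch that builds the REST call for turning IP aging on. The class name (`epIpAgingP`) and DN (`uni/infra/ipAgingP-default`) are what I understand Cisco’s APIC documentation to use, so verify them against your APIC version before trusting me. The hostname and the `ip_aging_request` helper are made up for illustration, and the sketch only builds the URL and XML body. Actually POSTing it requires an authenticated APIC session, which is left out here.

```python
import xml.etree.ElementTree as ET

# Assumed DN of the fabric-wide IP aging policy; check your APIC's object model.
IP_AGING_DN = "uni/infra/ipAgingP-default"


def ip_aging_request(apic_host: str, enabled: bool = True):
    """Return the (url, xml_body) pair for setting the IP aging policy."""
    url = f"https://{apic_host}/api/node/mo/{IP_AGING_DN}.xml"
    policy = ET.Element(
        "epIpAgingP",
        {"dn": IP_AGING_DN, "adminSt": "enabled" if enabled else "disabled"},
    )
    body = ET.tostring(policy, encoding="unicode")
    return url, body


# Hypothetical APIC hostname.
url, body = ip_aging_request("apic.example.com")
print(url)
print(body)
```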