Cisco ACI Multi-Pod (Pt.3) – Fabric Discovery & Verification
In part one of this series we configured the IPN (Inter-Pod Network) that connects the pods of the ACI fabric. In part two we configured the APIC in pod 1 for Multi-Pod, including the infra L3Out and spine interface configuration. In this, the third part of the series, we will validate the configuration and make sure that Multi-Pod is working.
Software/Firmware Versions
There are a few things to keep in mind when building the additional pod. The spine switches must run a minimum firmware version to support Multi-Pod; if they are below that level of code, it won't work. Check the documentation on CCO to make sure the level of code on the switches is correct when deploying the additional pod; if it is lower than the minimum required, you will have to go through the upgrade process first. The APIC (if you are deploying a new APIC to the new pod) has strict upgrade paths you must follow. For example, on one additional pod build, the switches and APIC were delivered with code that did not support Multi-Pod and was a few versions (upgrade steps) behind pod 1. This is the basic process I followed to bring the new pod up to the correct firmware versions; only after this could the new pod be connected to the IPN.
- Build pod 2 as a new standalone fabric with the single new APIC that will be deployed in pod 2
- Upgrade the controller and switches using the upgrade paths on CCO
- Perform a factory reset on the fabric in POD 2, restoring factory configurations – the upgraded firmware will remain (see the sketch after this list)
- Power up the spine switches, then the leaf switches – leave the new APIC powered down
- Once fabric discovery has completed from the existing APIC cluster, power up the new APIC and configure it in the current fabric as the 3rd (or nth) APIC
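As a rough sketch of the factory reset step, these are the commands typically used to wipe the configuration while keeping the installed firmware. The prompts are placeholders, and you should confirm the exact clean-reload procedure for your software version against the Cisco documentation before running them. On each leaf and spine switch in the new pod:

switch# setup-clean-config.sh
switch# reload

And on the APIC used for the temporary standalone pod 2 fabric:

apic# acidiag touch clean
apic# acidiag touch setup
apic# acidiag reboot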
Fabric Discovery
Fabric discovery needs to take place for additional pods just as it does for the first pod. The spine node(s) in the new pod send a DHCP request to the connected IPN devices, which forward it to the configured DHCP relay addresses; these should be the APIC infra (overlay-1) IP addresses, not the OOB ones. The APIC sends back an address from the pod 2 TEP pool along with additional DHCP options that allow the spine to pull its configuration from the APIC. This means basic IP connectivity needs to be in place; we don't need multicast at this stage. Once the spines are discovered by the APIC, you will need to go through the usual process of naming, registering and enabling them in Fabric\Inventory\Fabric Membership.
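For reference, the DHCP relay piece on the IPN looks something like the sketch below (NX-OS; the sub-interface and relay addresses are placeholders, the relay addresses being the APIC overlay-1/infra TEP addresses), and on the APIC you can watch the new nodes show up from the CLI as well as the GUI:

IPN1(config)# feature dhcp
IPN1(config)# service dhcp
IPN1(config)# ip dhcp relay
IPN1(config)# interface Ethernet1/1.4
IPN1(config-subif)# ip dhcp relay address 10.0.0.1
IPN1(config-subif)# ip dhcp relay address 10.0.0.2
IPN1(config-subif)# ip dhcp relay address 10.0.0.3

apic1# moquery -c dhcpClient
apic1# acidiag fnvread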
IPN Validation
In part 1 of this series we configured the IPN and went through some validation. The part 1 validation covers basic IP connectivity with OSPF, which should be checked along with the steps in the 'Inter-POD Routing' section below. Once this is successful, move back to the part 1 validation and go through the multicast checks on the IPN; basic IP connectivity needs to be in place end to end before multicast can be considered.
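As a reminder of the kind of IPN multicast checks covered in Pt.1, the standard NX-OS commands are shown below; add the vrf keyword if your IPN runs in a dedicated VRF:

IPN1# show ip pim neighbor
IPN1# show ip pim rp
IPN1# show ip mroute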
Inter-POD Routing
We first want to make sure OSPF peering is working successfully and that we are receiving the routes for the other pod's spine switches. While running these commands, remember that Multi-Pod and all infra routing occurs within the overlay-1 vrf\ctx in the infra tenant, so ensure the commands include the 'vrf overlay-1' parameter; otherwise you will be running the commands against the global context, in which you won't find what you are looking for.
To check the OSPF peering with the IPN devices, as we would on any router, we check the OSPF neighbor states:
103# show ip ospf neighbors vrf overlay-1
OSPF Process ID default VRF overlay-1
Total number of neighbors: 2
Neighbor ID Pri State Up Time Address Interface
10.96.2.2 1 FULL/ - 1w3d 10.96.2.241 Eth1/32.47
10.96.2.1 1 FULL/ - 2w6d 10.96.2.253 Eth1/36.36
Here we have the two IPN devices in a FULL state on point-to-point links; we know they are p2p because no DR/BDR/OTHER role is shown alongside the FULL state. Next let's check the route table for the loopback of an IPN-connected spine in POD 1 (we would check all the other devices as well, but we are keeping this short for readability – the same process and principles apply).
103# show ip route 10.96.1.3 vrf overlay-1
IP Route Table for VRF overlay-1
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'% string' in via output denotes VRF string
10.96.1.3/32, ubest/mbest: 2/0
*via 10.96.2.253, eth1/36.36, [110/7], 01w10d, ospf-default, intra
*via 10.96.2.241, eth1/32.47, [110/7], 01w10d, ospf-default, intra
via 10.2.8.64, eth1/1.37, [115/65], 02w20d, isis-isis_infra, L1
via 10.2.112.66, eth1/2.39, [115/65], 02w20d, isis-isis_infra, L1
via 10.2.112.67, eth1/3.40, [115/65], 02w20d, isis-isis_infra, L1
via 10.2.8.65, eth1/4.38, [115/65], 02w20d, isis-isis_infra, L1
via 10.2.8.66, eth1/6.46, [115/65], 02w20d, isis-isis_infra, L1
via 10.2.112.68, eth1/5.44, [115/65], 02w20d, isis-isis_infra, L1
Looking at the output, we see two OSPF routes installed as best paths via the two connected IPN devices, as in the previous output. We would repeat this check from the second IPN-connected spine in POD 2, and against any other remote spines/pods connected to the IPN, to ensure full reachability between all IPN-connected spine nodes. Confirm this by sending a ping from this spine in POD 2 to the spine in POD 1.
103# iping -V overlay-1 10.96.1.3
PING 10.96.1.3 (10.96.1.3) from 10.96.2.242: 56 data bytes
64 bytes from 10.96.1.3: icmp_seq=0 ttl=62 time=1.027 ms
64 bytes from 10.96.1.3: icmp_seq=1 ttl=62 time=0.752 ms
64 bytes from 10.96.1.3: icmp_seq=2 ttl=62 time=0.76 ms
Now that we are satisfied we have IP reachability between all the IPN-connected spine nodes, we need to validate BGP. First, let's look at the neighbor states. We should see iBGP sessions from this spine (103 in pod 2) to all other IPN-connected spines in other pods – we will not (and should not) see any BGP sessions to the other spines in the same pod, as this is not required.
103# show bgp l2vpn evpn summary vrf all
BGP summary information for VRF overlay-1, address family L2VPN EVPN
BGP router identifier 10.96.2.3, local AS number 65001
BGP table version is 56461, L2VPN EVPN config peers 2, capable peers 2
164 network entries and 174 paths using 32144 bytes of memory
BGP attribute entries [3/432], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.96.1.3 4 65001 29531 71004 56461 0 0 2w6d 10
10.96.1.4 4 65001 29533 71004 56461 0 0 2w6d 10
We see in this output that this spine node (pod2\s103) has two iBGP neighbors, which are the IPN-connected spine nodes in POD 1, and that these peerings are stable, with an uptime of 2w6d.
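If a peering is missing or flapping, the per-neighbor detail is the next place to look; a quick example against one of the POD 1 spines from the summary above:

103# show bgp l2vpn evpn neighbors 10.96.1.3 vrf overlay-1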
We will now check that we have the remote pod's TEP address in the route table. Recall from the APIC configuration (Pt.2) that we provided a TEP address for an entire pod (the data-plane TEP) in the pod connection profile. This TEP address is what VXLAN packets are destined to when they are sent to a different pod. The leaf switches in the source pod do not need to know which leaf in the destination pod the traffic is ultimately for; the local leaf (or spine) switch just needs to get the packet to the destination pod. Once at the destination pod, the spine switches (more correctly, the local hardware proxy/COOP) can work out which leaf switch in the local pod the packet needs to go to. What this means is that every switch in a pod needs a route to the destination pod's TEP address. Checking on a spine and a leaf in pod 2 for the pod 1 TEP address as follows:
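You can also read this per-pod TEP configuration back from the APIC CLI with moquery. The class names below (fvFabricExtConnP for the fabric external connection policy and fvPodConnP for the per-pod connection profile holding the data-plane TEP) are quoted from memory, so treat this as a sketch and verify them against your own MIT:

apic1# moquery -c fvFabricExtConnP
apic1# moquery -c fvPodConnP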
A spine switch in POD2.
103# show ip route 10.96.1.96 vrf overlay-1
IP Route Table for VRF "overlay-1"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
10.96.1.96/32, ubest/mbest: 2/0
*via 10.96.2.253, eth1/36.36, [110/20], 01w10d, ospf-default, type-2
*via 10.96.2.241, eth1/32.47, [110/20], 01w10d, ospf-default, type-2
via 10.2.8.64, eth1/1.37, [115/65], 02w20d, isis-isis_infra, L1
via 10.2.112.66, eth1/2.39, [115/65], 02w20d, isis-isis_infra, L1
via 10.2.112.67, eth1/3.40, [115/65], 02w20d, isis-isis_infra, L1
via 10.2.8.65, eth1/4.38, [115/65], 02w20d, isis-isis_infra, L1
via 10.2.8.66, eth1/6.46, [115/65], 02w20d, isis-isis_infra, L1
via 10.2.112.68, eth1/5.44, [115/65], 02w20d, isis-isis_infra, L1
A leaf switch in POD2.
206# show ip route 10.96.1.96 vrf overlay-1
IP Route Table for VRF "overlay-1"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
10.96.1.96/32, ubest/mbest: 2/0
*via 10.2.112.64, eth1/49.2, [115/64], 02w20d, isis-isis_infra, L1
*via 10.2.112.65, eth1/50.1, [115/64], 02w20d, isis-isis_infra, L1
We can check the same from pod 1, this time looking for the pod 2 TEP address of 10.96.2.96, as per the configuration in Pt.2 of this series. At this stage we can move on to the multicast group validation, which is covered in Pt.1 of this series along with the IPN configuration. Multicast depends on this basic IP connectivity being in place, so once you have validated IP connectivity on the IPN with the OSPF peerings and completed the checks above, move on to the multicast validation as in Pt.1.
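One more underlay check worth knowing about: the TEPs a node has discovered, including the remote pod's data-plane TEP, can be listed on any leaf or spine. A quick example (the exact TEP types and flags shown vary by software version):

206# show isis dteps vrf overlay-1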
It is also worth mentioning that any new APIC(s) must be on the same software version as the existing cluster, otherwise the new member(s) will end up in a diverged state and you will have to reimage them via the CIMC onto the correct version.
If you don't see your leaves/spines appearing in Fabric Membership, you may need to restart the dhcpd process on the APIC; this is due to bug CSCvf12024.