
VXLAN Revisited – CSR1KV Lab

So a while back I did a video on labbing out VXLAN within GNS3, using my good friend the CSR1000v. I got a fair number of views on the video (for me), but one person in the comments pointed out that I’d never actually done a write-up on this topic, which makes it slightly more difficult for people to follow along.

Hence, this post. I’m going to jump right into the configuration of VXLAN, but I’m going to foolishly assume you have a working unicast routing topology before we start. I’ll be using single-area IS-IS, but you can use literally whatever routing protocol you want. You just need full reachability between all loopback interfaces. Here’s my topology:
The interfaces facing southbound from the CSRs to the servers are unaddressed, and they’re going to remain that way. To give a little insight, here’s the configuration on CSR1, along with a show ip route isis.

router isis 1
 net 00.0000.0000.0011.00
 is-type level-1
 log-adjacency-changes
 passive-interface Loopback0
!
interface Loopback0
 description Loopback
 ip address 10.11.11.11 255.255.255.255
!
interface GigabitEthernet2
 description to SPINE1
 ip address 10.1.11.11 255.255.255.0
 ip router isis 1
!
interface GigabitEthernet3
 description to SPINE2
 ip address 10.2.12.11 255.255.255.0
 ip router isis 1
!
interface GigabitEthernet4
 description to SPINE3
 ip address 10.3.13.11 255.255.255.0
 ip router isis 1
!
interface GigabitEthernet5
 description to POD1-SW
 no ip address
!
!
i L1     10.1.1.1/32 [115/10] via 10.1.11.1, 00:00:18, GigabitEthernet2
i L1     10.1.21.0/24 [115/20] via 10.1.11.1, 00:00:18, GigabitEthernet2
i L1     10.1.31.0/24 [115/20] via 10.1.11.1, 00:00:18, GigabitEthernet2
i L1     10.1.41.0/24 [115/20] via 10.1.11.1, 00:00:18, GigabitEthernet2
i L1     10.2.2.2/32 [115/10] via 10.2.12.2, 00:00:40, GigabitEthernet3
i L1     10.2.22.0/24 [115/20] via 10.2.12.2, 00:00:40, GigabitEthernet3
i L1     10.2.32.0/24 [115/20] via 10.2.12.2, 00:00:40, GigabitEthernet3
i L1     10.2.42.0/24 [115/20] via 10.2.12.2, 00:00:40, GigabitEthernet3
i L1     10.3.3.3/32 [115/10] via 10.3.13.3, 13:27:57, GigabitEthernet4
i L1     10.3.23.0/24 [115/20] via 10.3.13.3, 13:27:57, GigabitEthernet4
i L1     10.3.33.0/24 [115/20] via 10.3.13.3, 13:27:57, GigabitEthernet4
i L1     10.3.43.0/24 [115/20] via 10.3.13.3, 13:27:57, GigabitEthernet4
i L1     10.12.12.12/32 [115/20] via 10.3.13.3, 00:00:18, GigabitEthernet4
                        [115/20] via 10.2.12.2, 00:00:18, GigabitEthernet3
                        [115/20] via 10.1.11.1, 00:00:18, GigabitEthernet2
i L1     10.13.13.13/32 [115/20] via 10.3.13.3, 00:00:18, GigabitEthernet4
                        [115/20] via 10.2.12.2, 00:00:18, GigabitEthernet3
                        [115/20] via 10.1.11.1, 00:00:18, GigabitEthernet2
i L1     10.14.14.14/32 [115/20] via 10.3.13.3, 00:00:18, GigabitEthernet4
                        [115/20] via 10.2.12.2, 00:00:18, GigabitEthernet3
                        [115/20] via 10.1.11.1, 00:00:18, GigabitEthernet2





Alright, so we have unicast routing in place. Before we start working on our VXLAN configuration, we’re also going to need functional multicast routing. You can use a static RP, Auto-RP, or BSR, but you have to enable bidirectional PIM. Why bidirectional PIM, you ask? Well, the short answer is that bidir PIM was created to address a shortcoming of traditional multicast. Traditional multicast operates on the idea that there are many, many more receivers than sources. In VXLAN, however, our VTEPs act as both receivers and sources (a little more on this at the end). In this example I’ll be using BSR to announce Spine1 as my RP. For brevity, I’ll provide the configuration of only Spine1 and CSR1. If you’re testing this out in your lab, you’ll want PIM sparse mode enabled on all your interfaces except the ones facing your clients/servers, and bidirectional PIM enabled on all devices.

SPINE1#show run | i ip pim|interface
interface Loopback0
 ip pim sparse-mode
interface GigabitEthernet0/1
 ip pim sparse-mode
interface GigabitEthernet0/2
 ip pim sparse-mode
interface GigabitEthernet0/3
 ip pim sparse-mode
interface GigabitEthernet0/4
 ip pim sparse-mode
ip multicast-routing
ip pim bidir-enable
ip pim bsr-candidate Loopback0 0
ip pim rp-candidate Loopback0 group-list GROUP1-MCAST bidir <– NOTICE we’re announcing this RP as a bidir RP
!
!

SPINE1#show ip access-list GROUP1-MCAST
Standard IP access list GROUP1-MCAST
    10 permit 239.0.0.0, wildcard bits 0.255.255.255


##


CSR1#show run | i interface|ip pim
interface Loopback0
 ip pim sparse-mode
interface GigabitEthernet2
 ip pim sparse-mode
interface GigabitEthernet3
 ip pim sparse-mode
interface GigabitEthernet4
 ip pim sparse-mode
interface GigabitEthernet5
 ##This interface connects down to client, it’s L2 only, hence no PIM configuration.
ip pim bidir-enable
ip multicast-routing distributed




So a couple of key points there. We’re enabling bidirectional PIM globally with “ip pim bidir-enable” on all devices. Then on the RP, I’m telling Spine1 to announce itself not only as an RP candidate, but as an RP that supports bidirectional PIM. I’m also filtering the groups this RP is responsible for with the ACL “GROUP1-MCAST”… because I felt like it lol. The next thing I like to do is verify multicast routing is working as expected. So, I’ll go to CSR4, have it join an mcast group, and do a simple ping test from CSR1.
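If you want to sanity-check which groups that ACL actually covers, here is a minimal Python sketch of standard-ACL wildcard matching. The function name acl_permits is my own invention for illustration, not anything from IOS. It just applies the "wildcard bits are don't-care bits" rule to the single GROUP1-MCAST entry shown above, and tests the group used for the ping test, the group used for the VNI later in this post, and one group outside 239.0.0.0/8 for contrast.

```python
import ipaddress


def acl_permits(address: str, base: str, wildcard: str) -> bool:
    """Return True if `address` matches a standard ACL entry.

    A wildcard bit of 1 means "don't care"; a bit of 0 must match the base.
    (Hypothetical helper for illustration, not an IOS or library API.)
    """
    addr = int(ipaddress.IPv4Address(address))
    net = int(ipaddress.IPv4Address(base))
    wild = int(ipaddress.IPv4Address(wildcard))
    care = ~wild & 0xFFFFFFFF  # bits that must match exactly
    return (addr & care) == (net & care)


# The single entry from GROUP1-MCAST: permit 239.0.0.0, wildcard 0.255.255.255
for group in ("239.0.0.4", "239.0.12.34", "224.0.1.40"):
    print(group, acl_permits(group, "239.0.0.0", "0.255.255.255"))
```

Both 239.x groups used in this lab fall inside the range, so Spine1 will act as their bidir RP, while the 224.0.1.40 example does not match.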


CSR4(config)#int lo0
CSR4(config-if)# ip igmp join-group 239.0.0.4
!
!

### From CSR1 ###


CSR1#ping 239.0.0.4 time 1  rep 3
Type escape sequence to abort.
Sending 3, 100-byte ICMP Echos to 239.0.0.4, timeout is 1 seconds:


Reply to request 0 from 10.14.14.14, 17 ms
Reply to request 0 from 10.14.14.14, 17 ms
Reply to request 0 from 10.14.14.14, 17 ms
Reply to request 0 from 10.14.14.14, 17 ms
Reply to request 1 from 10.14.14.14, 25 ms
Reply to request 1 from 10.14.14.14, 25 ms
Reply to request 1 from 10.14.14.14, 25 ms
Reply to request 1 from 10.14.14.14, 25 ms
Reply to request 2 from 10.14.14.14, 27 ms
Reply to request 2 from 10.14.14.14, 31 ms
Reply to request 2 from 10.14.14.14, 31 ms
Reply to request 2 from 10.14.14.14, 31 ms




Perfect! Just to keep things clean, I removed that join from CSR4. Now onto the actual VXLAN configuration. With functional unicast and multicast routing, we only have a couple extra ingredients to make this work.


1. Network Virtualization Endpoint (NVE) Interface
2. Service Instance
3. Bridge-Domain (to tie it all together).


Luckily, the configuration is so generic, we can just copy and paste it to all our VXLAN Tunnel Endpoints (VTEPs). I know… I know, I’m defining every acronym lol. I don’t care, I like knowing what all the acronyms actually stand for. Alright! So here’s a very basic configuration I’ll drop on each VTEP to create a single bridge-domain with a single VXLAN Network Identifier (VNI). This is the equivalent of configuring a single VLAN on your network, just way cooler, since connections between VTEPs are all layer 3 and hence get the benefit of ECMP.


int nve 1
 source-interface lo0
 member vni 47884 mcast-group 239.0.12.34
!
interface GigabitEthernet5 
 service instance 1 ethernet
  encapsulation untagged
  exit
!
bridge-domain 1
 member vni 47884
 member GigabitEthernet5 service-instance 1

So that bit of configuration is basically saying “Any untagged traffic received on Gig5 is part of service instance 1. Service instance 1 is part of bridge-domain 1, as is VNI 47884. VNI 47884 is using multicast-group 239.0.12.34.” Now, after applying this configuration to all our VTEPs (the CSR1000vs for this lab), we can do a couple of test pings from our servers.
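Since the same snippet goes on every VTEP, you could generate it rather than paste it. Below is a rough Python sketch that renders the configuration above from a few parameters. The function render_vtep_config and its parameter names are hypothetical (mine, not Cisco's), the interface keywords are written out in full instead of abbreviated, and the VNI check only enforces the 24-bit field size; your platform may restrict the range further.

```python
import ipaddress


def render_vtep_config(vni: int, mcast_group: str, access_if: str,
                       bridge_domain: int = 1, service_instance: int = 1,
                       source_if: str = "Loopback0") -> str:
    """Render the NVE / service instance / bridge-domain snippet as text.

    Hypothetical helper for illustration; it only formats strings and does
    not talk to any device.
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    if not ipaddress.ip_address(mcast_group).is_multicast:
        raise ValueError(f"{mcast_group} is not a multicast address")
    lines = [
        "interface nve 1",
        f" source-interface {source_if}",
        f" member vni {vni} mcast-group {mcast_group}",
        "!",
        f"interface {access_if}",
        f" service instance {service_instance} ethernet",
        "  encapsulation untagged",
        "!",
        f"bridge-domain {bridge_domain}",
        f" member vni {vni}",
        f" member {access_if} service-instance {service_instance}",
    ]
    return "\n".join(lines)


# Same values as the lab: VNI 47884, group 239.0.12.34, servers on Gig5.
print(render_vtep_config(47884, "239.0.12.34", "GigabitEthernet5"))
```

The output is identical for every VTEP in this lab because the source interface name, VNI, group, and server-facing interface are the same on all four CSRs.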


cisco@server-1:~$ ip add sh eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:08:a7:4a brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.1/24 brd 192.168.0.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe08:a74a/64 scope link 
       valid_lft forever preferred_lft forever
cisco@server-1:~$ ping 192.168.0.2 -c 3
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
64 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=8.61 ms
64 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=6.29 ms
64 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=5.36 ms

--- 192.168.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 5.366/6.757/8.616/1.368 ms
cisco@server-1:~$ ping 192.168.0.3 -c 3 
PING 192.168.0.3 (192.168.0.3) 56(84) bytes of data.
64 bytes from 192.168.0.3: icmp_seq=1 ttl=64 time=5.38 ms
64 bytes from 192.168.0.3: icmp_seq=2 ttl=64 time=5.21 ms
64 bytes from 192.168.0.3: icmp_seq=3 ttl=64 time=4.34 ms

--- 192.168.0.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 4.340/4.981/5.389/0.462 ms
cisco@server-1:~$ ping 192.168.0.4 -c 3 
PING 192.168.0.4 (192.168.0.4) 56(84) bytes of data.
64 bytes from 192.168.0.4: icmp_seq=1 ttl=64 time=5.83 ms
64 bytes from 192.168.0.4: icmp_seq=2 ttl=64 time=5.44 ms
64 bytes from 192.168.0.4: icmp_seq=3 ttl=64 time=7.29 ms

--- 192.168.0.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 5.441/6.191/7.296/0.797 ms

Also, before we continue our discussion and I finally tell you WHY we have to use multicast, let’s take a look at bridge-domain 1 on CSR1 and see what it looks like.

CSR1#show bridge-domain 1
Bridge-domain 1 (2 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet5 service instance 1
    vni 47884
   AED MAC address    Policy  Tag       Age  Pseudoport
   0   FA16.3E0A.1FD8 forward dynamic   214  nve1.VNI47884, VxLAN 
                                             src: 10.11.11.11 dst: 10.12.12.12
   0   FA16.3E84.952A forward dynamic   219  nve1.VNI47884, VxLAN 
                                             src: 10.11.11.11 dst: 10.13.13.13
   0   FA16.3EA9.7EDF forward dynamic   223  nve1.VNI47884, VxLAN 
                                             src: 10.11.11.11 dst: 10.14.14.14
   0   FA16.3E08.A74A forward dynamic   223  GigabitEthernet5.EFP1
So bridge-domain 1 is storing MAC addresses for both local and remote hosts. Notice that we’re mapping the MAC addresses of Servers 2 – 4 not just to a VNI; we also store the other VTEPs’ loopbacks as part of that mapping (look at src: 10.11.11.11 dst: 10.xx.xx.xx). How did we learn that information???



Well, to get a better view of this, I shut down interfaces Gi3 – 4 on CSR1, forcing all traffic through Gi2 (connected to Spine1). I also cleared the MAC address table on the bridge-domain (clear bridge-domain 1 mac table) and cleared the ARP entries from the host. Then I set up a packet capture on Gi2 and re-ran my ping from server 1 (192.168.0.1) to server 2 (192.168.0.2). Check out this capture, and see if you can wrap your mind around it before I attempt to explain.



So, if you’re reading carefully you’ll notice that the ARP request is actually being sent to the multicast address 239.0.12.34, which is exactly the reason we’re using multicast to support VXLAN. Multicast carries all broadcast and unknown unicast traffic. In the case of ARP, the VTEP learns which other VTEP has that host connected in the exact same fashion a switch learns MAC addresses: from the source of the frames it receives. From then on, when server 1 sends frames to server 2, communication is unicast between CSR1 and CSR2. **A quick additional note: You can also see that VXLAN is using UDP encapsulation, just in case you ever hear it referred to as MAC-in-UDP encapsulation.**
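If it helps to see that flood-and-learn behaviour as logic instead of packets, here is a toy Python model. Everything in it (the ToyVtep class and its method names) is made up for illustration and is nowhere near a real data plane. It only tracks one thing: which outer destination IP a VTEP picks for a frame, and what it learns when it decapsulates one. The MAC addresses and loopbacks are the ones from the bridge-domain output above.

```python
BROADCAST = "ffff.ffff.ffff"
SERVER1 = "fa16.3e08.a74a"  # local to CSR1 per the bridge-domain output
SERVER2 = "fa16.3e0a.1fd8"  # learned behind 10.12.12.12 (CSR2)


class ToyVtep:
    """Toy model of a VTEP's forwarding decision (illustration only)."""

    def __init__(self, loopback: str, mcast_group: str):
        self.loopback = loopback
        self.mcast_group = mcast_group
        self.remote_macs = {}  # inner MAC -> loopback of the remote VTEP

    def encapsulate(self, dst_mac: str) -> str:
        """Pick the outer destination IP for a frame from a local host."""
        if dst_mac == BROADCAST:
            return self.mcast_group
        # Unknown unicast is flooded to the group, known unicast is not.
        return self.remote_macs.get(dst_mac, self.mcast_group)

    def decapsulate(self, inner_src_mac: str, outer_src_ip: str) -> None:
        """Learn the sender's MAC against the VTEP that encapsulated it."""
        self.remote_macs[inner_src_mac] = outer_src_ip


csr1 = ToyVtep("10.11.11.11", "239.0.12.34")
csr2 = ToyVtep("10.12.12.12", "239.0.12.34")

# 1. Server 1 ARPs for server 2: broadcast, so CSR1 sends it to the group.
arp_request_dst = csr1.encapsulate(BROADCAST)
csr2.decapsulate(SERVER1, csr1.loopback)

# 2. Server 2 replies: CSR2 already knows where server 1 lives.
arp_reply_dst = csr2.encapsulate(SERVER1)
csr1.decapsulate(SERVER2, csr2.loopback)

# 3. The ping itself now goes unicast, VTEP to VTEP.
ping_dst = csr1.encapsulate(SERVER2)

print(arp_request_dst, arp_reply_dst, ping_dst)
```

The three printed destinations line up with the story above: the ARP request goes to the multicast group, and both the reply and the ping travel unicast between the two loopbacks.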


So in summary, VXLAN not only gives us over 16 million segments (compared to 4094 if you used VLANs only) and all the benefits of a routed network (much better scalability and support for ECMP), but it also carries traditional Ethernet concepts over with it. I’ll endeavor to do a follow-up post in the future to look at running multiple VNIs. Until then, happy networking, you packet pushers you.
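One footnote on that 16 million figure: it comes straight from the VNI being a 24-bit field in the 8-byte VXLAN header defined in RFC 7348. Here is a small Python sketch that packs and unpacks that header for the VNI used in this lab. The function names are my own, and it builds only the VXLAN header itself, not the outer Ethernet/IP/UDP layers.

```python
import struct


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header (RFC 7348 layout) for a VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # "I" flag: the VNI field is valid
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", flags << 24, vni << 8)


def vni_from_header(header: bytes) -> int:
    """Extract the 24-bit VNI from the first 8 bytes of a VXLAN header."""
    _, second_word = struct.unpack("!II", header[:8])
    return second_word >> 8


hdr = vxlan_header(47884)
print(hdr.hex())             # 0800000000bb0c00
print(vni_from_header(hdr))  # 47884
print(2 ** 24, "possible VNIs vs", 2 ** 12 - 2, "usable VLAN IDs")
```

The last line is just the arithmetic behind the comparison: 2^24 is 16,777,216, while a 12-bit VLAN ID minus the two reserved values leaves 4094.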

0 Comments on “VXLAN Revisited – CSR1KV Lab”

  1. Nice work man. Thanks for taking the time to write this post. I am trying to do the same thing, but it seems that the CSR image I downloaded does not have the VTEP capabilities. I am not able to create the NVE interfaces (no option), and when I create the bridge domain, I am not able to add members (neither VNI nor local interfaces).
    I downloaded the csr1000v-universalk9.03.11.02.S.154-1.S2-std.iso. Do I need to download a different ISO, or do I just need to activate some VXLAN feature set or license?

    Thanks!!!

  2. One thing to make sure you do, which I do by default, is make sure that license level is using the "ax" or "premium" license to boot with. I have unicast and multicast VXLAN working in my lab, I used XE 3.14 since that is the version listed for SPv4. I was able to download 3.14 code, your mileage may vary.
