Understand HUB Cluster Design in VMware SD-WAN

 


In the VMware SD-WAN solution, clustering is a feature to logically group multiple Active/Active SD-WAN Hubs in the same location so they can operate as one (logical) unit when configured as the Hub for branch Edges.

The two main purposes of Hub clustering are:

  •  Horizontal scaling

        Increase bandwidth or tunnel capacity at a Hub location beyond what a single Hub Edge can provide, by using up to 8 Hubs in parallel.

  •  Redundancy

        Minimize single points of failure at a Hub location by deploying N+1 Hubs in a cluster.

 

 


  In SD-WAN node design, a basic distinction is made between


IN-PATH and OFF-PATH design

IN-PATH means that, by topology, all traffic automatically passes through the SD-WAN Edge (or Hub).

In an OFF-PATH design, the system administrator has to make sure that LAN-to-WAN, LAN-to-Overlay (and vice versa) traffic passes through the SD-WAN Edge by configuring the surrounding L2 and L3 devices (switches and routers), and has to separately take care of the control-plane requirements and protocols as well as the data-plane forwarding.

 

There are some reference guides from VMware, and even some written by VeloCloud (the company that developed the SD-WAN solution and was acquired by VMware).

But even those are rather difficult to understand and follow, as they typically show the total picture instead of separating the different configuration requirements.

 

One basic concept needed to understand cluster design is explained here.

A single Branch Edge establishes Overlay tunnels to only one of the Hubs in the cluster.

Because of this, the Hubs in the cluster need to exchange routing information with each other and also forward packets whenever, for Branch-to-Branch communication between Edges via the Hub, the Hub-side connections terminate on different nodes.

It is also important to mention that, unlike High Availability, each node in a Hub Cluster has its own complete LAN, WAN, routing, and Overlay configuration.

 

So, to better understand clustering and the necessary design and configuration, it is a good idea to separate the different logical functions as far as possible and look at each of those functions on its own.

To that end, we first ignore the additional redundancy normally required for the core Layer-3 switches.

Additionally, we separate LAN, Internet, and MPLS traffic and the necessary connections and protocols.

This gives us 3 areas to look into separately:

 

        LAN-Area
L3 device R1, Hub Edges' LAN interface (GE2)

        MPLS-Area
L2/L3 devices R2 and R4, Hub Edges' WAN interface (GE4)

        Internet-Area
L2/L3 device R3 and Internet firewall R5, Hub Edges' WAN interface (GE3)

 

 

To test different SD-WAN features, I use a vSphere-hosted EVE-NG Pro environment with a VeloCloud Orchestrator (VCO) implemented as a separate VM.

All the necessary Edges, Gateways, and devices are inside this lab.

For testing I have built a Hub Cluster consisting of 3 vEdges with hybrid connectivity (Internet and MPLS). The Internet and the MPLS VRF are each emulated by a single Cisco IOS router.

  •  MPLS uses BGP as its routing protocol.
  • The Internet connectivity uses fixed IP addresses and a firewall that also implements stateful 1:1 NAT from public to private addresses.

 Instead of using 3 separate interfaces for the Hub connectivity (GE2, GE3, and GE4), a single 802.1Q trunk with 3 VLANs would also be possible.

 NOTE: for a better understanding of the basic principles, I did not implement advanced route tagging (using BGP communities). In a real-world scenario you should always tag routes so that you can distinguish between Underlay, Overlay, NSD routes, and so on.
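For illustration only, such route tagging could look like the following sketch on a plain Cisco IOS router (the AS numbers, neighbor address, and community value are my own assumptions, not taken from the lab; on the Hub Edges themselves this would be configured in the Orchestrator's BGP settings):

```
! Hypothetical sketch: tag routes learned from the MPLS PE with a community
! so that Underlay routes can later be distinguished from Overlay routes
ip bgp-community new-format
!
route-map TAG-UNDERLAY-IN permit 10
 set community 65000:100 additive
!
router bgp 65010
 neighbor 192.0.2.1 remote-as 65000
 neighbor 192.0.2.1 route-map TAG-UNDERLAY-IN in
 neighbor 192.0.2.1 send-community
```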

The LAN-Area
consists of L3 device R1, the Hub Edges' LAN interface (GE2), and the necessary exchange of routing information (external BGP (eBGP) is recommended).

  • For LAN connectivity I use a BVI or IRB interface as the gateway address for my LAN devices.
  • The LAN prefix is announced using the network statement.
  • Control-plane connectivity is established via eBGP over the LAN interface of the Hub devices.

The necessary BGP configuration has to be done separately per Hub; in my case there are 2 BGP sessions from each Hub:

 

    LAN connection to R1

    MPLS (WAN) connection to the MPLS/VPN PE router
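As a hedged sketch of the LAN side of this peering (all addresses and AS numbers are assumptions), the configuration on R1 could look as follows, with the DC-LAN prefix announced via the network statement:

```
! Hypothetical sketch of R1's eBGP sessions toward the three Hub Edges
router bgp 65001
 bgp log-neighbor-changes
 network 10.0.1.0 mask 255.255.255.0      ! announce the local DC-LAN prefix
 neighbor 10.0.2.1 remote-as 65010        ! Hub-1, GE2
 neighbor 10.0.2.2 remote-as 65010        ! Hub-2, GE2
 neighbor 10.0.2.3 remote-as 65010        ! Hub-3, GE2
```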

Note that the BGP neighbor configuration toward MPLS has the Uplink flag set, which prevents redistribution of the MPLS routes into the SD-WAN Overlay. This is fine in my case, as I am not using the Hubs for Edge-to-Edge communication and use Partner Gateways instead.

Shown here are the BGP configuration and its verification via Monitor/Routing/BGP Edge Neighbor State (using the new Orchestrator UI).

It is equally visible via Monitor/Network Services in the old UI.

Here is the BGP routing table as seen by the Remote Diagnostics/List BGP Routes feature.

 NOTE: Routes learned via GE4 (the MPLS WAN connectivity) are marked as EU (eBGP with the Uplink flag set).

 

The MPLS-Area consists of L2/L3 devices R2 and R4 and the Hub Edges' WAN interface (GE4).

 As the MPLS environment is a private VRF, NAT is not needed, but we need to learn routes from the MPLS/VPN PE router via BGP.

 The easiest way to achieve the necessary control-plane and data-plane connectivity is to use direct Layer-2 connections.

Thus R2 and R4 are Layer-2 bridge-groups implemented on Cisco IOS routers.

 For R4 I did a show bridge-group 1; for R2 you see the necessary configuration parts in the picture above.
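A minimal sketch of such a bridge-group on classic Cisco IOS could look like this (the interface names are assumptions); show bridge-group then verifies which interfaces are being bridged:

```
! Hypothetical sketch: R2/R4 as a pure Layer-2 bridge between the PE router
! and the Hub Edges' GE4 interfaces
bridge 1 protocol ieee
!
interface GigabitEthernet0/0
 no ip address
 bridge-group 1
!
interface GigabitEthernet0/1
 no ip address
 bridge-group 1
```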

 

In the Internet-Area, device R3 simply connects the Hubs' GE3 interfaces to the Internet firewall R5 and is thus again a pure Layer-2 bridge-group implementation.

The Internet firewall R5 then allows controlled connectivity to the public Internet.

As I use private addresses internally on my Hubs, the firewall needs to do a static 1:1 translation of those private addresses to public ones and vice versa:

10.10.12.0/29 <-> 110.1.1.0/29

We use only the addresses .1 to .4 for possible Hubs; address .5 is used for the firewall's inside and outside interfaces.

 Therefore the proxy-arp settings specify only a /30 instead of the full /29: on the SRX, the proxy-arp statement does not allow specifying the /29, as it also contains the firewall's own interface address (.5).
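On a Juniper SRX, the static NAT and proxy-ARP parts could be sketched as follows (the interface, zone, and rule-set names are assumptions; only the prefixes match the example above):

```
# Hypothetical sketch: static 1:1 NAT for the Hub addresses, plus proxy-ARP
set security nat static rule-set HUB-NAT from zone untrust
set security nat static rule-set HUB-NAT rule HUBS match destination-address 110.1.1.0/30
set security nat static rule-set HUB-NAT rule HUBS then static-nat prefix 10.10.12.0/30
# Proxy-ARP for the translated addresses on the untrust interface;
# the full /29 would include the firewall's own .5 and is not accepted
set security nat proxy-arp interface ge-0/0/0.0 address 110.1.1.0/30
```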


Here the necessary security policies are shown: they allow inside-to-outside communication without restrictions and also allow the configured VCMP application from outside to inside.

root@BR-1-FW-Int> show configuration applications
application VCMP {
    protocol udp;
    source-port 2426;
    destination-port 2426;
}

The firewall's Internet-facing interface belongs to the untrust zone, and the internal interface to the trust zone.
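A hedged sketch of those policies in Junos set-style (the policy names are assumptions; the VCMP application is the one defined above):

```
# Hypothetical sketch of the SRX security policies
set security policies from-zone trust to-zone untrust policy ALLOW-OUT match source-address any
set security policies from-zone trust to-zone untrust policy ALLOW-OUT match destination-address any
set security policies from-zone trust to-zone untrust policy ALLOW-OUT match application any
set security policies from-zone trust to-zone untrust policy ALLOW-OUT then permit
set security policies from-zone untrust to-zone trust policy ALLOW-VCMP match source-address any
set security policies from-zone untrust to-zone trust policy ALLOW-VCMP match destination-address any
set security policies from-zone untrust to-zone trust policy ALLOW-VCMP match application VCMP
set security policies from-zone untrust to-zone trust policy ALLOW-VCMP then permit
```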

We also defined a static default route to the Internet on the firewall, a static route to the local DC-LAN for direct Internet traffic, and source NAT with PAT for outgoing traffic from the LAN to the Internet via the Underlay (not shown here).

 

 

This concludes my attempt at explaining the basic mechanisms, procedures, and protocols needed to make a VMware SD-WAN Hub Cluster work.

Feel free to comment on this elaboration.


