An Architect’s view on Network Virtualization – Part II

This is a two-part post. Part 1 introduces the network concepts behind the MidoNet architecture and discusses pushing complexity to the network edges. Part 2 (this post) describes the distributed databases, the overlay model and traffic flow.

Distributed Databases

Control plane state is all about how much information a protocol needs to manage. State can be measured through the number of reachable destinations and the rate at which the information carried in the control plane changes. Generally speaking, more control plane information means more processor and memory utilization. If there are policies tied to these destinations, additional processing power is needed. A good network design will always try to hide the rate of change in a network by limiting the scope of the change. In the traditional networking sense, this was done through information hiding – route aggregation or even route reflection.  
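
To make the effect of aggregation concrete, here is a minimal sketch in plain Python (the prefixes are made up for the example): four specific routes collapse into a single advertisement, so a downstream router carries one control plane entry instead of four.

```python
# Minimal illustration of information hiding through route aggregation,
# using Python's standard ipaddress module. Four contiguous /24s are
# advertised as a single /22, shrinking the state carried downstream.
import ipaddress

specifics = [
    ipaddress.ip_network("10.1.0.0/24"),
    ipaddress.ip_network("10.1.1.0/24"),
    ipaddress.ip_network("10.1.2.0/24"),
    ipaddress.ip_network("10.1.3.0/24"),
]

aggregates = list(ipaddress.collapse_addresses(specifics))
print(aggregates)                                  # [IPv4Network('10.1.0.0/22')]
print(len(specifics), "->", len(aggregates), "control plane entries")
```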

Midokura’s distributed database model follows a subscription service in which control plane information (Address Resolution Protocol entries and Media Access Control addresses) is known only to the devices that need it. The database infrastructure consists of Cassandra and Zookeeper: Zookeeper stores permanent configuration, while Cassandra stores volatile data as a backup mechanism.

Zookeeper is a mandatory, centralised component used to store network configuration information, i.e. the network topology. This includes items such as which ports are connected to which virtual bridges. It also stores some control plane ARP and MAC address information. The rate of change for ARP and MAC addresses is low and does not consume many resources on Zookeeper. The information is distributed across the network, and agents can subscribe to listen for particular ARP and MAC updates. In other words, Zookeeper serves the virtual network topology and provides a subscription service for topology changes: if an agent is not interested in certain traffic, it simply does not subscribe.
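
The snippet below is a rough sketch of that subscription pattern using the kazoo ZooKeeper client. The znode path and payload are hypothetical, not MidoNet’s actual schema; the point is simply that an agent registers a watch and is only notified about the topology elements it has expressed interest in.

```python
# A sketch of the subscription pattern, using the kazoo ZooKeeper client.
# The znode path is hypothetical; MidoNet's real topology schema differs.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

TOPOLOGY_PATH = "/example/topology/bridges/bridge-1/ports"  # hypothetical path

@zk.DataWatch(TOPOLOGY_PATH)
def on_topology_change(data, stat):
    # Called once at registration and again whenever this znode changes.
    # An agent that never subscribes to this path never sees these events.
    if data is not None:
        print("port configuration changed, version", stat.version)
        # ...refresh the local cache used for flow simulation...
```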

Cassandra is used to back up flow state and the connection tracking / NAT tables of the virtual devices. Most agents can function normally without Cassandra because they hold a local cached copy; Cassandra is generally used as a backup. Agents write data to Cassandra and read it back only in recovery scenarios.

State information is sent peer to peer between two hosts via User Datagram Protocol packets and a specific tunnel key. Essentially, this means a MidoNet agent only goes to Cassandra when it has no state information for a flow (for example, after the host/agent reboots or a port is migrated from another host). The flow state is distributed, and the agents share it peer to peer. Sharing flow state enables asymmetric traffic flows across multiple gateways and provides gateway node fault tolerance.
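
As an illustration only (the port number and message layout below are assumptions, not MidoNet’s wire format), a peer-to-peer flow state push might look roughly like this:

```python
# Sketch of peer-to-peer flow state sharing over UDP. The ingress agent
# pushes NAT/conntrack state for a flow straight to the peers that may see
# return traffic; the Cassandra copy is only read back on a local miss.
import json
import socket

STATE_PORT = 6677  # hypothetical UDP port for flow-state messages

def push_flow_state(peer_ip: str, tunnel_key: int, state: dict) -> None:
    """Send flow state (e.g. a NAT mapping) directly to a peer agent."""
    message = json.dumps({"tunnel_key": tunnel_key, "state": state}).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (peer_ip, STATE_PORT))

# Share a NAT translation with the other gateway so an asymmetric return
# path can still reverse the translation.
push_flow_state(
    peer_ip="192.0.2.11",
    tunnel_key=42,
    state={"proto": "tcp", "orig": "10.0.0.5:34567", "nat": "203.0.113.9:34567"},
)
# On a flow-state miss (agent reboot, port migration) the agent would read
# the same record back from Cassandra before resuming the simulation.
```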

Figure: midonet-cli output showing two ports in a single port group.

As of MEM v1.6, distributed flow state was implemented so that Cassandra is required only as a backup. Because the state is held locally, an agent can recognise a flow’s state more efficiently, which reduces the latency of flow computation and removes the runtime dependency on the database.


Physical Underlay

The physical underlay provides transport for the software overlay that sits on top. Its only concern is IP endpoint connectivity, and less complexity means a more stable network core. This is a founding principle of Midokura’s philosophy: keep the core (underlay) simple and push the complexity into software (agents and overlays) at the edges, where it can be more easily managed.

The physical underlay may take whatever physical design you want, whether the traditional Core / Aggregation / Access model or a Clos model. Midokura recommends a Clos design because every endpoint has equidistant connectivity, meaning the same bandwidth is available between any two ports (locally connected ports are the exception, as they are switched at line rate on the same device).

An IP-only core underlay eliminates stretched Layer 2 segments; Layer 2 connectivity is instead provided by the software overlays initiated at the edge. The problem with large or stretched Layer 2 domains is that they represent a single failure domain, and stitching Layer 2 islands together only enlarges that failure domain.

Figure: the physical to logical overlay correlation.

Overlay

Virtualization allows you to build vertically with overlays, rather than horizontally with ever-larger physical networks. The overlay model creates virtual topologies that are independent of each other, each acting as a complete island of its own. Creating overlays on top of a physical underlay can be described as building a “second floor”: the overlay (second floor) is software based and is where all the magic happens. Midokura supports both GRE and VXLAN encapsulations.
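
For a feel of what the encapsulation adds, here is a sketch of the 8-byte VXLAN header defined in RFC 7348. In a real deployment the kernel datapath builds this (plus the outer Ethernet, IP and UDP headers on port 4789) on the fly; the code only shows how small the per-packet cost of the “second floor” is.

```python
# Build the 8-byte VXLAN header from RFC 7348: flags byte with the "I" bit
# set, 3 reserved bytes, 24-bit VNI, 1 reserved byte.
import struct

def vxlan_header(vni: int) -> bytes:
    flags = 0x08                                   # "I" flag: a valid VNI follows
    return struct.pack("!B3xI", flags, vni << 8)   # VNI occupies the upper 24 bits

def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend the VXLAN header to the original (inner) Ethernet frame."""
    return vxlan_header(vni) + inner_frame

print(vxlan_header(5001).hex())  # 0800000000138900 -> flags, reserved, VNI 5001
```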

There are always two control planes in the overlay model: one that provides reachability through the virtual topology and one that provides reachability between the tunnel endpoints. Initially this may look like added complexity, because the control plane of the virtual topology relies heavily on the control plane of the underlay. While this is true, there are steps you can take to make the underlay rock solid and to correlate it correctly with the overlay.

Internal and External Traffic Flow

As previously mentioned, the MidoNet agent subscribes to tailored updates from Zookeeper. The agent subscribes only to changes for the elements in its path, for example its virtual ports. It maintains a local cache and receives notifications from Zookeeper when those elements change properties. Almost 99% of the time, the MidoNet agent does not need to request information from Zookeeper at all.

When the MidoNet agent receives a packet on a virtual port, it needs to make a forwarding decision for that packet, so it initiates a simulation of the packet’s path through the virtual network. If the packet arrives on a port belonging to a virtual bridge, the agent looks up the MAC address table; if it arrives on a virtual router, the agent consults the routing table. Load-balancing and firewall rules are also evaluated during this stage.
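
Conceptually, that dispatch looks something like the sketch below. It is a deliberately simplified model (a dict for the bridge MAC table, Python’s ipaddress module for longest-prefix matching) rather than the agent’s real code, but it captures the decision being made at this stage.

```python
# Simplified model of the simulation's forwarding decision: L2 lookup for a
# virtual bridge, longest-prefix match for a virtual router.
import ipaddress

mac_table = {"fa:16:3e:aa:bb:cc": "port-2"}            # virtual bridge FDB
routing_table = {                                      # virtual router RIB
    ipaddress.ip_network("10.0.1.0/24"): "port-uplink",
    ipaddress.ip_network("10.0.0.0/16"): "port-dmz",
}

def simulate(ingress_device: str, dst_mac: str, dst_ip: str):
    if ingress_device == "bridge":
        # L2 device: forward on the MAC table, flood on a miss.
        return mac_table.get(dst_mac, "flood")
    # L3 device: longest-prefix match on the routing table.
    dst = ipaddress.ip_address(dst_ip)
    matches = [net for net in routing_table if dst in net]
    if not matches:
        return None                                    # no route: drop / unreachable
    return routing_table[max(matches, key=lambda net: net.prefixlen)]

print(simulate("bridge", "fa:16:3e:aa:bb:cc", "10.0.1.5"))   # port-2
print(simulate("router", "00:00:00:00:00:00", "10.0.1.5"))   # port-uplink
```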

Once the simulation is complete, the packet can be sent to its destination. The agent then programs the Linux kernel: if the kernel sees another packet with the same Layer 2, Layer 3 or Layer 4 headers, it forwards it directly at line rate without involving the MidoNet Agent sitting in userspace.

Figure: the steps involved as a packet traverses the virtual network.

Example:

  1. VM1 sends a packet through the virtual network to VM2, located on a different hypervisor. The packet hits the kernel of the hypervisor, which performs a lookup in its flow table to see whether an entry already exists for that packet. In this example, the packet is the first in the flow, so no entry exists in the Linux kernel. The result of the flow miss is that the packet is passed up to the MidoNet Agent running in user space.
  2. The MidoNet Agent then retrieves the network topology information from the network database in Zookeeper. As discussed, the MidoNet Agent and Zookeeper operate on a subscriber model: the agent usually has a local copy of the database and rarely goes off box to fetch topology information.
  3. Once the MidoNet Agent has the topology information, it runs a logical simulation of the packet moving across the virtual network. When the simulation is complete, the appropriate GRE or VXLAN tunnel headers are created.
  4. These tunnel actions are then added to the kernel flow table, enabling the packet to be tunnelled to its intended destination. All subsequent packets to the same destination no longer interact with the MidoNet agent and pass directly via the kernel module; the sketch after this list condenses this miss-and-install cycle.
  5. Packets get tunnelled to the egress host.
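
The whole cycle condenses into a few lines of sketch code. The flow key and action structures are illustrative rather than the real kernel datapath API, but the behaviour matches the steps above: the first packet misses and triggers a simulation, the resulting tunnel action is installed, and every later packet in the flow is handled from the table without touching the agent.

```python
# Condensed model of the first-packet path: flow miss -> userspace simulation
# -> install a tunnelling flow -> subsequent packets match in the "kernel".
flow_table = {}   # stands in for the kernel flow table

def flow_key(pkt: dict) -> tuple:
    """Exact-match key over the headers the kernel matches on."""
    return (pkt["src_mac"], pkt["dst_mac"], pkt["src_ip"], pkt["dst_ip"], pkt["dst_port"])

def agent_simulate(pkt: dict) -> dict:
    """Userspace path: consult the cached topology, pick the egress host."""
    return {"action": "tunnel", "vni": 5001, "remote_tep": "192.0.2.22"}

def handle_packet(pkt: dict) -> dict:
    key = flow_key(pkt)
    if key in flow_table:                # fast path: no userspace involvement
        return flow_table[key]
    action = agent_simulate(pkt)         # slow path: upcall to the agent
    flow_table[key] = action             # install the flow for later packets
    return action

pkt = {"src_mac": "aa:aa", "dst_mac": "bb:bb",
       "src_ip": "10.0.0.5", "dst_ip": "10.0.1.5", "dst_port": 80}
print(handle_packet(pkt))   # first packet: simulated, flow installed
print(handle_packet(pkt))   # second packet: served straight from the table
```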

Traffic to external destinations follows a similar process. VM1 tries to send a packet to an external destination; as before, the packet hits the kernel, which realises it has no flow information and passes the packet up to the MidoNet Agent running in userspace. The agent runs its simulation and determines that the destination network is off net. This results in the creation of tunnel headers, but this time towards the border gateway node. As with internal traffic flow, once the tunnels are created they are added to the kernel flow table, and all subsequent packets pass via the kernel with no user space interaction.

Traffic sourced from the Internet follows a similar process. Incoming packets destined for an internal device hit the kernel on one of the MidoNet gateways. If no flow exists for that destination, the MidoNet agent queries the network state database and runs its network simulation. As before, this creates tunnels between the border gateway nodes and the final destination compute hosts.


Read Part I of this article.

About Matt Conran

Matt Conran is an independent consultant, blogger, and the publisher of network-insight.net. He is a 16-year veteran of the networking industry with core skill sets in Data Centre, WAN, MPLS, security and virtualization technologies. He has implemented infrastructure projects for startups, governments, enterprises, and service provider customers, including serving as lead Network Architect on major global greenfield deployments. He loves to travel and has a passion for landscape photography.
