Open vSwitch and OVN 2020 Fall Conference
The conference was held online Dec. 8 and 9.
- The Discrepancy of the Megaflow Cache in OVS, Final Episode (Levente Csikor, National University of Singapore; Vipul Ujawanae, IIT Kharagpur; and Dinil Mon Divakaran, Trustwave): video
- Container Networking solutions with OVS (Anjali S Jain and Nupur Jain, Intel): PPTX, PDF, video
- Transparent Validation (Hariprasad Govindharajan, Intel): PPTX, video
- OvS Offload: Too Many Models? (Hemal Shah, Broadcom): video
- OvS Offload Layer Design Challenges (Hemal Shah and Sriharsha Basavapatna, Broadcom): video
- Next steps for even higher performance of the SW Datapath in OVS (Harry van Haaren, Intel): video
- Port Mirroring Offload (Timothy Miskell, Larry Wang, and Emma Finn, Intel and Munish Mehan, AT&T): PPTX, video
- OVS Benchmark (Guy Twig and Roni Bar Yanai, nVidia): PPTX, video
- Debugging OVSDB with stream record/replay (Ilya Maximets, Red Hat): video
- Community updates and new LTS process (Ilya Maximets, Red Hat): video
- DPDK Meson build for OVS (Sunil Pai G, Intel): video
- Growing Pains - DPDKs move to API/ABI Stability and its effect on OVS (Ian Stokes, Intel): PPTX, video
- Enabling asynchronous Para-virtual I/O in OVS (Sunil Pai G, Intel): video
- OVS DPDK VXLAN & VLAN TSO, GRO and GSO Implementation and Status Update (Yi Yang, Inspur): PPTX, video
- Hassle-free migration of OpenStack cloud to OVN (Frode Nordahl, Canonical): video
- Incremental processing improvements in ovn-controller in 20.06 and 20.09 (Numan Siddique, Red Hat): video
- OVN with EVPN (Ankur Kumar Sharma and Greg A Smith, Nutanix): video
- Integrating Open vSwitch with Netplan.io (Lukas Märdian, Canonical): online, video
- Converting OpenFlow to P4 (James Choi, Intel): PPTX, video
- p4proto: Cooking OVS with P4 Spice! (Namrata Limaye and Deb Chatterjee, Intel): PPTX, video
- vswitch.p4: Why OVS Needs P4 (Dan Daly, Intel): PPTX, video
The Discrepancy of the Megaflow Cache in OVS, Final Episode (Levente Csikor, National University of Singapore; Vipul Ujawanae, IIT Kharagpur; and Dinil Mon Divakaran, Trustwave)
In the previous talks, we demonstrated that the Tuple Space Search (TSS) scheme, used as the packet classification algorithm in the MegaFlow Cache (MFC) of OVS, has an algorithmic deficiency that an attacker can abuse in different ways to push this generally high-performing packet classifier into its corner case of degraded performance. We call this attack Tuple Space Explosion (TSE). In TSE, legitimate-looking, low-rate attack traffic (with no particular pattern) inflates the tuple space, forcing the linear search process at the heart of TSS to spend an unaffordable amount of time classifying each packet; this eventually leads to a complete denial of service (DoS) for the users.
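The tuple space structure at the heart of this attack can be shown with a minimal sketch (illustrative Python, not OVS internals): each distinct wildcard mask gets its own hash table ("tuple"), and classification performs one hash lookup per mask, in order, until a match is found.

```python
# Minimal sketch of Tuple Space Search (TSS), the classifier behind
# the OVS MegaFlow Cache. Names and key layout are illustrative.

class TupleSpace:
    def __init__(self):
        # one hash table per distinct wildcard mask ("tuple")
        self.tuples = []  # list of (mask, {masked_key: action})

    def insert(self, mask, key, action):
        masked = tuple(k & m for k, m in zip(key, mask))
        for m, table in self.tuples:
            if m == mask:
                table[masked] = action
                return
        self.tuples.append((mask, {masked: action}))

    def classify(self, pkt):
        # Linear search over all tuples: one hash lookup per mask.
        # An attacker who inflates the number of tuples makes every
        # lookup proportionally slower -- the core of the TSE attack.
        for mask, table in self.tuples:
            masked = tuple(f & m for f, m in zip(pkt, mask))
            if masked in table:
                return table[masked]
        return None
```

With N distinct masks in play, classifying a packet that matches no early tuple costs N hash lookups; this is exactly the cost that TSE traffic inflates.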
In the first part, we focused on a limited attack scenario. In particular, we demonstrated that for each set of flow rules, e.g., Access Control Lists (ACLs), there exists a well-engineered traffic trace in which almost every packet creates a new tuple. We showed that the basic Whitelist+DefaultDeny ACLs that tenants are typically given by default in cloud systems are particularly vulnerable. However, to carry out this attack, the adversary has to have access to, or knowledge of, the installed ACLs.
In Part II, we analyzed to what extent a randomized traffic trace can inflate the tuple space when the attacker is not aware of the ACLs. In particular, we showed that with an attack rate of less than 7 Mbps, a significant throughput degradation of 88% could be achieved.
Both works above, however, had one crucial aspect in common: we focused exclusively on one type of datapath, namely the kernel datapath installed by the underlying system's own package manager. In many real-world (production) environments, administrators simply rely on the built-in software tools to install applications, e.g., via apt-get install openvswitch-common on Debian-based Linux distributions, to reduce or even completely avoid the hassle of manual installation and compilation from source code. Even though in most cases we eventually end up with the same application and negligible (performance) differences, when applications also have modules supplied by the underlying kernel (as has been the case for Open vSwitch since its Linux 3.3 kernel debut in 2012), there can be significant deviations among the implementations. In particular, as it turned out in discussions with some of the OVS developers during our previous talks: (i) the kernel networking stack developers do not favor exact flow caching; therefore, the kernel datapath of OVS lacks the first-level Exact Match Cache (EMC). This means that the whole fast path comprises only the MFC, making TSE more efficient. On the other hand, (ii) while the userspace datapath provided by Intel's DPDK significantly improves packet processing performance (by avoiding context switching, interrupt-based packet handling, and the side effects of OS schedulers), it essentially shares the same code base, and most parts of the algorithms are implemented according to the same original design.
Therefore, to round out our study of the discrepancy of the MFC, in this lightning talk we investigate to what extent the other datapaths are exposed to the TSE attack. First, (i) we scrutinize the kernel datapath of OVS compiled and installed from its up-to-date, "out-of-kernel-tree" source code, developed by the core OVS developers. We show that the additional caching layer of the EMC can significantly increase the performance of OVS under the TSE attack, requiring the adversary to increase her attack rate to be successful. Subsequently, (ii) we also analyze the Intel DPDK-based userspace datapath (also known as OVS-DPDK), which is less vulnerable to TSE due to an efficient ranking algorithm in the tuple space introduced by a patch in 2016. In essence, this ranking algorithm sorts the tuples according to the overall number of hits their entries have. Thus, whenever a packet of a frequent flow has to be classified, its corresponding tuple is ranked higher, allowing the linear search to find it faster. This renders the low-rate TSE attack much less efficient; in particular, after the attack commences, the victim flows slowly "climb back" to the front of the tuple space and their throughput resurges to higher values.
To counter this ranking algorithm, we propose TSE 2.0, which keeps the ranking algorithm busy by carefully switching the original TSE attack on and off, letting some tuples expire and then re-spawning them. Thus, TSE 2.0 eventually causes a complete denial of service (DoS) for the users of the same software switch. Furthermore, we propose TSE 2.1 against OVS-DPDK running on multiple cores, wherein we slightly increase the attack rate of TSE 2.0 but, at the same time, carefully adjust the packet sending sequence to achieve the same results as with TSE 2.0. We experimentally show that TSE 2.1 can still mount a low-rate DoS attack as long as OVS-DPDK is running on fewer than five cores.
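The ranking defense described above can be sketched as follows (illustrative Python under assumed semantics, not the actual OVS-DPDK code): each tuple carries a hit counter, and the tuple list is kept sorted by hits so that hot flows are found early in the linear search.

```python
# Sketch of hit-count ranking over the tuple space: frequently hit
# tuples are kept near the front of the linear search. Illustrative
# only; the real implementation re-sorts periodically, not per hit.

class RankedTupleSpace:
    def __init__(self):
        self.tuples = []  # list of [hits, mask, {masked_key: action}]

    def insert(self, mask, key, action):
        masked = tuple(k & m for k, m in zip(key, mask))
        for entry in self.tuples:
            if entry[1] == mask:
                entry[2][masked] = action
                return
        self.tuples.append([0, mask, {masked: action}])

    def classify(self, pkt):
        for i, (hits, mask, table) in enumerate(self.tuples):
            masked = tuple(f & m for f, m in zip(pkt, mask))
            if masked in table:
                self.tuples[i][0] += 1
                # keep hot tuples at the front of the search order
                self.tuples.sort(key=lambda e: -e[0])
                return table[masked]
        return None
```

Attack tuples that stop receiving traffic sink toward the back and (in OVS) eventually expire; toggling the attack re-creates them at rank zero, continually churning the ordering, which is the intuition behind TSE 2.0.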
B. Bodireddy and A. Fischetti, "OVS-DPDK Datapath Classifier," Intel blog post, https://intel.ly/3kCbIi8, Oct 2016 [Accessed: Oct 2020].
L. Csikor, D. M. Divakaran, M. S. Kang, A. Korosi, B. Sonkoly, D. Haja, D. P. Pezaros, S. Schmid, and G. Rétvári, "Tuple Space Explosion: A Denial-of-Service Attack Against a Software Packet Classifier," in ACM CoNEXT 2019, Dec 2019.
L. Csikor, M. S. Kang, and D. M. Divakaran, "The Discrepancy of the Megaflow Cache in OVS, Part II," full talk at the OVS+OVN Conference, https://bit.ly/2SsfGh7, Dec 2019.
L. Csikor and G. Rétvári, "The Discrepancy of the Megaflow Cache in OVS," full talk at the OVS Fall Conference, https://bit.ly/30A5qb9, Dec 2018.
S. M. Kerner, "Open vSwitch (OVS) Becomes a Linux Foundation Collaborative Project," Aug 2016 [Accessed: Jun 2020].
Container Networking solutions with OVS (Anjali S Jain and Nupur Jain, Intel)
OVS is becoming the vSwitch of choice even for container networking; OpenShift and Antrea are examples of that adoption. The challenges that come with this are twofold. The first is exposing a container interface (an application socket) as a switching port to OVS. Typically, application sockets are not exposed as forwarding endpoints in a vSwitch; the switching endpoint at present is a pod, and the iptables rules used to load-balance or direct traffic to a container socket are not well integrated into the vSwitch. The second challenge is accelerating/offloading this container interface into a HW-offloaded vSwitch. This talk presents a solution in this space using sub-functions (subdev) deployed on the auxiliary bus (kernel patches under review). A few technological innovations happening in parallel in the virtualization space also help with the solution, for example Scalable IOV and AF_XDP sockets. We will present the SW building blocks for achieving this makeover for OVS. We will also talk about the scale challenge and whether netdev-based port representors are up to that challenge.
Transparent Validation (Hariprasad Govindharajan, Intel)
Contributors in the OVS community have their own validation needs, configurations, and requirements for each OVS release. As OVS and DPDK scale to add more features and functionality, is there a need for a common community test plan, with the aim of ensuring that all OVS and DPDK functionality is validated not necessarily by just one contributor (which risks only a single configuration being validated) but by the community as a whole? More importantly, is there an easy way to provide visibility into the validation process/progress for a given release, to help aid the OVS community when approaching the release date?
OvS Offload: Too Many Models? (Hemal Shah, Broadcom)
Open vSwitch (OvS) is a widely deployed open source virtual switch today. In the last three years, several OvS data plane processing offload models including match offload, partial actions offload, full offload with SR-IOV, and full vhost offload have been proposed. These offload models take advantage of underlying frameworks like Linux TC Flower and DPDK rte_flow. Architecturally, it is not possible to fit a single OvS offload model into all use cases and workloads. Furthermore, the maturity of OvS offload models makes it hard to deploy OvS offload solutions. In this talk, we will provide an overview of OvS offload models and pros/cons analysis for each one of them. The goal is to help the community understand the differences and challenges with the overchoice of OvS offloads.
OvS Offload Layer Design Challenges (Hemal Shah and Sriharsha Basavapatna, Broadcom)
The Open vSwitch (OvS) offload layer is a control plane layer that resides between OvS and the network device layer. The offload layer interfaces with underlying infrastructure such as Linux TC flower or DPDK rte_flow. The current design of the offload layer is not optimized for dynamic rebalancing of offloaded flows, control plane performance, or multi-flow offloads. In this talk, we will review the challenges with the offload layer, including the single-thread model, offload sequencing, and offload state maintenance. We will propose design improvements to the offload layer.
Next steps for even higher performance of the SW Datapath in OVS (Harry van Haaren, Intel)
Continuing the progression of SW datapath optimizations, this talk details how the DPIF and Miniflow Extract components of the userspace datapath in OVS can be further optimized. The talk introduces the work performed by the DPIF and Miniflow Extract, and how today's implementation is built around memory accesses. By re-designing and re-thinking, we convert today's memory-bound implementation into a compute-bound problem, enabling optimized SIMD solutions using, e.g., AVX512 for higher performance.
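To give an idea of what Miniflow Extract produces, here is a conceptual sketch (field names and layout are illustrative, not the actual OVS struct miniflow): the parsed headers are compressed into a bitmap of present fields plus a packed array holding only the nonzero values.

```python
# Conceptual "miniflow"-style compact flow representation. The field
# list and packing scheme are illustrative, not OVS's actual layout.

FIELDS = ["eth_src", "eth_dst", "ip_src", "ip_dst", "tcp_src", "tcp_dst"]

def miniflow_extract(pkt):
    """Pack only the nonzero fields: a bitmap records which fields
    are present, and a values list stores them contiguously."""
    bitmap = 0
    values = []
    for i, name in enumerate(FIELDS):
        v = pkt.get(name, 0)
        if v:
            bitmap |= 1 << i
            values.append(v)
    return bitmap, values

def miniflow_get(mf, field):
    bitmap, values = mf
    i = FIELDS.index(field)
    if not (bitmap >> i) & 1:
        return 0
    # position in values = number of set bits below position i
    return values[bin(bitmap & ((1 << i) - 1)).count("1")]
```

The point of this layout is that later stages touch a small, dense structure instead of scattered packet bytes, which is what makes SIMD (e.g., AVX512) variants of extract and lookup attractive.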
Port Mirroring Offload (Timothy Miskell, Larry Wang, and Emma Finn, Intel and Munish Mehan, AT&T)
Traffic monitoring in a Software Defined Network (SDN) based networking infrastructure is critical, particularly as an increasing number of traditional network functions are deployed over virtualized infrastructures. As in traditional deployments, traffic monitoring ensures the security, performance, and transparency of the underlying network. In the context of virtualized infrastructures, a virtualized Test Access Point (TAP) service has been reported as an effective Virtual Network Function (VNF) that can provide monitoring capabilities similar to those of a physical TAP. Unfortunately, in an environment such as Open vSwitch, where inter-VM communication can be expensive, it has been observed that virtual TAPs can impose up to a 70% performance degradation on a given source VNF. As part of this talk and live demonstration, we present a hybrid approach that allows network administrators to mirror VIRTIO port traffic to another Virtual Function (SR-IOV) via NIC hardware offloading. Our results show that the mirrored traffic can be viewed through a monitoring VNF, residing within a separate VM, while simultaneously reducing the throughput overhead on the source VNF by as much as 50%.
OVS Benchmark (Guy Twig and Roni Bar Yanai, nVidia)
Virtual switches are a fundamental building block in the industry's movement towards cloud-based (and SDN) environments.
In virtual switches, switch structures have changed and switch functionality has been greatly enhanced. The new switch structure expands switch capabilities so that, for example, a switch can now also function as a distributed switch. Furthermore, new functionality has given switches tasks such as managing security groups at the switching layer.
We recommend extending existing benchmarks, such as RFC 2544, RFC 8204, and the VSPERF suite suggested by OPNFV, with a new OVS benchmark that covers new use cases addressing and emphasizing the new structure and functionality of virtual switches.
The OVS benchmark takes into consideration the common topologies currently in use and addresses the extended switch functionality (such as underlay networks and security groups). It also addresses "realistic" profile scenarios: a common profile now is one that defines a mix of sessions of different sizes, durations, and distributions and generates traffic patterns closer to the complex patterns seen at customers' sites.
Adopting this new standard will be beneficial for tracking OVS performance and hardware offload performance, and for comparisons with other virtual switches.
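A "realistic" session mix of the kind described above might be sketched as follows (hypothetical numbers and distributions, purely illustrative; not the actual benchmark's profile format):

```python
# Hypothetical benchmark profile: many short "mice" sessions plus a
# few long "elephant" sessions, rather than the fixed-size streams of
# RFC 2544-style tests. All constants here are illustrative.
import random

def make_profile(n_sessions, seed=0):
    rng = random.Random(seed)
    sessions = []
    for _ in range(n_sessions):
        if rng.random() < 0.9:             # ~90% mice
            size_pkts = rng.randint(10, 100)
            duration_s = rng.uniform(0.1, 1.0)
        else:                              # ~10% elephants
            size_pkts = rng.randint(10_000, 100_000)
            duration_s = rng.uniform(10.0, 60.0)
        sessions.append({"pkts": size_pkts, "duration": duration_s})
    return sessions
```

A traffic generator driven by such a profile exercises connection setup/teardown and the megaflow cache in ways that constant-rate, single-flow benchmarks do not.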
Debugging OVSDB with stream record/replay. (Ilya Maximets, Red Hat)
OVSDB was originally designed to store the configuration of the Open vSwitch daemon. It was intended to handle a fairly small amount of data covering the network configuration of a single host but, at the same time, was designed to be flexible and easy to talk to via the OVSDB management protocol (RFC 7047). Today, OVN has become a primary user of OVSDB, and this raises new challenges in the areas of performance and scale. This talk is about debugging issues and performance-testing OVSDB (and other components) in your OVN deployments: just record the problematic use case or a production workload and debug it while sitting at home, using only the power of your own laptop.
We hope that these patches will be accepted before the talk: https://patchwork.ozlabs.org/project/openvswitch/list/?series=186549&state=*
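Conceptually, the record/replay idea can be sketched like this (illustrative Python, not the actual OVS patches): every byte the daemon reads from a stream is logged, and the log can later stand in for the live peer.

```python
# Conceptual sketch of stream record/replay: wrap a connection so
# that every byte read is captured, then feed the capture back in
# place of the live peer to reproduce the workload offline.
import io

class RecordingStream:
    def __init__(self, stream, logfile):
        self.stream = stream
        self.log = logfile

    def read(self, n):
        data = self.stream.read(n)
        self.log.write(data)  # capture everything the daemon saw
        return data

def replay(logfile):
    """Return a stream that serves the recorded bytes, so the daemon
    can be re-run against them instead of a live connection."""
    return io.BytesIO(logfile.getvalue())
```

The real mechanism also has to record connection events (accepts, disconnects) to replay a full daemon lifetime, but the byte-capture idea is the core of it.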
Community updates and new LTS process (Ilya Maximets, Red Hat)
This talk is about what happened during the last year. It will mostly cover patch review/acceptance statistics, important community activities such as attempts to clean up the patch review backlog, and changes in the release process, i.e., the new LTS and the policies for LTS releases and maintenance.
We hope that these patches will be accepted before the talk: https://patchwork.ozlabs.org/project/openvswitch/list/?series=204478&state=*
DPDK Meson build for OVS (Sunil Pai G, Intel)
Makefile support is officially removed as of DPDK 20.11, replaced by a faster build system named Meson.
In this presentation, we discuss how to build OVS when DPDK is built via Meson instead of Make, cover the gotchas, and highlight a few important differences between the two.
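As a sketch of the workflow (paths and options are illustrative; consult the OVS and DPDK documentation for your exact versions), DPDK 20.11 is built with Meson/Ninja and OVS then discovers it via pkg-config rather than the old --with-dpdk=<build-dir> path:

```shell
# Build and install DPDK with Meson/Ninja (illustrative):
meson build --prefix=/usr/local
ninja -C build
sudo ninja -C build install
sudo ldconfig

# OVS finds DPDK through pkg-config (libdpdk.pc):
./boot.sh
./configure --with-dpdk=static    # or --with-dpdk=shared
make -j"$(nproc)"
```

The switch from a build-directory path to pkg-config linkage modes (static vs. shared) is one of the "gotchas" this kind of migration surfaces.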
Growing Pains - DPDKs move to API/ABI Stability and its effect on OVS (Ian Stokes, Intel)
With DPDK 19.11, the DPDK community changed its approach towards ABI/API compatibility, with the aim of minimizing breakage for those consuming DPDK releases after 19.11. As such, DPDK 20.11 is the first release since then to contain ABI/API changes. This talk will review the effort required for OVS to move between DPDK LTS releases, from pre-19.11 DPDK to DPDK 20.11, as well as discussing issues raised in the community, such as the need to accept experimental APIs from the DPDK library in future OVS releases.
Enabling asynchronous Para-virtual I/O in OVS (Sunil Pai G, Intel)
In this presentation, we discuss the benefits and challenges of utilizing the new asynchronous vhost APIs in DPDK. With the asynchronous framework, vHost-user can offload the memory copy operations to the hardware without blocking the CPU, thus freeing up precious CPU cycles for other important tasks, potentially allowing for better performance.
OVS DPDK VXLAN & VLAN TSO, GRO and GSO Implementation and Status Update (Yi Yang, Inspur)
All mainstream NICs support VLAN TSO and VXLAN TSO, which can dramatically improve both north-south and east-west TCP performance; unfortunately, OVS DPDK has not supported VXLAN TSO so far. I have implemented it and sent out a v3 patch series. TSO support alone is not enough, because TSO cannot handle UDP fragmentation offload (UFO), so we also have to bring in a GSO implementation; GRO support is likewise a must-have, since reassembling TCP segments significantly improves TCP performance in the TSO case. DPDK only has VLAN/VXLAN TCP GRO and GSO; it does not implement VLAN UDP GRO, VXLAN UDP GRO, or VXLAN UDP GSO. I have implemented these and fixed a parent-mbuf free issue in the GSO case; they should be merged by the time of this conference. I have deployed this in our internal OpenStack test environment, where TCP and UDP performance are much better than with current OVS DPDK. In this talk, I will present implementation details, current progress, and some issues we are facing. I hope the OVS community will take this seriously, review it, and merge it as soon as possible; it is a really good thing for users.
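For context, the existing (non-tunneled) userspace TSO support in OVS-DPDK, which this work extends, is gated behind an experimental knob (available since OVS 2.13):

```shell
# Enable userspace TSO support in OVS-DPDK (experimental):
ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true
```

The VXLAN TSO/GRO/GSO work described above builds on this path, adding the tunnel-aware segmentation that the flag alone does not provide.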
Hassle-free migration of OpenStack cloud to OVN (Frode Nordahl, Canonical)
Can I migrate my legacy OpenStack Neutron ML2+OVS deployments to OVN?
You can, and if you come watch this presentation and demo you will gain insight about how it can be done!
Incremental processing improvements in ovn-controller in 20.06 and 20.09 (Numan Siddique, Red Hat)
This talk will highlight the improvements made in ovn-controller's handling of Southbound database and OVS database updates, with some performance data and their impact at scale.
OVN with EVPN (Ankur Kumar Sharma and Greg A Smith, Nutanix)
Ethernet VPN (EVPN) is evolving as the key technology for achieving network virtualization in data centers. It is used by hardware VTEPs to exchange the information that facilitates packet forwarding; EVPN uses MP-BGP to distribute information about the endpoints behind VTEPs.
In this talk, we will explain the use cases we can achieve by integrating OVN with EVPN. We will primarily focus on different scenarios for achieving connectivity between OVN endpoints and endpoints behind hardware VTEPs. Additionally, we will talk about how we use the BGP distribution from FRR to achieve the integration.
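As an illustration of the FRR side of such an integration (a hypothetical fragment; the ASN, neighbor address, and VNI handling will differ per deployment), a typical BGP EVPN configuration activates the l2vpn evpn address family and advertises local VNIs to peers:

```
router bgp 65000
 neighbor 192.0.2.1 remote-as 65000
 address-family l2vpn evpn
  neighbor 192.0.2.1 activate
  advertise-all-vni
 exit-address-family
```

The routes FRR learns this way describe the endpoints behind remote VTEPs, which is the information an OVN integration would consume.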
Integrating Open vSwitch with Netplan.io (Lukas Märdian, Canonical)
Netplan allows users to define complex network scenarios in a descriptive manner via YAML files. These can be rendered into corresponding configurations for the underlying backend technology, such as systemd-networkd on Linux. The recent addition of Open vSwitch as a Netplan backend enables users to describe OVS components, in combination with their existing Linux networking, within the same YAML file. In this talk, I will present how OVS was integrated as a Netplan backend and how it can be used in production.
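For example, a minimal Netplan YAML describing an OVS bridge on top of a Linux NIC might look like this (interface names and addresses are illustrative; the openvswitch: mapping is what selects the OVS backend for the bridge):

```yaml
network:
  version: 2
  ethernets:
    eth0: {}
  bridges:
    br-ovs:
      interfaces: [eth0]
      addresses: [192.0.2.10/24]
      openvswitch: {}
```

Running netplan apply renders this into ovs-vsctl operations for the OVS parts and systemd-networkd configuration for the rest, keeping both in one file.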
Converting OpenFlow to P4 (James Choi, Intel)
With the ever-increasing need for flexibility in network packet pipelines, the adoption of P4 technology has been gaining popularity among enterprises and cloud providers. P4 first gained popularity for defining HW pipelines, and various work has been done to enable SW dataplanes through P4 as well. One of the challenges in adopting P4 in the control plane has been incorporating it into widely deployed OpenFlow-based orchestration. This difficulty stems from the differences between the OpenFlow and P4 specifications, such as the differences in the flexibility of table entry types and the restrictions on action execution.
In this talk, I will discuss the differences and limits involved in mapping OpenFlow to P4, along with some techniques for mapping OpenFlow constructs that don't naturally map to P4. I will also discuss a mechanism that allows an SDN controller such as OVN to continue to use OpenFlow while providing annotations to map it into P4 tables in OVS.
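To make the mismatch concrete (a hand-written sketch, not taken from the talk): with OpenFlow a controller can install an entry such as ovs-ofctl add-flow br0 "ip,nw_dst=10.0.0.0/24,actions=output:2", choosing match fields and actions per entry, whereas a P4 table fixes its key layout and permitted action set at compile time:

```p4
// Illustrative P4 fragment: the key fields and the action set are
// declared when the table is compiled; the control plane can only
// insert entries that conform to this shape.
table ipv4_fwd {
    key     = { hdr.ipv4.dst_addr : lpm; }
    actions = { forward; drop; }
}
```

Mapping OpenFlow onto P4 therefore means anticipating, at compile time, every match/action combination the controller may later want to install.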
p4proto: Cooking OVS with P4 Spice! (Namrata Limaye and Deb Chatterjee, Intel)
This presentation is about integrating two very powerful but dissimilar networking technologies. We want to simultaneously harness the power of the ubiquitous OVS-like software switches and the protocol and platform independence that the P4 language provides. We must integrate the two in a meaningful way to combine their strengths and overcome their limitations. Our approach introduces p4proto as a parallel to ofproto within OVS: it maps OpenFlow tables to P4 tables and establishes a compiler-based framework to efficiently map the P4 tables to any device's physical tables. We will also talk about converting OVS configurations into P4 table configurations for LAG, mirroring, CT, and sFlow, to program these features directly on P4-based data planes, as compared to the current flow-driven implementations in OVS.
vswitch.p4: Why OVS Needs P4 (Dan Daly, Intel)
In this talk, we describe the advantages of replacing OpenFlow in OVS with P4. We add a P4Runtime interface to OVS to enable writing the forwarding program in P4. This encodes more information about pipeline requirements, allowing a compiler to highly optimize the datapath based on these constraints. We use the pipeline definitions of OVN and Antrea as examples of how an OpenFlow table configuration can be reproduced as a P4 program. We show that with this change in abstraction we can better optimize these forwarding programs and integrate them with connection tracking, LAG, tunnels, routing, NAT, and IPsec in a vswitch.p4 that describes a complete datapath instantiated in Open vSwitch.