Open vSwitch and OVN 2019 Fall Conference
The Open vSwitch and OVN 2019 Fall Conference will be held on December 10 and 11 in Westford, MA. The schedule is below.
Conference speakers are welcome to use our slide template (PPTX). Using the template is optional.
Tuesday, December 10:
| Time | Session |
|---|---|
| 8:00 AM - 8:45 AM | Breakfast and registration |
| 8:45 AM - 9:10 AM | Welcome |
| 9:10 AM - 9:17 AM | Programming memory safe OVS in Rust (William Tu and Yi-Hung Wei, VMware) |
| 9:17 AM - 9:24 AM | Testing OVS at the DPDK Community Lab (Jeremy Plsek, UNH InterOperability Lab) |
| 9:24 AM - 9:49 AM | The Discrepancy of the Megaflow Cache in OVS, Part II (Levente Csikor, National University of Singapore) |
| 9:49 AM - 10:14 AM | A Kubernetes Networking Implementation Using Open vSwitch (Jianjun Shen, VMware) |
| 10:14 AM - 10:39 AM | OVS DPDK issues in Openstack and Kubernetes and Solutions (Yi Yang, Inspur) |
| 10:39 AM - 10:54 AM | Break |
| 10:54 AM - 11:01 AM | OVS-DPDK life of a packet (Eelco Chaudron and Kevin Traynor, Red Hat) |
| 11:01 AM - 11:08 AM | Utilizing DPDK Virtual Devices in OVS (Lei A Yao and Wang Yinan, Intel) |
| 11:08 AM - 11:33 AM | [The Long Road to] Deployable OvS Hardware Offloading for 5G Telco Clouds (Majd Dibbiny, Mellanox; Anita Tragler, Red Hat; and Mark Iskra, Nuage Networks from Nokia) |
| 11:33 AM - 11:58 AM | OVN for NFV workloads with Kubernetes (Ritu Sood and Srinivasa Addepalli, Intel) |
| 11:58 AM - 12:58 PM | Lunch |
| 12:58 PM - 1:05 PM | OvS-DPDK Acceleration with rte_flow: Challenges and Gaps (Hemal Shah and Sriharsha Basavapatna, Broadcom) |
| 1:05 PM - 1:12 PM | Partial offload optimization and performance on Intel Fortville NICs using rte_flow (Irene Liew and Chenmin Sun, Intel) |
| 1:12 PM - 1:37 PM | Kernel Based Offloads of OVS (Simon Horman, Netronome) |
| 1:37 PM - 2:02 PM | An Encounter with OpenvSwitch Hardware Offload on 100GbE SMART NIC!! (Haresh Khandelwal and Pradipta Sahoo, Red Hat) |
| 2:02 PM - 2:22 PM | Break |
| 2:22 PM - 2:29 PM | IP Multicast in OVN - IGMP Snooping and Relay (Dumitru Ceara, Red Hat) |
| 2:29 PM - 2:36 PM | We light the OVN so that everyone may flow with it (Frode Nordahl, Canonical) |
| 2:36 PM - 3:01 PM | OVN issues seen in the field and how they were addressed (Numan Siddique, Red Hat) |
| 3:01 PM - 3:26 PM | Multi-tenant Inter-DC tunneling with OVN (Han Zhou, eBay) |
| 3:26 PM - 3:46 PM | Break |
| 3:46 PM - 3:53 PM | How to dump your miniflow bits. (Ian Stokes, Intel) |
| 3:53 PM - 4:00 PM | DDLog in OVN - current state and future challenges (Mark Michelson and Dumitru Ceara, Red Hat) |
| 4:00 PM - 4:25 PM | P4-uBPF: Extending Open vSwitch packet processing pipeline at runtime using P4 (Tomasz Osiński, Orange Labs) |
| 4:25 PM - 4:50 PM | OvS in the cloud: Testing our OpenFlow ruleset to keep development agile (Nicolas Bouliane and Blue Thunder Somogyi, DigitalOcean) |
| 4:50 PM - 5:00 PM | Closing |
| 6:00 PM - 8:00 PM | Conference dinner on-site |
Wednesday, December 11:
| Time | Session |
|---|---|
| 8:00 AM - 9:00 AM | Breakfast and registration |
| 9:00 AM - 9:07 AM | Welcome |
| 9:07 AM - 9:37 AM | Keynote (Chris Wright, Red Hat CTO) |
| 9:37 AM - 9:44 AM | OVS/OVN Split, An Update (Mark Michelson, Red Hat) |
| 9:44 AM - 9:51 AM | Balance-TCP Bond Mode Performance Improvement (Vishal Deep Ajmera, Nitin Katiyar, Venkatesan Pradeep, and Anju Thomas, Ericsson) |
| 9:51 AM - 10:16 AM | SmartNIC Hardware offloads past, present and future (Oz Shlomo and Rony Efrayim, Mellanox) |
| 10:16 AM - 10:56 AM | Panel: OvS Acceleration: Are we there yet? (Hemal Shah, Broadcom) |
| 10:56 AM - 11:11 AM | Break |
| 11:11 AM - 11:18 AM | Upcall rate limiting (Vishal Deep Ajmera, Nitin Katiyar, Venkatesan Pradeep, and Anju Thomas, Ericsson) |
| 11:18 AM - 11:25 AM | OVS-DPDK Performance Benchmark and Analysis with Multi-VMs (Lei A Yao and Chenmin Sun, Intel) |
| 11:25 AM - 11:50 AM | Next steps for higher performance of the software data plane in OVS (Harry Van Haaren, Intel) |
| 11:50 AM - 12:15 PM | Openvswitch packet processing optimization: a story on Arm architecture (Yanqin Wei, ARM) |
| 12:15 PM - 1:05 PM | Lunch |
| 1:05 PM - 1:12 PM | Running OVS on Containers (Shivaram Mysore, Service Fractal) |
| 1:12 PM - 1:19 PM | Containerize OVS/OVN components (Aliasgar Ginwala, eBay) |
| 1:19 PM - 1:26 PM | Deploying multi chassis OVN using docker in docker (Numan Siddique, Red Hat) |
| 1:26 PM - 1:51 PM | The Gateway to the Cloud: OvS in a Layer-3 Routed Datacenter (Carl Baldwin and Jacob Cooper, DigitalOcean) |
| 1:51 PM - 2:16 PM | OVN performance (Mark Michelson, Red Hat) |
| 2:16 PM - 2:41 PM | Ice cream social |
| 2:41 PM - 2:48 PM | Dynamic disabling/enabling of EMC based on traffic pattern (Vishal Deep Ajmera, Nitin Katiyar, Venkatesan Pradeep, and Anju Thomas, Ericsson) |
| 2:48 PM - 3:13 PM | Using OVS to Implement High Performance NAT Gateway in Public Cloud Data Center (Yi Yang, Inspur) |
| 3:13 PM - 3:38 PM | OVS with AF_XDP, what to expect (Eelco Chaudron, Red Hat and William Tu, VMware) |
| 3:38 PM - 3:53 PM | Break |
| 3:53 PM - 4:00 PM | Guest vlan tagging with OVN (Karthik Chandrashekar, Nutanix) |
| 4:00 PM - 4:25 PM | OVN operationalization at scale at eBay (Aliasgar Ginwala, eBay) |
| 4:25 PM - 4:50 PM | Magma: Building converged access networks using openvswitch at the edge to improve global connectivity. (Amar Padmanabhan, Facebook) |
| 4:50 PM - 5:00 PM | Closing |
Programming memory safe OVS in Rust (William Tu and Yi-Hung Wei, VMware)
Programs written in memory-unsafe languages such as C are prone to memory errors such as buffer and stack overflows, dangling pointers, accesses to uninitialized or deallocated memory, and memory leaks. These errors can be exploited, leading to severe security vulnerabilities and unpredictable behavior. For example, memory bugs in OVS's packet processing, especially in parsing packet contents and building the flow key, are attractive targets because this code runs for every packet: an attacker can craft packets that exploit such a bug and launch a buffer overflow attack. Besides security vulnerabilities, even a small memory leak in the packet processing path can quickly accumulate and crash ovs-vswitchd when it runs out of memory, as in the bug fixed by the commit cited below. Bugs like these are hard to detect and common in software written in C.
Rust is a systems programming language that provides memory safety without runtime overhead. Rust gives users fine-grained control over memory while keeping track of the lifetime and ownership of each memory region, which makes memory leaks, dangling pointers, and memory corruption much less likely. In this talk, we describe our work on replacing one of the memory-safety-critical parts of OVS, starting with flow_extract(), from C with Rust, and we share the experiences and lessons learned in the process.
 commit 1bddcb5dc598 (“ofproto-dpif-xlate: Fix bug that may leak ofproto_flow_mod”)
Testing OVS at the DPDK Community Lab (Jeremy Plsek, UNH InterOperability Lab)
I will talk about what the DPDK Community Lab is and the types of testing we do with OVS. This includes a brief look at how basic testing is performed and how we run performance testing using ovs_perf on hardware from different vendors.
The Discrepancy of the Megaflow Cache in OVS, Part II (Levente Csikor, National University of Singapore)
Open vSwitch (OVS) has stood the test of time in the field of OpenFlow (OF) software switching, and its adoption has spread at an unprecedented pace; it is present in almost every (open source) networking environment, from simple (Linux-based) operating systems through heavily virtualized cloud management systems (e.g., OpenStack) to serverless environments (e.g., Kubernetes).
Due to Network Function Virtualization (NFV) and the ever-increasing trend of moving (business-critical) workloads to the public cloud, significant parts of the networking ecosystem (e.g., packet classification) have inherently been offloaded to virtualized packet processors (e.g., Open vSwitch). High traffic demands and latency-critical applications, on the other hand, require the packet classifier to be highly efficient and dependable. To support this, several years (and version numbers) ago, OVS introduced a layered cache architecture in its fast path to achieve reliable and blazing-fast packet processing .
In our previous talk [2, 3], we demonstrated that the packet classification algorithm in this caching architecture, namely the Tuple Space Search (TSS) scheme in the second-level MegaFlow Cache (MFC), has an algorithmic deficiency that an attacker can abuse to push this generally high-performing packet classifier into its corner case. In particular, we showed that for each simple ACL (e.g., allow destination port 80 and drop everything else[*]) there is a specially crafted packet sequence that, when sent against this ACL at a rate below 1 Mbps, can virtually bring down OVS, causing a complete denial of service for every workload reachable through that OVS instance; i.e., in a cloud environment, all services that happen to be scheduled on the same hypervisor become inaccessible. Note that such a service outage can cost enterprises millions of dollars .
[*] Following the cloud security best practice of a whitelist (default-deny) policy.
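To make the algorithmic deficiency more concrete, here is a minimal Python sketch of a tuple space search classifier in the spirit of the MFC: every distinct wildcard mask gets its own exact-match hash table (subtable), and a lookup probes the subtables one after another until one hits. This is a simplified model under our own assumptions, not OVS code; the class name, the dict-based packet representation, and the field name `tp_dst` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Illustrative representations (not OVS data structures):
#   packet: field name -> integer value
#   mask:   tuple of (field name, bitmask) pairs
Packet = Dict[str, int]
Mask = Tuple[Tuple[str, int], ...]


class TupleSpaceClassifier:
    """Simplified tuple space search: one exact-match hash table per mask."""

    def __init__(self) -> None:
        # Insertion-ordered mapping: mask -> {masked key -> action}.
        self.subtables: Dict[Mask, Dict[Tuple[int, ...], str]] = {}

    @staticmethod
    def _masked_key(packet: Packet, mask: Mask) -> Tuple[int, ...]:
        # Keep only the bits the mask cares about, field by field.
        return tuple(packet[name] & bits for name, bits in mask)

    def insert(self, packet: Packet, mask: Mask, action: str) -> None:
        # A new distinct mask creates a new subtable.
        table = self.subtables.setdefault(mask, {})
        table[self._masked_key(packet, mask)] = action

    def lookup(self, packet: Packet) -> Tuple[Optional[str], int]:
        """Return (action or None, number of subtables probed)."""
        probes = 0
        for mask, table in self.subtables.items():
            probes += 1
            action = table.get(self._masked_key(packet, mask))
            if action is not None:
                return action, probes
        return None, probes


# Usage: 16 prefix masks on a 16-bit port field -> 16 subtables.
clf = TupleSpaceClassifier()
for k in range(1, 17):
    prefix = (0xFFFF << (16 - k)) & 0xFFFF  # mask keeping the top k bits
    clf.insert({"tp_dst": 0xFFFF}, (("tp_dst", prefix),), "drop")
print(len(clf.subtables))              # 16
print(clf.lookup({"tp_dst": 0x0000}))  # (None, 16): a miss probes every subtable
```

In this model, a packet that matches none of the 16 subtables costs 16 hash probes instead of one, so the number of distinct masks is the quantity that makes each lookup more expensive.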
Our previous study, however, had some limitations: (i) an attacker exploiting this discrepancy had to know the target ACL installed in the flow table, and (ii) beyond some immediate yet impractical remedies (e.g., switching the MFC off or relying on a different hypervisor switch implementation), it lacked any countermeasure. Moreover, to make (i) easy to satisfy, the threat model was limited to cloud environments, where an attacker has to lease her own resources, define her own (malicious) ACLs, and send a specially crafted packet sequence towards her own service, resulting in a denial-of-service attack against co-located workloads only.
In this work, we build on the main idea of  and study whether an attacker can get rid of the above limitations (i.e., attack arbitrary victims), and how the impact (e.g., level of degradation, required packet rate) would change (if it is possible at all; see  for more details). Furthermore, we have developed a promising mitigation technique, and we are reaching out to the OVS community to share and discuss the obstacles that remain on the way to completely resolving this issue.
During our talk, we first briefly cover the main properties of the caching architecture and the main outcomes of our previous study, which we term here the co-located case (670 kbps of attack traffic from a single source can easily degrade an OVS instance from its full capacity of 10 Gbps to 2 Mbps).
Then, we show that when an adversary has no such access to her target (i.e., neither leased resources in the cloud nor knowledge of the installed ACLs), she can still degrade OVS by 88% from its maximum capacity with a low attack traffic volume (6.72 Mbps). One interesting aspect of this latter approach (hereafter termed the general case) is that its attack traffic shows no specific pattern: it only requires completely random packet header fields, arbitrary message contents, and arbitrary packet arrival times, which makes identification hard, since it is not straightforward to define a signature for the attack traffic.
As a mitigation technique, we present a cache management scheme, which we call MFC Guard (MFCg), that dynamically monitors the number of entries in the MFC and removes less important ones to reduce the performance overhead of the TSS algorithm. It is worth noting that we observed a negligible impact on overall packet processing performance from the monitoring process itself (i.e., executing ovs-dpctl dump-flows, say, every second).
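A minimal sketch of the eviction decision, assuming that "less important" means "fewest packet hits" (the abstract does not name the criterion) and that the guard enforces a fixed entry budget. `MegaflowEntry` and `select_evictions` are hypothetical names, and the sketch works on already-parsed entries rather than on real `ovs-dpctl dump-flows` output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MegaflowEntry:
    match: str    # flow match as it might appear in a flow dump (illustrative)
    packets: int  # hit counter, used here as the "importance" metric (assumption)


def select_evictions(entries: List[MegaflowEntry], max_entries: int) -> List[MegaflowEntry]:
    """Return the least-hit entries that must go to get back under max_entries."""
    excess = len(entries) - max_entries
    if excess <= 0:
        return []
    # Least important first; the match string breaks ties deterministically.
    by_importance = sorted(entries, key=lambda e: (e.packets, e.match))
    return by_importance[:excess]
```

A periodic guard (for example, once per second as mentioned above) would dump the flows, call `select_evictions`, and delete the returned entries; dump parsing and deletion are left out of this sketch.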
We show that MFCg can limit, and even completely avoid, the performance degradation for the packets that are eventually allowed into the system. However, this comes at a cost: since removing an entry from the MFC causes the corresponding malicious flow's packets to be processed (again) by the slow path (i.e., ovs-vswitchd), guaranteeing the performance of allowed packets imposes some extra overhead. Contrary to our expectation that those packets would be cached again in the MFC, we observed an unexpected behavior: such packets are always processed by the slow path from then on. Even though this behavior makes the overhead constant, as long as the attack rate is below 1,000 pps (< 1 Mbps) the slow path consumes only 15% of the CPU; recall that this packet rate is enough to bring down OVS in the co-located case. However, when the packet rate is 10,000 pps (< 7 Mbps), the CPU load jumps to ≈ 80% (this rate would be enough to degrade the full capacity to 10% in the general case). We conclude that our current MFCg implementation can efficiently mitigate both attacks as long as the attack rate is low; however, further optimization is possible regarding this OVS behavior. On the other hand, an attack at a rate well above 10,000 pps can be considered a volumetric attack, for which multiple efficient detection and mitigation solutions exist (e.g., detecting excessive packet rates, over-provisioning, and scrubbing).
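The packet rates and bit rates quoted in this abstract fit together if each attack packet occupies a minimum-size Ethernet frame on the wire. That is our assumption, not something the abstract states: a 64-byte frame plus 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap, i.e. 84 bytes or 672 bits per packet. The short calculation below converts packet rates to bit rates under that assumption.

```python
# Assumed on-wire size of each attack packet: a minimum 64-byte Ethernet frame
# plus 7-byte preamble, 1-byte start-of-frame delimiter and 12-byte inter-frame gap.
MIN_FRAME_ON_WIRE_BYTES = 64 + 7 + 1 + 12  # 84 bytes = 672 bits


def pps_to_bps(pps: int, bytes_per_packet: int = MIN_FRAME_ON_WIRE_BYTES) -> int:
    """Bit rate of a stream of fixed-size packets."""
    return pps * bytes_per_packet * 8


for rate in (1_000, 10_000):
    print(f"{rate} pps -> {pps_to_bps(rate) / 1e6:.2f} Mbps")
# prints:
# 1000 pps -> 0.67 Mbps
# 10000 pps -> 6.72 Mbps
```

Under this assumption, 1,000 pps corresponds to about 0.67 Mbps, in line with the 670 kbps co-located figure, and 10,000 pps to 6.72 Mbps, the general-case figure.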
J. Pettit, “Accelerating Open vSwitch to “Ludicrous Speed”,” Blog post: Network Heresy - Tales of the network reformation, https://networkheresy.com/2014/11/13/accelerating-open-vswitch-to-ludicrous-speed/, 2014.
L. Csikor and G. Rétvári, “The Discrepancy of the Megaflow Cache in OVS,” Full talk at OVS Fall 2018 Conference, Dec. 2018.
L. Csikor, C. Rothenberg, D. P. Pezaros, S. Schmid, L. Toka, and G. Rétvári, “Policy injection: A cloud dataplane DoS attack,” in Proceedings of the ACM SIGCOMM 2018 Conference on Posters and Demos, ser. SIGCOMM ’18, 2018, pp. 147–149. [Online]. Available: http://doi.acm.org/10.1145/3234200.3234250
D. Kobialka, “Kaspersky Lab Study: Average Cost of Enterprise DDoS Attack Totals $2M,” Blog post, https://www.msspalert.com/cybersecurity-research/kaspersky-lab-study-average-cost-of-enterprise-ddos-attack-totals-2m/, 2018.
L. Csikor, D. M. Divakaran, M. S. Kang, A. Korosi, B. Sonkoly, D. Haja, D. P. Pezaros, S. Schmid, and G. Rétvári, “Tuple Space Explosion: A Denial-of-Service Attack Against a Software Packet Classifier,” to appear at ACM CoNEXT 2019, Dec. 2019.
A Kubernetes Networking Implementation Using Open vSwitch (Jianjun Shen, VMware)
This talk introduces a Kubernetes networking implementation built on top of Open vSwitch. The solution is designed to be Kubernetes-centric: it supports Kubernetes networking functionality (Pod network, NetworkPolicy, ClusterIP and NodePort Services) and supports only Kubernetes; it is optimized for Kubernetes networking; and it aims to be Kubernetes-native, leveraging Kubernetes as much as possible. The solution uses OVS to implement all Pod networking functionality on a Kubernetes Node, and it targets a high-performance Kubernetes NetworkPolicy implementation with optimizations in the control plane and in OVS flows. Thanks to the portability of OVS, the solution can support any compute platform Kubernetes runs on: virtual machines on any hypervisor or public cloud, bare-metal container hosts, and Windows hosts. The project will be open source, but it is still at an early stage and not yet available for use.
This talk also presents a few challenges we saw in implementing Kubernetes networking, especially Kubernetes NetworkPolicy, with OVS. We would like to get feedback from the audience on the implementation, and to discuss how OVS can support Kubernetes networking better and be the best data plane choice for Kubernetes.
OVS DPDK issues in Openstack and Kubernetes and Solutions (Yi Yang, Inspur)
Although OVS DPDK can be used to accelerate tenant networking, it also has many limitations: it has to communicate from user space with tap and veth devices in the kernel, which severely hurts networking performance. We found many such issues in practice in OpenStack deployments and would like to share them with the community. This talk will present and demonstrate all the known issues we found and propose some solutions, in the hope that the community will take them seriously and spend some time improving them.
OVS-DPDK life of a packet (Eelco Chaudron and Kevin Traynor, Red Hat)
The PVP path is the path of packets that are received from a physical interface (P), transmitted to and received from vhost interfaces (V), and finally sent to a physical interface (P).
We will talk about the places along that path where packet drops may occur, what to look out for in statistics, and some recent patches that help with visibility.
Utilizing DPDK Virtual Devices in OVS (Lei A Yao and Wang Yinan, Intel)
OVS has supported DPDK virtual poll mode drivers (vdev PMDs) since release 2.7. These virtual network devices can be used in different scenarios. For example, KNI, virtio-user, and TAP devices can redirect packets from user space back to the kernel (known as the exception path), which is useful for container networking and the control path. Conversely, af_packet, pcap, and af_xdp are virtual devices that can expose kernel packets to user space.
This talk will briefly introduce these DPDK virtual network devices, including a performance comparison and several configuration limitations encountered while integrating them into OVS-DPDK.
[The Long Road to] Deployable OvS Hardware Offloading for 5G Telco Clouds (Majd Dibbiny, Mellanox; Anita Tragler, Red Hat; and Mark Iskra, Nuage Networks from Nokia)
For several years, “Smart” or “Intelligent” NICs have been commercially available with embedded switching capability implemented as an integral feature of the NIC’s silicon. Whether this embedded switch is an ASIC or an FPGA, it has the potential to achieve dramatically higher performance than traditional kernel or DPDK OvS flow processing. Many of the APIs required to leverage this new generation of NICs (e.g., tc flower) have also been upstream for some time. So what could possibly go wrong in deploying this solution?
Offloading flows generated by an SDN controller at the scale typical Telco customers need is complex and requires: tight integration with OpenStack to support VLAN-aware vswitches, underlay VLANs with VXLAN overlay tunneling, VXLAN L2 bridging and L3 routing flow offload, QoS support for ToS/TTL, remote port mirroring for debugging and legal intercept, load balancing with bonding offload, flow statistics, debugging across at least three flow databases (OvS vswitchd, TC kernel, and NIC eswitch), handling the cross-NUMA performance impact, measuring the flow insertion rate, and NIC bandwidth sharing for edge use cases.
In this session, Red Hat, Mellanox, and Nuage Networks (Nokia) share some of the challenges and progress encountered on the road to hardware offloaded OvS deployments for the 5G telco cloud.
OVN for NFV workloads with Kubernetes (Ritu Sood and Srinivasa Addepalli, Intel)
There is tremendous interest in deploying NFV workloads with Kubernetes at the edge, because edge sites can't afford two resource orchestrators, one for applications and another for VNFs/CNFs. NFV workloads can be management plane, control plane, or user plane/data plane workloads. Data plane workloads normally have multiple interfaces, multiple subnets, and multiple virtual networks. Some data plane workloads require SR-IOV NIC support for data interfaces and virtual NICs for other interfaces. NFV workloads may require dynamic creation of virtual networks and dynamic configuration of subnets. Some NFV workloads also require provider network support. There is also a requirement for route management across virtual networks and external networks.
The ovn-kubernetes project (https://github.com/ovn-org/ovn-kubernetes) integrates OVN with Kubernetes. Meeting the NFV requirements listed above requires functionality beyond what ovn-kubernetes provides. To fill these gaps, a project called ovn4nfvk8s Plugin (https://github.com/opnfv/ovn4nfv-k8s-plugin) was started under OPNFV. It provides dynamic creation and deletion of virtual networks and provider networks, adds multiple interfaces (virtual network or provider network interfaces) to workloads, and manages routes for these networks. The project works with Multus to provide a CNI that attaches multiple OVN interfaces to workloads, and it supports CRD controllers for dynamic networks, provider networks, and route management.
OvS-DPDK Acceleration with rte_flow: Challenges and Gaps (Hemal Shah and Sriharsha Basavapatna, Broadcom)
Open vSwitch (OvS) is a widely deployed open source virtual switch. OvS-DPDK uses the Data Plane Development Kit (DPDK) to run OvS entirely in user space, which significantly improves packet processing and forwarding rates by eliminating the cost of interrupt processing and user-kernel context switches. DPDK recently introduced the rte_flow API as a generic means to further improve flow classification and action processing by accelerating them in hardware for specific ingress or egress flows.
The current rte_flow API has several missing pieces, including discovery of hardware flow-processing capabilities, a mapping for some OvS actions (e.g., Output-Port) onto rte_flow actions, and a mapping of OvS-DPDK control plane operations onto rte_flow operations; in addition, some OvS actions are not currently supported by OvS-DPDK when it uses rte_flow. This talk will cover the challenges and gaps in OvS-DPDK acceleration with rte_flow and propose solutions to address them.
Partial offload optimization and performance on Intel Fortville NICs using rte_flow (Irene Liew and Chenmin Sun, Intel)
OVS with the kernel datapath has been enabled for hardware offload through the TC flower packet classifier in the kernel. In the userspace datapath (OVS-DPDK), much development and optimization work has focused on improving and accelerating packet classification in the virtual switch to achieve higher switching capacity in response to rapidly changing cloud and telco networking. Partial (hardware) offload pushes flow rules, along with unique marks, to the network card. OVS-DPDK uses the rte_flow API so that the NIC matches packets against the flow rules and marks them accordingly; the rules are programmed into the Intel(r) Fortville NIC's FDIR (Flow Director). OVS-DPDK then uses each unique mark to find the specific flow rule and executes the actions in software. With partial offload, OVS-DPDK does not need to parse all packet headers or go through the EMC: it leverages the NIC's hardware support for the flow MARK action to skip some very costly CPU operations on the host.
However, flow insertion and deletion through the rte_flow API are currently very slow in the i40e DPDK driver, which prevents the OVS userspace datapath from benefiting from the rte_flow hardware offload feature. This presentation introduces the rte_flow optimization for the i40e driver in DPDK. The optimization includes an implementation of rte_bitmap and a software pipeline to manage hardware resources and to avoid waiting on hardware synchronization. In addition, the consumed cycles are further reduced by optimizing the dynamic memory allocation code. The revised code performs 20,000 times better than the original. With the latest i40e driver optimization in DPDK, partial hardware offload can be enabled in OVS-DPDK (userspace datapath) using Intel(r) Fortville NICs (XL710, X710, and XXV710).
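The mark-based fast path described above can be sketched as a toy Python model: the software keeps a table from mark to flow, a marked packet goes straight to its flow's actions, and an unmarked packet takes the normal parse-and-lookup path. The class and method names are hypothetical, the NIC programming step (done through rte_flow in OVS-DPDK) is only indicated in a comment, and actions are plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Packet:
    raw: bytes
    mark: Optional[int] = None  # set by the NIC when an offloaded rule matched


@dataclass
class PartialOffloadDatapath:
    """Toy model of mark-based partial offload; not the OVS-DPDK code."""
    flows_by_mark: Dict[int, str] = field(default_factory=dict)
    next_mark: int = 1
    full_lookups: int = 0  # packets that needed parsing + cache/classifier lookup

    def offload(self, actions: str) -> int:
        """Allocate a unique mark for a flow. A real switch would also install
        the match with a MARK action in the NIC (e.g. via rte_flow)."""
        mark = self.next_mark
        self.next_mark += 1
        self.flows_by_mark[mark] = actions
        return mark

    def process(self, pkt: Packet, full_lookup: Callable[[bytes], str]) -> str:
        """Return the actions to execute in software for this packet."""
        if pkt.mark is not None and pkt.mark in self.flows_by_mark:
            # Marked packet: the mark identifies the flow directly,
            # so header parsing and the EMC lookup are skipped.
            return self.flows_by_mark[pkt.mark]
        # Unmarked packet: fall back to the normal software path.
        self.full_lookups += 1
        return full_lookup(pkt.raw)
```

The design point the model captures is that the mark turns per-packet classification into a single table lookup, while unmarked traffic still works through the regular path.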
With partial offload, virtual switching throughput increases significantly: we observed a 1.7x gain in a Phy-Phy test with 2,000 UDP rules programmed in OVS and up to 1 million flows. Fewer CPU cycles are used to classify the packets.
Kernel Based Offloads of OVS (Simon Horman, Netronome)
Open vSwitch is often deployed to provide network access to VMs in cloud environments. The rich feature set of OVS makes it attractive for these deployments, as it allows policy enforcement, encapsulation, quality of service, and a myriad of other features. However, the features facilitated by a software-based switch come at a cost in terms of performance and host CPU utilisation. Hardware offload seeks to address those shortcomings while preserving the richness of OVS.
This presentation will take a look at the evolution of kernel-based offloads for OVS examining the offload model adopted by the upstream kernel, features currently supported, and possible future developments in this area.
An Encounter with OpenvSwitch Hardware Offload on 100GbE SMART NIC!! (Haresh Khandelwal and Pradipta Sahoo, Red Hat)
Admittedly, 100GbE NICs are still far from reality when it comes to production-grade deployments. However, vendors were quick to build 100GbE Ethernet switches and NICs, which are readily available today; in fact, they have gone a notch further, and more and more features are being developed to suit 100 GbE infrastructure.
Targeting next-generation data centers, OpenvSwitch is further accelerated through hardware offload, which splits the data plane between OpenvSwitch and a SMART NIC (100 GbE) using its embedded switch functionality. However, from hardware stability to software handling such enormous stress, this is challenging, and the maturity level of 10/40 GbE NICs is still hard to achieve, especially when it comes to fine-tuning compute nodes for performance. We explored this territory with OpenvSwitch (TC-HW-Offload) enabled on a SMART NIC (Mellanox ConnectX-5 Ex 100GbE) to measure its throughput and latency when plugged into OpenStack infrastructure and services.
We would like to showcase the following at this conference.
- The problems encountered while working on OVS-HW-Offload with a SMART NIC, and the resolutions applied to them
- Fine-tuning of the parameters required to achieve near-line-rate throughput and latency
- The tools (TestPMD and TRex) and performance methodology used in our attempt
- The numbers observed and their analysis under the best possible use case
- Our unresolved problems and puzzles, shared with the wider audience
IP Multicast in OVN - IGMP Snooping and Relay (Dumitru Ceara, Red Hat)
Until version 2.12, multicast support in OVN was limited to flooding traffic within the broadcast domain without allowing it to cross network boundaries. In this presentation we describe the newly added IP multicast support in OVN. We cover both IGMP snooping in OVN networks and IP multicast routing performed without the need for a classic multicast routing protocol. We zoom in on the implementation, as this is one of the few OVN features where ovn-northd dynamically installs logical flows based on IP multicast information snooped by ovn-controllers.
We will present the following:
- OVN L2 multicast packet handling.
- OVN L3 multicast implementation for switches (IGMP Snooping) and for routing across logical networks (IGMP Relay) along with packet flow.
- An IGMP based solution for IP multicast connectivity in OVN-kubernetes in switch-per-node topologies.
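As a rough illustration of what IGMP snooping adds compared to flooding, the sketch below keeps a per-switch table of multicast groups and the ports that reported membership, and forwards group traffic only to those ports. It is a simplification under our own assumptions (no timers, no querier, unregistered groups are flooded), not OVN's implementation, and the class and method names are hypothetical.

```python
from typing import Dict, Set


class IgmpSnoopingSwitch:
    """Toy L2 switch that learns multicast group membership from IGMP reports."""

    def __init__(self, ports: Set[str]) -> None:
        self.ports = set(ports)
        self.groups: Dict[str, Set[str]] = {}  # group IP -> member ports

    def on_igmp_report(self, group: str, port: str) -> None:
        # A host behind `port` joined `group`.
        self.groups.setdefault(group, set()).add(port)

    def on_igmp_leave(self, group: str, port: str) -> None:
        # A host behind `port` left `group`; forget the group once it is empty.
        members = self.groups.get(group)
        if members is not None:
            members.discard(port)
            if not members:
                del self.groups[group]

    def output_ports(self, group: str, in_port: str) -> Set[str]:
        """Ports that receive a packet for `group` arriving on `in_port`."""
        members = self.groups.get(group)
        if members is None:
            # Unregistered group: flood within the switch (assumption).
            return self.ports - {in_port}
        return members - {in_port}


sw = IgmpSnoopingSwitch({"p1", "p2", "p3"})
sw.on_igmp_report("239.1.1.1", "p2")
print(sw.output_ports("239.1.1.1", "p1"))  # {'p2'}: only the joined port
```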
We light the OVN so that everyone may flow with it (Frode Nordahl, Canonical)
How can deployment of OVN be made easier?
With Juju Charms for OVN we seek to provide an opinionated, yet flexible way to deploy and operate OVN.
Among other things, we support automating PKI with Vault, which allows end users to integrate with existing company PKI policies. We also enable OVN RBAC by default.
The same code can be reused to deploy OVN with different CMSs such as Kubernetes and OpenStack, and I'll touch on how you would go about integrating your own.
OVN issues seen in the field and how they were addressed (Numan Siddique, Red Hat)
This talk will highlight various field issues seen by Red Hat customers using OVN in the past year, such as scale issues and high ovn-controller CPU usage, and how they were addressed or mitigated.
Multi-tenant Inter-DC tunneling with OVN (Han Zhou, eBay)
When there are multiple OVN deployments in different data centers or availability zones, an interconnection mechanism is required to connect the overlay networks for different tenants. While it is possible to achieve the same with existing VPN technologies, native support from OVN provides significant operational advantage. This talk presents the design and implementation of the native OVN support for multi-tenant tunneling between different OVN deployments. It also discusses the related solutions for gateway load balancing and redundancy.
How to dump your miniflow bits. (Ian Stokes, Intel)
In OVS 2.12, the datapath classifier code was refactored to enable the specialization of subtables based on miniflow attributes, which in turn improves the performance of subtable searches. An initial set of specialized subtables was implemented to correspond to common traffic patterns. However, as many different patterns are possible, users may not see a corresponding performance gain unless the codebase contains a specialized subtable that matches their pattern. Adding new specializations is easy, but identifying the miniflow bits needed to specialize subtables for a given use case is more difficult. To help with this, we are proposing a new OVS command, e.g. ovs-appctl dpctl/dump miniflow bits. This talk will give an overview of the command and its usage, and demonstrate an example of mapping traffic patterns to miniflow bit matches.
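As a rough illustration of why the miniflow bits matter, the sketch below counts the bits set in the two 64-bit units of a subtable's miniflow map and uses that (unit0, unit1) pair to select a specialized lookup routine, falling back to a generic one. The function names and the specific (5, 1) and (4, 1) entries are illustrative assumptions, not a copy of the OVS source; a command like the one proposed would tell users which pairs their traffic actually produces.

```python
from typing import Callable, Dict, Tuple

LookupFn = Callable[[str], str]
UNIT_MASK = (1 << 64) - 1  # each miniflow map unit is 64 bits wide


def miniflow_bits(unit0_map: int, unit1_map: int) -> Tuple[int, int]:
    """Count the bits set in each 64-bit unit of a subtable's miniflow map."""
    return bin(unit0_map & UNIT_MASK).count("1"), bin(unit1_map & UNIT_MASK).count("1")


def generic_lookup(key: str) -> str:
    # Works for any miniflow layout, iterating over the bits at run time.
    return f"generic:{key}"


def make_specialized(u0: int, u1: int) -> LookupFn:
    # Stand-in for a lookup routine built for a fixed bit count, which lets
    # the compiler unroll the per-bit loops.
    def lookup(key: str) -> str:
        return f"specialized({u0},{u1}):{key}"
    return lookup


# Hypothetical specializations, keyed by (bits in unit0, bits in unit1).
SPECIALIZED: Dict[Tuple[int, int], LookupFn] = {
    (5, 1): make_specialized(5, 1),
    (4, 1): make_specialized(4, 1),
}


def select_lookup(unit0_map: int, unit1_map: int) -> LookupFn:
    """Use a specialized routine if one matches the bit counts, else the generic one."""
    return SPECIALIZED.get(miniflow_bits(unit0_map, unit1_map), generic_lookup)


# A (5, 1) subtable gets the specialized path; a (3, 0) subtable falls back.
print(select_lookup(0b11111, 0b1)("k"))  # specialized(5,1):k
print(select_lookup(0b111, 0)("k"))      # generic:k
```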
DDLog in OVN - current state and future challenges (Mark Michelson and Dumitru Ceara, Red Hat)
This session will talk about our (Red Hat's)
- experimentation with ddlog northd
- what were the pain points
- what were the benefits
- some suggestions on how it can be improved and
- what needs to be done to get this into production.
P4-uBPF: Extending Open vSwitch packet processing pipeline at runtime using P4 (Tomasz Osiński, Orange Labs)
Data plane programmability and the P4 language have become the next step in the evolution of Software-Defined Networking, enabling the programming of protocol-independent packet parsers and packet processing pipelines for network devices. Recently, the PISCES solution demonstrated the feasibility of a protocol-independent software switch using P4 as the programming language and Open vSwitch as the target switch. However, PISCES requires recompilation every time the P4 program changes, whereas in some situations the hypervisor switch needs to be upgraded and/or customized at runtime to support new protocol headers or encapsulation techniques, or even to implement middlebox-like network functions. In this talk I would like to present the runtime programming of Open vSwitch extensions using the P4 language. The solution is based on the Oko switch, an extension to Open vSwitch that allows injecting user-space BPF programs acting as stateful packet filters. We have enhanced the Oko switch with support for programmable actions (packet modifications, tunneling), new APIs to control BPF maps, a P4-to-uBPF compiler with support for stateful P4 objects (registers), and a P4Runtime-based abstraction layer. In the presentation I will describe the design and implementation of the solution. The presentation will also discuss the implementation problems we faced and the performance optimizations we applied to the P4 compiler. To sum up, our contribution allows Open vSwitch's packet processing pipeline to be reconfigured dynamically using a high-level domain-specific language (DSL) such as P4 and a protocol-independent SDN control protocol such as P4Runtime.
Moreover, our enhancements enable running stateful, middlebox-like network functions inside Open vSwitch. Tenants can use this feature to offload part of their packet processing from virtual machines to the virtual networking layer. Therefore, based on our contribution, we would also like to revive the idea of Topology Service Injection for OpenStack, the Neutron plugin to inject middlebox-like functions into the OpenStack network infrastructure. With the P4 language and a P4-capable Open vSwitch, the Topology Service Injection plugin would not be limited by the features of the OpenFlow architecture. Thus, we would also like to propose a new, P4-based design of the Topology Service Injection plugin for OpenStack, which would support more powerful data plane applications that can be injected into the networking layer.
OvS in the cloud: Testing our OpenFlow ruleset to keep development agile (Nicolas Bouliane and Blue Thunder Somogyi, DigitalOcean)
DigitalOcean has been a long-time user of Open vSwitch as a core element of its networking architecture. But until recently, the various networking teams did not have a unified method of testing datapath changes. We began to address these concerns this year by creating a testing framework.
This session will highlight our journey building our datapath validation framework. We will present what we express via our datapath, the way we test it, and the challenges that we face. In particular, listeners can expect to learn how to leverage go-openvswitch for their own testing needs.
Many products at DigitalOcean use a set of tables and flows inserted via Open vSwitch to form our datapath. These include customer-facing products, such as load balancers, floating IPs, firewalls, and VPC, as well as many internal core primitives like the metadata service, DHCP, an ARP proxy, north-south gateways, etc.
The teams behind these products are constantly developing new features and deprecating old ones. Much of that logic ends up being expressed through our datapath, which increases its level of complexity.
The goal of the datapath validation framework is to make sure that these teams are not conflicting with each other while quickly iterating through their modifications of the datapath. It is also vital to confirm that introduced changes don't cause datapath regression.
OVS/OVN Split, An Update (Mark Michelson, Red Hat)
I'll update the community on what has been done regarding splitting OVN from OVS, what challenges we faced, and what we still have left to do.
Balance-TCP Bond Mode Performance Improvement (Vishal Deep Ajmera, Nitin Katiyar, Venkatesan Pradeep, and Anju Thomas, Ericsson)
The existing implementation of the ‘balance-tcp’ bond mode uses the hash() and recirc() datapath actions. After recirculation, the packet is forwarded to a bond member port based on 8 bits of the dp_hash value calculated in the datapath.
This mechanism has two issues:
- The additional recirculation of the packet degrades the performance of ‘balance-tcp’ mode compared with ‘balance-slb’.
- After recirculation, if the packet egresses on a tunnel port (e.g. VxLAN or GRE), dp_hash is calculated over the tunnel header, so tenant traffic is not load-balanced effectively.
We propose introducing a new load-balancing output action, lb-output, instead of hash() and recirc():
- Maintain one table per bond (an array of uint16 values) that maps the RSS hash to a bond output port. This table is populated using the same algorithm that the ofproto layer uses today to create dp_hash flows. The table is looked up during lb-output action processing, thus avoiding recirculation.
- Because recirculation is avoided, the RSS hash of the incoming packet (from the tenant virtual machine) is used to select the output port from the bond's table. This ensures better distribution of traffic across bond ports when using tunnels.
- Statistics are maintained for each bond table entry to balance load dynamically across bond ports.
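The table-driven lookup described above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the proposed patch or OVS code: the names `BondTable`, `lb_output`, and `rebalance_one` are hypothetical, the table is assumed to have 256 buckets indexed by the low 8 bits of the RSS hash, and buckets are initially assigned round-robin instead of with the real ofproto algorithm.

```python
"""Hypothetical model of a per-bond lb-output table (illustrative only)."""

N_BUCKETS = 256  # assumption: one entry per 8-bit hash bucket


class BondTable:
    def __init__(self, member_ports):
        if not member_ports:
            raise ValueError("a bond needs at least one member port")
        self.members = list(member_ports)
        # Assumption: round-robin initial assignment of buckets to members;
        # the real ofproto bucket-assignment algorithm is not reproduced here.
        self.port_of_bucket = [self.members[i % len(self.members)]
                               for i in range(N_BUCKETS)]
        # Per-entry statistics, used for dynamic rebalancing.
        self.tx_bytes = [0] * N_BUCKETS

    def lb_output(self, rss_hash, packet_len):
        """Pick the output port directly from the RSS hash (no recirculation)."""
        bucket = rss_hash & (N_BUCKETS - 1)
        self.tx_bytes[bucket] += packet_len
        return self.port_of_bucket[bucket]

    def load_per_member(self):
        """Sum the per-bucket byte counters for each member port."""
        load = {m: 0 for m in self.members}
        for bucket, port in enumerate(self.port_of_bucket):
            load[port] += self.tx_bytes[bucket]
        return load

    def rebalance_one(self):
        """Move one bucket from the most- to the least-loaded member.

        Only a bucket whose load is smaller than the load gap reduces the
        imbalance when moved, so larger buckets are left alone.
        """
        load = self.load_per_member()
        heavy = max(load, key=load.get)
        light = min(load, key=load.get)
        gap = load[heavy] - load[light]
        candidates = [b for b, p in enumerate(self.port_of_bucket)
                      if p == heavy and 0 < self.tx_bytes[b] < gap]
        if not candidates:
            return False
        best = max(candidates, key=lambda b: self.tx_bytes[b])
        self.port_of_bucket[best] = light
        return True
```

The point of this design is that the port decision becomes a single array index on an already-computed hash, so no second pass through the datapath is needed; rebalancing only rewrites table entries.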
SmartNIC Hardware offloads past, present and future (Oz Shlomo and Rony Efrayim, Mellanox)
Data path processing in high-throughput networks introduces huge CPU overhead, making HW offload a requirement for hyperscale computing, telecom, etc. The OVS HW offload architecture enables incremental introduction of HW offload features while allowing flows to be fully offloaded, where all the processing is done in HW; partially offloaded, where the hardware performs some of the flow rules and actions; or not offloaded at all, where the packet processing is entirely executed in software.
The talk will recap the OVS infrastructure for hardware offloads, focusing on the HW offload model for tunneling and connection tracking in OVS DPDK using rte_flow.
Panel: OvS Acceleration: Are we there yet? (Hemal Shah, Broadcom)
Open vSwitch (OvS) is a widely deployed open source virtual switch. In the last three years, OvS acceleration has become increasingly important for improving the flow processing and packet forwarding performance of OvS. Several flow acceleration frameworks, including Linux TC Flower, DPDK rte_flow, and eBPF, now exist for OvS acceleration. That said, has OvS acceleration reached maturity? Do we need all these frameworks? What are the gaps in OvS acceleration? Are the perceived performance benefits of OvS acceleration achievable in real deployments? In this panel, a group of panelists will discuss these and other questions to help the community understand the current state of OvS acceleration.
Upcall rate limiting (Vishal Deep Ajmera, Nitin Katiyar, Venkatesan Pradeep, and Anju Thomas, Ericsson)
In OVS-DPDK, both the slow path and the fast path execute in the context of the PMD thread. This means that new flows leading to upcalls can increase the latency of other packets serviced by the same PMD. A sudden burst of such flows can also lead to intermittent traffic drops for already-learned flows. This also makes OVS-DPDK vulnerable to DoS attacks.
We propose per-port upcall rate limiting to ensure that service for other ports on the PMD is not affected or, worse, denied. The upcall rate is defined per port using a CLI like:
ovs-vsctl set port <port name> upcall-rate-limit
A simple token bucket policer per port per PMD restricts the flow of packets from the fast path into the slow path. This upcall policer allows only the configured number of packets per second (pps) into upcall. A packet entering the slow path has to take a token, and if no tokens are available, the packet is dropped.
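A token bucket policer of this kind can be modeled roughly as follows. This is a hedged sketch, not the proposed OVS implementation: `UpcallPolicer`, `policer_for`, the default rate and burst values, and keying policers by `(pmd_id, port)` are all assumptions made for illustration.

```python
"""Hypothetical per-port, per-PMD upcall policer (illustrative only)."""


class UpcallPolicer:
    def __init__(self, rate_pps, burst, now=0.0):
        self.rate = float(rate_pps)  # tokens added per second
        self.burst = float(burst)    # bucket capacity
        self.tokens = float(burst)   # start with a full bucket
        self.last = now
        self.dropped = 0

    def _refill(self, now):
        # Add tokens for the elapsed time, never exceeding the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def allow_upcall(self, now):
        """Return True if the packet may enter the slow path, else drop it."""
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.dropped += 1
        return False


# One policer per (PMD, port) pair; the defaults are arbitrary placeholders.
_policers = {}


def policer_for(pmd_id, port, rate_pps=1000, burst=1000):
    key = (pmd_id, port)
    if key not in _policers:
        _policers[key] = UpcallPolicer(rate_pps, burst)
    return _policers[key]
```

In a real PMD loop, `now` would come from a monotonic clock (for example `time.monotonic()` in this Python model); passing it explicitly keeps the sketch deterministic and easy to test.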
OVS-DPDK Performance Benchmark and Analysis with Multi-VMs (Lei A Yao and Chenmin Sun, Intel)
OVS-DPDK is designed for high performance and is widely deployed by cloud service providers today. Many past performance analysis reports for OVS-DPDK covered a single VM or a limited number of ports. However, typical cloud deployments feature large numbers of VMs and overlay networks. In this case, performance testing is more complex, and the performance results differ.
In this talk, we will cover the following:
- Benchmark settings, taking into account the number of flows, the number of VMs, the packet receive sequence pattern, and VXLAN operations.
- Performance analysis based on perf and VTune.
- Hardware limitations in reaching higher throughput.
Next steps for higher performance of the software data plane in OVS (Harry Van Haaren, Intel)
Recent investigation into the software data path of OVS indicates that a number of optimizations can be implemented, leading to significantly higher software-only switching performance. This talk identifies work items that result in higher performance, and how we can implement those optimizations in OVS using methods that are easy to debug, validate, package and deploy.
Openvswitch packet processing optimization: a story on Arm architecture (Yanqin Wei, ARM)
Increasing demand for network throughput poses challenges for server and networking equipment. Arm is a popular architecture in embedded systems and is launching server products, which provide useful features for packet processing. This presentation introduces our ongoing work on performance optimization on Arm and the new Arm features that can benefit the data path.
- Arm has a weak memory model. Hardware reordering improves packet processing performance, but it requires careful lock-free software implementation and correct ordering rules. The read-write concurrency library is a good place to take advantage of this.
- Atomics support is enhanced in newer generations of the Arm architecture. It solves some data contention and memory bouncing issues on large systems.
- SIMD on Arm is good at parallel processing of small, contiguous data, which can improve some basic operators in the OVS library. The next generation of SIMD brings hope for parallel processing of large, non-contiguous data blocks, which can widen the scope for applying vector data processing.
Running OVS on Containers (Shivaram Mysore, Service Fractal)
Running OVS on a commodity OS has become routine. With the explosion of container usage for applications and infrastructure, there is a need for dynamic networking. Other use cases include OVS at the edge on bare metal, and infrastructure development and testing. The immediate answer is to run OVS in containers. This talk is about what it means to run OVS in containers and the challenges involved.
Containerize OVS/OVN components (Aliasgar Ginwala, eBay)
This talk will be a walk-through of containerizing OVS/OVN components so that building, shipping, and managing OVS/OVN is easy. A quick demo of how to build, push, and start OVS/OVN as containers will be given.
Deploying multi chassis OVN using docker in docker (Numan Siddique, Red Hat)
This talk/demo will show how to deploy a multi-node OVN setup as Docker-in-Docker containers on a single machine with a simple script. This kind of quick deployment helps developers test OVN features quickly.
The Gateway to the Cloud: OvS in a Layer-3 Routed Datacenter (Carl Baldwin and Jacob Cooper, DigitalOcean)
DigitalOcean is undergoing a major overhaul of its droplet network infrastructure. Until now, public IP traffic was carried among droplets and the internet over a large flat layer 2 data center (DC) network.
While an L2 network offers full mobility of IP addresses throughout an L2 zone, it presented some pains as DigitalOcean began to scale. The sheer volume of broadcast traffic from ARP requests alone makes for a very noisy network, with every hypervisor (HV) seeing every request. Additionally, tying subnets to L2 zones created IP mobility issues between zones and, potentially, DCs.
What is the solution? We chose to move to an L3-based infrastructure. This session will show our journey from L2 to L3 in detail. It will highlight how we leveraged OvS to make this change, using parallel active data paths and pivot points to switch between them. OvS continues to be a key element in the L3 network going forward.
We will share all of the pitfalls and innovations that got us to where we are along with all that we have left to do.
Those who attend this session can expect to learn about how DO droplets reach the internet at scale.
- How we are able to move droplet traffic on existing HVs from L2 to L3 without downtime using Open vSwitch.
- The advantages and disadvantages of using Open vSwitch on the HVs. The challenges of retrofitting it into a routed network and how we can potentially do better. What were the alternatives?
- How we built and scaled an L3 network using our existing network gear and HVs.
- Our custom-built, HV-local ARP and NDP responder and BGP route announcer.
OVN performance (Mark Michelson, Red Hat)
Last year at OVScon, I gave a talk on OVN performance improvements that focused mostly on past improvements. This year, I plan to do the same, but I will also focus on pain points that we plan to tackle over the next year.
Dynamic disabling/enabling of EMC based on traffic pattern (Vishal Deep Ajmera, Nitin Katiyar, Venkatesan Pradeep, and Anju Thomas, Ericsson)
Some systems regularly encounter traffic patterns that cause constant evictions and insertions in the EMC. This EMC thrashing degrades the system and prevents it from performing to its full potential. Although OVS already offers options to disable the EMC manually via the CLI, the traffic pattern may fluctuate, so rather than disabling the cache permanently, we may want to disable and enable it dynamically based on the traffic pattern.
The proposal is to detect EMC thrashing, dynamically disable the EMC, and re-enable it once the situation has recovered. We measure the state of the EMC at regular time intervals. If EMC thrashing is observed for multiple such intervals on a given PMD, we disable the EMC for that PMD. Once the EMC is disabled, we perform a dry run at regular intervals to check whether it can be re-enabled. If the dry run yields a positive result for consecutive intervals, we enable the EMC again. This feature can be enabled for the system via a CLI, along with other configuration parameters such as the time interval and threshold values.
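The interval-based disable, dry-run, and re-enable logic above is essentially a small per-PMD state machine. The sketch below assumes a hypothetical thrashing test (eviction ratio above one threshold and hit ratio below another) because the abstract does not define the metric; `EmcController`, `IntervalStats`, and all threshold values are illustrative names, not OVS options.

```python
"""Hypothetical per-PMD EMC enable/disable controller (illustrative only)."""
from dataclasses import dataclass


@dataclass
class IntervalStats:
    lookups: int
    hits: int
    evictions: int


class EmcController:
    def __init__(self, evict_ratio_threshold=0.5, min_hit_ratio=0.5,
                 disable_after=3, enable_after=3):
        # All thresholds are assumed values, not taken from the proposal.
        self.evict_ratio_threshold = evict_ratio_threshold
        self.min_hit_ratio = min_hit_ratio
        self.disable_after = disable_after  # consecutive bad intervals
        self.enable_after = enable_after    # consecutive good dry runs
        self.enabled = True
        self.streak = 0  # consecutive intervals pointing toward a state change

    def _thrashing(self, s):
        # Assumed metric: many evictions per lookup and a low hit ratio.
        if s.lookups == 0:
            return False
        return (s.evictions / s.lookups > self.evict_ratio_threshold
                and s.hits / s.lookups < self.min_hit_ratio)

    def on_interval(self, stats):
        """Feed one interval of stats and return whether the EMC is enabled.

        When enabled, `stats` are real EMC counters; when disabled, they are
        the result of a dry run that simulates the EMC without using it.
        """
        if self.enabled:
            self.streak = self.streak + 1 if self._thrashing(stats) else 0
            if self.streak >= self.disable_after:
                self.enabled, self.streak = False, 0
        else:
            self.streak = self.streak + 1 if not self._thrashing(stats) else 0
            if self.streak >= self.enable_after:
                self.enabled, self.streak = True, 0
        return self.enabled
```

Requiring several consecutive intervals in each direction adds hysteresis, so a single noisy interval does not flip the EMC on and off.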
In public cloud data centers, one-to-one NAT and SNAT are must-have features: tenant VMs need them to access the Internet and to provide public web services. OpenStack provides these through floating IPs and network nodes, respectively, but unfortunately OpenStack cannot use public IPs (i.e. EIP, Elastic IP) for them because of infrastructure limitations. Inspur Cloud used OVS to implement one-to-one NAT and SNAT, and we also implemented a NAT gateway cluster to achieve horizontal scalability and high availability. This talk will show the technical details of our implementation and include a live demonstration of our NAT cluster.
OVS with AF_XDP, what to expect (Eelco Chaudron, Red Hat and William Tu, VMware)
This talk will cover our experience implementing and testing AF_XDP with OVS (both natively and using the DPDK AF_XDP PMD), including next steps, i.e. what else we can do to further optimize specific use cases.
Guest vlan tagging with OVN (Karthik Chandrashekar, Nutanix)
By default, OVN does not allow VLAN-tagged packets from virtual machines. This hinders the adoption of OVN in scenarios such as nested hypervisors. In this talk we will explain how and why we leveraged the “container inside VM” implementation in OVN to achieve the guest VLAN tagging feature.
OVN operationalization at scale at eBay (Aliasgar Ginwala, eBay)
This talk will be a walk-through of the operational challenges we faced migrating legacy cloud overlay network workloads to OVN at eBay at scale. We will give a glimpse of the improvements made in collaboration with the community for OVN control plane HA, including our migration/rollback experience from active-standby to a Raft cluster. We will also highlight some best practices that we follow when deploying and managing OVN at scale in each availability zone to support different use cases with minimal downtime and impact on both the control and data planes. Our experience and strategy for handling OVN/OVS upgrades on thousands of compute and gateway nodes will also be presented.
Magma: Building converged access networks using openvswitch at the edge to improve global connectivity. (Amar Padmanabhan, Facebook)
Today's cellular networks attempt to take a rigidly standardized set of network architectures and processes and apply them across all contexts. This approach is reaching its limits: even 2G network coverage has left behind 800M people, and 3G or better covers a substantially lower percentage of the population. The technology problem of connecting the next billion is the problem of dealing with heterogeneity: heterogeneity in the deployment landscape, in access technologies (LTE/WiFi), in feature sets, and in the user base. Thus, the right network architecture is one that is the most flexible, the easiest to scale (up/down), and makes the broadest use of backhaul technologies. In this talk we will cover our implementation of a converged access network that fully distributes the core network functionality by leveraging Open vSwitch at the edge to offer this flexibility. We would also like to briefly cover our interest in co-developing stateful and L3 gateway functions in OVS to better support our use case.