Open vSwitch and OVN 2022 Fall Conference
The Open vSwitch project, a Linux Foundation Collaborative Project, hosted its ninth annual conference focused on Open vSwitch and OVN on November 8-10, 2022, at Red Hat offices in Westford, Massachusetts. The event was held as a hybrid event.
Day 1 Welcome / Opening Remarks
Tools and Techniques to debug OVS Hardware Offload
Increasing visibility with perf-flow IPFIX sampling in OVS/OVN
Revalidator Tracepoint Implementation in Open vSwitch
OVS-Kselftest: A new way to test the kernel module
Easily deploying Kubernetes with OVN as CNI using Kind
OVN Open Discussion Forum
Break through Bottlenecks of OVS and Virtio by Using Smart NIC and vDPA Offload
Speaker: Yi Yang, Inspur
In terms of packets per second (pps), the OVS kernel datapath performs poorly. OVS-DPDK is much better and saves the CPU time that the in-kernel vhost kthreads would use, but it must occupy some cores exclusively to attain good pps, and more rx/tx queues mean more occupied cores. Dedicating that many cores is especially uneconomical when the NIC speed is 25G or above. A smart NIC (e.g. Mellanox ConnectX-6 Lx) can offload the OVS forwarding plane, virtio (through vDPA offload) and, partially, CT (conntrack). SFs (sub-functions, similar to VFs, but one NIC can have 256 or even more) and vDPA can fix live migration issues in the cloud very well, which makes this a very promising solution for cloud. We have finished a PoC and found that one 16-vCPU VM instance can reach more than 7 million pps with almost no CPU consumption on the compute node for OVS and virtio, which is excellent performance compared to cloud providers (Alicloud and Huawei Cloud). Yi will talk about some bottlenecks of current OVS and how we break through them with a smart NIC, demonstrate our PoC, and show our pps performance data.
Using OVN Interconnect for scaling (OVN) Kubernetes deployments
Speakers: Numan Siddique, Red Hat, Inc.; Dumitru Ceara, Red Hat, Inc.
OVN Interconnect is a feature supported by OVN to interconnect multiple independent OVN deployments. This talk will show how OVN's interconnect feature can be used by ovn-kubernetes to deploy a complete OVN stack (OVN databases, ovn-northd and ovn-controller) on each Kubernetes worker node and interconnect them, and how this helps in scaling large deployments.
Tools and Techniques to debug OVS Hardware Offload
Speaker: Balazs Nemeth, Red Hat, Inc.
With the increasingly demanding network load in today's data centers, offloading network processing to the hardware has become an invaluable technique to free up CPU resources. Two of the main building blocks are Linux Connection Tracking (CT) and Traffic Control (TC), which are designed to be offloaded to the hardware in a transparent way. The flip side of this transparency is that it may be difficult to assess whether offload succeeds. Therefore, this talk outlines basic steps and troubleshooting techniques to determine if packets and flows are hitting the hardware, or if execution falls back to software paths. After a brief introduction to hardware offload, we'll describe the tools and commands that we've found to be useful to pinpoint issues. While these are mostly hardware agnostic, we'll discuss how they work in conjunction with a CX-5/6 SmartNIC and a BlueField-2 DPU.
Increasing visibility with perf-flow IPFIX sampling in OVS/OVN
Speaker: Adrian Moreno, Red Hat, Inc.
Visibility and debuggability of the dataplane is a major challenge for many organizations using OVS/OVN. Traditional solutions such as IPFIX/NetFlow flow sampling rely on observation points (e.g. switches and routers) sampling all packets that traverse them. This mechanism (supported by OVS as "per-bridge sampling") has several limitations, such as ignoring OVN's virtual datapaths and providing limited forwarding status.
However, OVS does support another interesting sampling mechanism: "per-flow sampling", which, in coordination with OVN, can yield a much more fine-grained, context-rich visibility mechanism.
We would like to review how this sampling mechanism works and how it can be used with OVN for several interesting use cases: drop sampling and ACL sampling.
Horizontal scaling with OVN component templates
Speaker: Dumitru Ceara, Red Hat, Inc.
As clusters grow in size, the centralized parts of OVN (North/Southbound databases and ovn-northd) tend to be under increasing load and tend to become the bottleneck in the distributed system. There have been various optimizations implemented over time in all the different central components that have helped out in keeping the resource usage under control. There is however a limit to such general approaches.
This talk presents an alternative, virtual network component templates, which with the help of the CMS can significantly reduce resource consumption in the OVN central components in specific cases. Sometimes network components are compute node-specific. Sometimes such components are replicated, almost identically, for multiple nodes in the cluster. With CMS help, these virtual network components can be represented more efficiently as a template that is later instantiated accordingly on each and every compute node.
We briefly describe the implementation of this solution, how the CMS needs to use it and we also present some results of various benchmarks performed with network component templates.
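As a rough illustration of the template idea described above, the following sketch keeps one shared record and expands it per node. The record fields, the `^NAME` placeholder syntax, the addresses, and the `instantiate` helper are all assumptions made for this example; they are not taken from the talk and are not OVN's actual schema or implementation.

```python
import re

# Hypothetical template: a single record shared by every node.
# "^NAME" marks a per-node variable (illustrative syntax only).
TEMPLATE = {"vip": "^NODE_IP:30080", "backends": "10.244.0.5:8080"}

# Per-node values that the CMS would supply (made-up addresses).
NODE_VARS = {
    "node-1": {"NODE_IP": "192.0.2.11"},
    "node-2": {"NODE_IP": "192.0.2.12"},
}

_VAR = re.compile(r"\^([A-Za-z_][A-Za-z0-9_]*)")


def instantiate(template, variables):
    """Return a copy of `template` with every ^NAME replaced by its value.

    Raises KeyError if the template references an undefined variable.
    """
    def expand(text):
        return _VAR.sub(lambda m: variables[m.group(1)], text)

    return {key: expand(value) for key, value in template.items()}


def instantiate_all(template, node_vars):
    """Expand the shared template once for each node."""
    return {node: instantiate(template, v) for node, v in node_vars.items()}
```

In this sketch the central store holds one template plus a small set of variables per node, rather than one nearly identical record per node; the expansion happens locally on each node.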
Do we need OVS-DPDK to be more deterministic?
Speaker: Eelco Chaudron, Red Hat, Inc.
In recent years, many talks have been about increasing the throughput of OVS-DPDK. Lately, however, we are getting signals that people would also like a bit more determinism. By determinism, in this case, they are referring to latency variations. Take the example where the OVS-DPDK PMD threads need to make a system call, which stops packet processing for a brief period. This talk will dive into the reasons for these forwarding interruptions, how we could potentially solve them, and at what cost.
MAC binding aging in OVN
Speaker: Ales Musil, Red Hat, Inc.
The MAC binding table stores MAC address/IP pairs learned by the logical router pipeline. Without it, the router would have to ARP every time traffic travels through the router pipeline. However, the table can grow over time with every topology change, which has the most impact on large-scale deployments. This talk elaborates on methods to prevent unbounded growth of the MAC binding table through a mechanism called MAC binding aging.
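The general idea of aging can be sketched as a table that timestamps each entry and periodically drops entries that have not been used for longer than a threshold. This is a minimal sketch under stated assumptions: the class name, the refresh-on-lookup policy, and the explicit `now` argument are hypothetical choices for illustration, and the actual OVN mechanism may differ.

```python
class MacBindingTable:
    """Toy IP-to-MAC table with age-based expiry (illustrative only)."""

    def __init__(self, age_threshold):
        # Entries idle for at least this many seconds are eligible for removal.
        self.age_threshold = age_threshold
        self._entries = {}  # ip -> (mac, last_used timestamp)

    def learn(self, ip, mac, now):
        """Record or update a binding, stamping it with the current time."""
        self._entries[ip] = (mac, now)

    def lookup(self, ip, now):
        """Return the MAC for `ip`, or None; a hit refreshes the timestamp."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, _ = entry
        self._entries[ip] = (mac, now)  # assumption: use keeps an entry alive
        return mac

    def expire(self, now):
        """Remove idle entries and return the list of removed IPs."""
        stale = [ip for ip, (_, ts) in self._entries.items()
                 if now - ts >= self.age_threshold]
        for ip in stale:
            del self._entries[ip]
        return stale

    def __len__(self):
        return len(self._entries)
```

Passing `now` explicitly keeps the sketch deterministic; a real implementation would read a clock and run the expiry step periodically.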
Live migration: reducing downtime with OVN multi-chassis bindings
Speaker: Ihar Hrachyshka, Red Hat, Inc.
A number of improvements were introduced in OVN 22.06 to help the CMS reduce network downtime during a VM live migration, including multi-chassis port bindings, packet cloning, and port activation strategies. Combined, these features allowed OpenStack to reduce observed downtime during live migration from multiple seconds to near instant (observed at or below the 0.1 second threshold).
This talk will go through the mechanics of these improvements and how they are utilized to achieve the observed results. It will then touch on potential limitations and next steps for the effort.
OVN Scale Testing Framework for High-Throughput Data Planes with Real Deployment Emulation
NOTE: This submission was rescinded by the authors.
Speakers: Aaron Smith, Red Hat, Inc., Karthik Sundaravel, Red Hat, Inc.
Current OVN scale testing projects do not focus on high-throughput NFV/Telco environments and rely on simulated test scenarios. We present two related projects that address throughput by allocating VF ports for the test data plane environment and enable the reverse engineering and emulation of real OVN deployments. One project advances current OVN scale testing concepts by adding an SR-IOV data path to allow higher throughput during testing. The second project reverse engineers an actual OVN deployment and generates configuration information for the first project. Together, the projects allow the testing and debugging of real-world, high-performance OVN deployments. An example deployment will be presented along with test results.
Revalidator Tracepoint Implementation in Open vSwitch
Speaker: Kevin Sprague, UMass Lowell
Existing debugging infrastructure was great at telling us when flows were added to a datapath, but there was no infrastructure in place to tell us when flows were deleted or, more importantly, why they were deleted. This talk will discuss the process of implementing a User Statically-Defined Tracing (USDT) probe in the Open vSwitch revalidator to accomplish that. It will also explore some possible use cases for this new USDT probe, including watching flows all the way from "birth" in the kernel to "death" in the revalidator, possibly within the context of OpenShift.
Hardware Offload of Metering
Speaker: Simon Horman, Corigine, Inc.
A key function of a network is to ensure resources are available to meet users' needs. In the presence of contention, QoS is a tool that can be used to help ensure appropriate resource usage.
This presentation will look at recent enhancements to the hardware offload of OVS QoS facilities, including recent work on metering.
The focus will be on hardware offload using TC in conjunction with the kernel datapath.
OVS DPDK optimization and customization for real public cloud scenarios
Speakers: Cheng Li, China Telecom Corporation, Ding Han, China Telecom Corporation
At China Telecom Cloud, OVS-DPDK is heavily used to provide virtual networking. We have made many optimizations and developed new features to keep up with a fast-growing number of users and scenarios. In this presentation, we first introduce our user scenarios in both public cloud and private cloud. Then we will share our performance optimizations (i.e. per-PMD meter, CT expiration batch update, tx steering optimization) and the new features we implemented (i.e. flow-based CT limitation).
Better exception path performance for OVS-DPDK
Speaker: Thilak Raj Surendra Babu, Nutanix, Inc.
In OVS-DPDK, the internal interface for the bridge is a tap interface handled by the ovs-vswitchd thread (a non-PMD thread).
Performance measured with iperf is lower than what we achieve with the kernel datapath.
In this talk, I would like to talk about how adding support for virtio-user as an exception path in OVS-DPDK helps improve the performance of socket-based applications on OVS-DPDK hosts.
Affinitizing guest flows on OVS-DPDK.
Speaker: Thilak Raj Surendra Babu, Nutanix, Inc.
Due to NUMA and other considerations, most real-world deployments will have more than one RXQ per interface, each serviced by a PMD on a different NUMA node. Because of RSS, different flows belonging to the same VM end up being serviced by two different PMD threads. This causes spinlock contention on the guest RX interface and results in sub-optimal performance.
In this talk, I would like to talk about my experiences and how steering the flows towards the right RXQ helps with performance.
Take 2: Action! Now with a focus on how SIMD benefits performance and "next-up" OVS 3.1 actions.
Speaker: Emma Finn, Intel, Inc.
OVS is a flexible vSwitch, allowing flexible modification of packets. As part of OVS 3.0, an AVX512 implementation of common actions was merged upstream. This talk shares the learnings from implementing the AVX512 actions, and how the SIMD ISA accelerates the modifications made to each packet. Future work in the actions domain for the OVS 3.1 release will be described, leaving the audience with an understanding of the current code as well as future actions.
Sharing OVN among kubernetes clusters
Speakers: Hareesh Puthalath, NVIDIA Corporation, Alin Serdean, NVIDIA Corporation
Ovn-kubernetes is a Kubernetes CNI using OVN and OVS to implement container networking. The current architecture of ovn-kubernetes uses a dedicated OVN instance per cluster. We want to make use of OVN in applications that span multiple clusters and data center environments with tenant users who consume Kubernetes clusters. A shared OVN model is useful in such multi-cluster and multi-tenant cloud environments. If clusters are able to leverage a shared OVN instance, it will decouple the management of OVN central components and move high-availability and scale considerations away from tenants and end users to a separate layer. It will avoid exposing some of the OVN central control plane components to the tenant clusters (where they would be prone to misconfiguration by tenants). It will also open new possibilities for inter-cluster connectivity use cases via the overlay network itself.
In this talk, we will share our experience in using shared OVN in such a multi-tenant environment. We will go through our use cases and how we structured the infrastructure control plane components. Our infrastructure also makes use of DPUs to offload OVS networking, and we will explain how that works in this scenario. We will also describe some of the challenges in using this approach and the changes that are needed to make it more generic.
OfP4, a P4 front-end for Open vSwitch
Speaker: Ben Pfaff, VMware, Inc.
Open vSwitch implements only OpenFlow. There are a limited number of software P4 implementations, each of which has its own limitations. There are currently no software P4 implementations that are fast, well maintained, and support P4Runtime. This opens up the possibility for a new P4 software implementation that has all of these characteristics.
This talk is about OfP4, an experimental software implementation of P4 that uses Open vSwitch as the dataplane. To use OfP4, the user compiles P4 into an intermediate form using a special OfP4 backend to the P4 compiler, converts that intermediate form into Rust, and then compiles the generated Rust along with the rest of OfP4 (also written in Rust) and runs the binary. The OfP4 binary acts as a gateway: it accepts P4Runtime commands on its front-end and converts them into OpenFlow flows on its back-end.
This talk will outline the compilation and translation processes and explain the current and likely future features and limitations of this approach to P4 support in Open vSwitch.
A version of this talk was given at the P4 Workshop earlier this year. Whereas that talk emphasized the P4 aspects of OfP4, this version will emphasize the Open vSwitch and OpenFlow aspects. The OfP4 implementation has also advanced since that talk.
JetStream: Automatic Optimization of Virtual Switch Rulesets
Speakers: Hugo Sadok, Carnegie Mellon University, Margarida Ferreira, Carnegie Mellon University
Users of virtual switches, such as OVS, need to deal with complex rulesets to define their desired networking policies. These rulesets can be a combination of automatically generated rules as well as rules crafted by different people in the same organization. As a result, redundancies and errors are common. Ruleset redundancies are problematic in production because larger rulesets introduce memory pressure and larger data structures, causing performance slowdowns in packet classification. In this talk we will present JetStream, a new tool to help operators simplify, restructure, and check equivalency of virtual switch rulesets. JetStream can automatically find redundancies and unreachable rules and is able to restructure rulesets to reduce the number of tables, potentially speeding up packet classification. In addition, JetStream lets operators check equivalency between different rulesets, allowing them to also apply manual optimizations. At the core of JetStream is an SMT formulation that can model the entire ruleset. We use this model not only to find redundancies but also to prove that any manual or automatic transformations are correct. Our analysis of production rulesets shows that JetStream is able to significantly reduce the number of rules and can check equivalency of rulesets with hundreds of thousands of rules.
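To make the notion of an unreachable rule concrete, here is a deliberately simplified sketch. It is not JetStream's SMT formulation: it assumes a single table, exact-match fields only (an absent field acts as a wildcard), and it only detects the case where one strictly higher-priority rule covers another rule on its own. The rule format and the `covers` and `find_shadowed` helpers are hypothetical.

```python
def covers(general, specific):
    """True if every packet matching `specific` also matches `general`.

    Each match is a dict of field -> exact value; a missing field is a
    wildcard. `general` covers `specific` when each of its constraints is
    also present, with the same value, in `specific`.
    """
    return all(specific.get(field) == value
               for field, value in general.items())


def find_shadowed(rules):
    """Return indices of rules that a higher-priority rule fully covers.

    `rules` is a list of (priority, match, action) tuples. This is a
    sufficient check only: a rule hidden by the union of several other
    rules is not detected.
    """
    shadowed = []
    for i, (prio_i, match_i, _) in enumerate(rules):
        for j, (prio_j, match_j, _) in enumerate(rules):
            if j != i and prio_j > prio_i and covers(match_j, match_i):
                shadowed.append(i)
                break
    return shadowed


rules = [
    (100, {"ip_proto": 6}, "drop"),
    (50, {"ip_proto": 6, "tcp_dst": 80}, "output:1"),  # hidden by rule 0
    (50, {"ip_proto": 17}, "output:2"),
]
print(find_shadowed(rules))  # [1]
```

A solver-based model, as described in the abstract, can reason about cases this pairwise check misses, such as wildcarded or masked fields and rules spread across multiple tables.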
OVSDB: A database to configure your database?
Speaker: Ilya Maximets, Red Hat, Inc.
There are a few things that users may want to tweak about the OVSDB server itself, especially in large OVN deployments. These include the IPs to listen on, inactivity probes, and databases to back up or relay. There are also different ways to configure the ovsdb-server process, such as command line arguments, ovs-appctl commands, or getting the configuration from one of the served databases. However, there are cases where none of these configuration methods is sufficient.
This talk will highlight the strengths and weaknesses of each approach and present possible solutions, including a new Local_Config database.
OVSDB performance updates '22 / testing with ovn-heater.
Speaker: Ilya Maximets, Red Hat, Inc.
OVSDB got a few exciting performance improvements over the past year. These include moving most of the database compaction work to a separate thread and an overall reduction in CPU and memory usage. Some of this work is a direct result of identifying scalability issues in ovn-heater runs.
As a testing tool, ovn-heater itself evolved to be faster and collect more data.
This talk is an overview of these changes in both projects and how they influence development of each other.
Classification optimizations to enable megaflow offload onto lightweight HW pipelines.
Speaker: James Choi, Intel, Inc.
Despite a broad desire for HW offloading of megaflows, HW-offloaded megaflows still haven't been broadly deployed, especially on HW pipelines with limited capabilities. Conceptually, megaflows can be offloaded with TCAM or exact match tables. Using TCAM, though, requires wide TCAM slices because a megaflow must accommodate the many tuples in the tuple space search. Many tuple combinations also result in a large number of megaflow entries, which requires long TCAM slices. Using exact match hash tables in HW, similar to the SW dataplane in dpcls, requires a HW capability to dynamically accommodate new hash tables with SW-defined key fields, while searching all hash tables in parallel.
We believe these difficulties of HW offloading are partly due to megaflows being structured to accommodate primarily SW pipelines, which can use large memory resources and flexible lookup algorithms to speed up lookup. We propose an algorithm that maintains additional data structures to help target offloading to lightweight HW pipelines by segmenting the tuple space tables at a finer granularity for narrower and shorter TCAM usage, to be used in conjunction with a small exact match HW table. We believe these optimizations are well suited for lightweight ASIC or FPGA based HW pipelines.
Running OVS with a P4 coprocessor
Speaker: Dan Daly, Intel, Inc.
In this talk we describe a minimal set of patches to OVS to enable the use of a P4 coprocessor resident on the same system. The P4 coprocessor supports the functionality of OVS configured through OVSDB and OpenFlow along with additional P4 programs running on the same networking dataplane. P4-based applications that this model supports include an implementation of Kubernetes networking and an implementation of IPsec encryption of physical and virtual ports. With P4 introduced as a coprocessor, new networking applications and concepts can be realized without disrupting what already runs in existing systems. We will also demonstrate P4 coprocessors implemented in optimized software & hardware that enable P4 programming of the network infrastructure with high performance, capacity & scale.
P4-OVS Split Architecture patches and code
Speakers: Derek Foster, Intel, Inc., Nimrata Limaye, Intel, Inc.
Two years ago, we presented an integrated P4-OVS solution which added P4 capabilities to OVS. Since then, we have refined the solution into a modular split architecture model. This architecture uses OVS as a virtual switch that integrates with P4 components using a minimal set of changes to OVS. We would like to briefly describe the architecture and components, show the patches that we will submit to OVS, and show all the open-sourced code and components that are used to build the split version of P4-OVS.
OVS-Kselftest: A new way to test the kernel module
Speaker: Aaron Conole, Red Hat, Inc.
Back in 2011, the Open vSwitch project pushed the openvswitch kernel module to the Linux kernel. Since then, the upstream kernel team has been charged with part of the maintenance of the kernel module. To ensure that changes to the module don't cause regressions, a developer generally must install the OVS userspace components and then run the kmod test suite. This process places an additional burden on kernel maintainers and developers. Additionally, difficulties repeatedly surface with this model: ensuring that the hosting machine meets all of the userspace test suite requirements, and wading through the giant test suite to find out how to test the kernel module changes.
To that end, we introduce a new utility, 'ovs-dpctl.py', which can program the netlink datapath, provide upcall endpoints, and introspect the kernel module state. With this utility, we provide a shell script that can be run as part of the kernel self-test suite, in the hope that future work on the kernel module can be free from regressions, and to showcase the various configurations and flow setups. We even propose some testing that the ovs-vswitchd userspace itself cannot introduce (such as invalid netlink messages).
OVS-DPDK Shared mempool improvements
Speaker: Kevin Traynor, Red Hat, Inc.
User-configured mempools allow users with ports of different MTUs to consolidate onto a single shared mempool or a set number of mempools. To do this, the user gives a hint about what the largest MTU will be. This prevents using multiple mempools when there are ports of different MTUs.
Delayed vhost mempool creation prevents creating a mempool for a vhost port until the vhost device is added and the correct NUMA and MTU information is known. Previously, a best-guess method was used before the vhost device was added, so an unneeded mempool may have been created. The new scheme also reduces some false positive error conditions, simplifying debugging.
ovs/ovn and OCP traffic encryption/decryption with IPsec
Speaker: Mohammad Heib, Red Hat, Inc.
K8s/OCP with the OVN-K/OVN CNI uses OVS tunnels to transfer packets from one machine to another. Along the path, the packets are processed by physical routers and physical switches. There is a risk that these physical devices might read or modify the contents of the tunnel packets. I plan to discuss how K8s/OCP utilizes the IPsec functionality implemented in OVS/OVN to prevent a malicious party from sniffing or manipulating the tunnel traffic, how IPsec is implemented inside the OVS/OVN stack, and how it utilizes the Linux kernel xfrm framework to implement traffic encryption/decryption.
Building NVIDIA GeForce Now network infrastructure using Open-source OVN/OVS and Nvidia commodity DPUs
Speaker: Majd Dibbiny, NVIDIA Corporation
Learn how to optimize networking performance, functionality, security and cost for a multi-tenant cloud using NVIDIA Data Processing Units (DPUs). Modern applications like NVIDIA GeForce Now (cloud gaming) have stringent performance (latency, jitter, etc.), security and other requirements, while also being sensitive to cloud infrastructure cost. As a result, cloud networking needs to combine the traditional benefits of Software Defined Networking (SDN) with minimal use of CPU and other system resources. This session describes how such seemingly incompatible goals are being achieved by offloading both the data and control plane of bare metal and/or Kubernetes SDN to NVIDIA DPUs, through a real-life use case of the NVIDIA GeForce Now system. Examples include acceleration of traditional networking services like stateful firewall, as well as some application-specific networking functions.
Chair: Aaron Conole, Red Hat, Inc.
Easily deploying Kubernetes with OVN as CNI using Kind
Chair: Flavio Fernandes, Red Hat, Inc.
OVN Open Discussion Forum
Chair: Mark Michelson, Red Hat, Inc.
(Virtual) Co-Chair: Frode Nordahl, Canonical, Inc.
Chris Wright, Chief Technology Officer and Senior Vice President, Global Engineering, Red Hat, Inc.
Chris Wright will join us to present a live, remote keynote.