Open vSwitch and OVN 2025 Fall Conference
The Open vSwitch project, a Linux Foundation Collaborative Project, will host its eleventh annual conference focused on Open vSwitch and OVN on November 19 - 20, 2025. The conference will be hybrid. Talks will be presented live during the conference. We plan to use an online system that gives virtual attendees text-based discussion and live Q&A while each talk is happening. Each talk will have time reserved after the presentation for further discussion with the presenters, over text, video, and audio, as well as live in the room. Each talk will be given once.
The in-person location for the conference this year will be the Hotel Botanique (here) in Prague, CZ.
How to attend
Registration will be required to attend. See the Eventbrite page to register. We offer complimentary registration to speakers, students, academics, and anyone for whom the registration fee is burdensome. Please contact us if you need any help obtaining a complimentary registration.
To book at the Hotel Botanique, you may use this link or use the group ID 4816305 when booking.
Talks / Schedule
We will present talks live at the Hotel Botanique and broadcast them via Google Meet.
Enter your local offset from UTC below to see the session times in your own timezone.
Day 1
Day 2
Abstracts
Observing Open vSwitch with native prometheus metrics |TOP|
Speaker(s): Gaetan Rivet, NVIDIA
Open vSwitch powers a diverse range of virtualized networking stacks, yet observability remains a challenge. Existing external tools often assume a kernel datapath, introducing gaps when moving to userspace implementations. They also impose overhead by scraping internals via multiple CLI calls, which narrows what can be observed and limits measurement frequency. This talk introduces a native, Prometheus-compatible metrics framework in the OVS library. It aligns observability across kernel and userspace datapaths from an external observer's perspective, providing a single access path to relevant events. The result is more accurate, granular visibility with a lower footprint, enabling higher-frequency measurements without distorting datapath behavior and reducing friction when integrating controllers. I will describe the Prometheus data model and the metric types implemented in the framework, then discuss trade-offs and design choices for the metric arborescence and access patterns:
* Metric node types
* Bounded collections for label generation
* Conditional access to optional modules
* Threading model
Finally, I will show usage in practice with metrics defined in the common ofproto layer and specialized metrics made possible in the userspace datapath.
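To make the Prometheus data model mentioned above concrete, here is a minimal sketch using the official Python prometheus_client library; the metric names and labels are illustrative placeholders, not the names the OVS framework exports.

    # Minimal sketch of the Prometheus metric types, using the official Python
    # client. Metric names below are illustrative placeholders only.
    from prometheus_client import Counter, Gauge, Histogram, start_http_server
    import random, time

    # Counter: monotonically increasing value (e.g. packets seen by a datapath).
    dp_packets = Counter("ovs_datapath_packets_total",
                         "Packets processed by the datapath", ["datapath"])

    # Gauge: value that can go up and down (e.g. current megaflow count).
    dp_flows = Gauge("ovs_datapath_flows",
                     "Megaflows currently installed", ["datapath"])

    # Histogram: distribution of an observed quantity (e.g. upcall latency).
    upcall_latency = Histogram("ovs_upcall_latency_seconds",
                               "Latency of flow setup upcalls")

    if __name__ == "__main__":
        start_http_server(9100)          # expose /metrics for a Prometheus scrape
        while True:
            dp_packets.labels(datapath="netdev").inc(random.randint(1, 100))
            dp_flows.labels(datapath="netdev").set(random.randint(0, 5000))
            upcall_latency.observe(random.random() / 1000)
            time.sleep(1)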
Enhancing hardware offload insights in OVS |TOP|
Speaker(s): Nupur Uttarwar, NVIDIA
In Open vSwitch, troubleshooting rule offloading often requires digging
through logs and cross-referencing system components, especially when
offload fails or is unsupported. This short talk introduces enhancements
to the “dpctl dump-flows” command that provide explicit per-rule failure
reasons, if any, directly in the flow dump output. With these improvements, users
gain streamlined troubleshooting and clearer insight into the offloading
pipeline, accelerating root-cause analysis and system fine-tuning.
This talk will cover the following:
* Motivation for improving flow offload visibility and transparency
* Overview of new enhancements: error detection, propagation, and
exposure in the data path
Finally, I will show usage in practice in a case where offload fails and
how users can benefit from immediate, actionable feedback on offload
failures.
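As a rough illustration of the kind of visibility the talk targets, the sketch below tallies offload status from a flow dump. The "offloaded:yes" attribute already exists in verbose dump-flows output; the "offload_failure=" token stands in for the proposed per-rule failure annotation and is purely hypothetical.

    # Rough sketch, not the talk's implementation: tally offload status from a
    # flow dump. "offloaded:yes" is an existing attribute; "offload_failure="
    # is a hypothetical stand-in for the proposed per-rule failure reason.
    import subprocess
    from collections import Counter

    def offload_summary():
        out = subprocess.run(["ovs-appctl", "dpctl/dump-flows", "-m"],
                             capture_output=True, text=True, check=True).stdout
        summary = Counter()
        for line in out.splitlines():
            if "offloaded:yes" in line:
                summary["offloaded"] += 1
            elif "offload_failure=" in line:      # hypothetical annotation
                summary["failed"] += 1
            else:
                summary["software"] += 1
        return summary

    if __name__ == "__main__":
        print(offload_summary())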
AI-Powered Performance Insights: Integrating OVS/OVN automatic performance regression analysis with LLMs |TOP|
Speaker(s): Joe Talerico, Red Hat; Jose Castillo Lema, Red Hat; Mohit Sheth, Red Hat
Performance engineering is critical for ensuring Kubernetes networking scalability and stability, yet traditional regression analysis is slow and manual. At Red Hat, performance testing has shifted left, allowing developers to run scale workloads against pull requests early in the cycle. To accelerate feedback, we introduce Orion, a Python library for automated statistical regression detection, and Orion-MCP, an MCP server that enables Large Language Models (LLMs) to analyze results, detect anomalies, and provide natural-language insights. Together, these tools make performance data more accessible and actionable, helping developers quickly identify and address regressions in evolving Kubernetes networking code.
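For readers unfamiliar with automated statistical regression detection, the following is a minimal sketch in that spirit; it is not Orion's API. It flags a pull request's samples as a regression when the drop versus the baseline is both statistically significant and larger than a chosen threshold.

    # Illustrative sketch of automated regression detection, not Orion's API:
    # compare a PR's throughput samples against a baseline with Welch's t-test
    # plus a relative-change threshold.
    from statistics import mean
    from scipy import stats

    def detect_regression(baseline, candidate, alpha=0.05, max_drop=0.05):
        """Return True if candidate is significantly worse than baseline."""
        t_stat, p_value = stats.ttest_ind(baseline, candidate, equal_var=False)
        drop = (mean(baseline) - mean(candidate)) / mean(baseline)
        return p_value < alpha and drop > max_drop

    # Example: requests/s from baseline runs vs. a pull-request build.
    baseline = [10450, 10510, 10380, 10490, 10470]
    candidate = [9700, 9820, 9650, 9780, 9710]
    print(detect_regression(baseline, candidate))   # True -> flag for review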
OpenFlow Classifier: Arcane knowledge and common pitfalls |TOP|
Speaker(s): Ilya Maximets, Red Hat
The OpenFlow classifier is the brain and the heart of Open vSwitch, but not many people these days know how it actually works. Some of the knowledge is hidden behind complex code, and the motivation for some features and optimizations has been lost in time, since the original authors of this code stepped away from the project long ago. This talk is a deep dive into the inner workings of the classifier, aiming to shed some light on the architecture and optimizations and explaining the common "do"s and "don't"s of writing efficient OpenFlow pipelines, with regard to issues that OVN has faced in recent years.
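As background for the deep dive: the classifier is built around tuple space search, with one hash table per distinct wildcard mask and a lookup that probes each of them. The toy sketch below illustrates only that core idea; the field names and priority handling are greatly simplified.

    # Conceptual sketch of tuple space search: one hash table ("subtable") per
    # distinct wildcard mask, and a lookup that probes every subtable.
    class Classifier:
        def __init__(self):
            # mask (set of matched fields) -> {masked key tuple: (priority, action)}
            self.subtables = {}

        def insert(self, match, priority, action):
            """match: dict of field -> value; wildcarded fields are omitted."""
            mask = frozenset(match)
            key = tuple(sorted(match.items()))
            self.subtables.setdefault(mask, {})[key] = (priority, action)

        def lookup(self, packet):
            """packet: dict of field -> value. Returns the best (priority, action)."""
            best = None
            for mask, table in self.subtables.items():   # one probe per mask
                key = tuple(sorted((f, packet.get(f)) for f in mask))
                hit = table.get(key)
                if hit and (best is None or hit[0] > best[0]):
                    best = hit
            return best

    cls = Classifier()
    cls.insert({"ip_dst": "10.0.0.1"}, priority=100, action="output:1")
    cls.insert({"ip_dst": "10.0.0.1", "tcp_dst": 80}, priority=200, action="output:2")
    print(cls.lookup({"ip_src": "10.0.0.9", "ip_dst": "10.0.0.1", "tcp_dst": 80}))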
Uncovering Testing Gaps in OVN Incremental Processing |TOP|
Speaker(s): Jacob Tanenbaum, Red Hat
The incremental processing of OVN objects greatly decreases the processing time of changes to the OVN database, which increases scale and the ability to handle database changes, assuming that all the changes are handled correctly. The current state of testing for the incremental processor has a large oversight that the project needs to address: comparing the output of the incremental processor only to a forced recalculation of the database leaves room for bugs and unforeseen interactions. We need a new method of testing that can account for interactions between runs of the incremental processor that are not currently covered.
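One way to picture such a testing approach: drive a random sequence of changes and, after every change, compare the incrementally maintained output against a full recomputation. The sketch below uses a trivial stand-in engine, not ovn-northd's actual incremental processing engine.

    # Sketch of the testing idea: after every change, the incremental output
    # must equal a full recompute. The "engine" here is a stand-in only.
    import random

    def full_recompute(db):
        # Hypothetical derivation of output state from the whole database.
        return {name: f"flow-for-{name}" for name in sorted(db)}

    class IncrementalEngine:
        def __init__(self):
            self.output = {}
        def handle_change(self, op, name):
            if op == "add":
                self.output[name] = f"flow-for-{name}"
            else:
                self.output.pop(name, None)

    def test_sequences(steps=1000, seed=42):
        random.seed(seed)
        db, engine = set(), IncrementalEngine()
        for i in range(steps):
            name = f"lsp{random.randint(0, 20)}"
            op = "add" if name not in db else "del"
            db.add(name) if op == "add" else db.discard(name)
            engine.handle_change(op, name)
            # Checking after *every* step also exercises interactions between
            # consecutive runs of the incremental engine.
            assert engine.output == full_recompute(db), f"divergence at step {i}"

    test_sequences()
    print("incremental output matched full recompute at every step")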
Improving Retrieval from the OVS Hashmap |TOP|
Speaker(s): Rosemarie O'Riorden, Red Hat
The OVS hashmap is used throughout OVN and OVS for a variety of data storage. With very frequent data retrieval, any bit of performance improvement is beneficial. Valkey recently published a blog post about changes they made to their hashmap and the significant reduction in memory usage they saw, so I want to build on their design and implement something similar for OVS. This will hopefully reduce cache misses, thus improving overall performance.
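For context, the Valkey redesign packs several entries into cache-line-sized buckets with open addressing, so a lookup usually touches a single bucket instead of chasing a per-entry chain. The sketch below only illustrates that layout conceptually; the sizes and structure are illustrative, not a proposal for the OVS hmap.

    # Conceptual sketch (not OVS code) of a bucketed, open-addressing layout:
    # several entries share one cache-line-sized bucket. Sizes are illustrative.
    ENTRIES_PER_BUCKET = 7          # e.g. what fits in one 64-byte cache line

    class BucketedMap:
        def __init__(self, n_buckets=64):
            self.buckets = [[] for _ in range(n_buckets)]

        def _bucket(self, key):
            return hash(key) % len(self.buckets)

        def insert(self, key, value):
            i = self._bucket(key)
            while len(self.buckets[i]) >= ENTRIES_PER_BUCKET:
                i = (i + 1) % len(self.buckets)          # probe the next bucket
            self.buckets[i].append((key, value))

        def get(self, key):
            i = self._bucket(key)
            while True:
                for k, v in self.buckets[i]:
                    if k == key:
                        return v
                if len(self.buckets[i]) < ENTRIES_PER_BUCKET:
                    return None          # a non-full bucket ends the probe chain
                i = (i + 1) % len(self.buckets)

    m = BucketedMap()
    m.insert("lsp1", "port 1")
    print(m.get("lsp1"))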
The BGP Fork in the Road: A Tale of Two Implementations |TOP|
Speaker(s): Jaime Caamaño Ruiz, Red Hat; Patryk Diak, Red Hat; Surya Seetharaman, Red Hat
In the world of networking, two paths diverged in a yellow wood, and OVN-Kubernetes took the one less traveled by. When it comes to BGP support, both OVN and OVN-Kubernetes developed their own implementations almost simultaneously. It was a classic "great minds think alike" situation, but instead of joining forces, they each went their own way. Now we're left with a tale of two distinct BGP implementations, each with its own quirks and charms. This talk will pull back the curtain on this intriguing parallel development. We'll start by exploring how OVN handles BGP, from its core design to its capabilities. Then, we'll dive into the OVN-Kubernetes approach, revealing how it provides BGP support for your workloads. We'll highlight the key differences between these two implementations and, more importantly, identify the gaps that currently prevent OVN-Kubernetes from simply adopting OVN's native BGP support. Finally, we'll get down to brass tacks: what would it take for OVN-Kubernetes to make the switch? We'll analyze the pros and cons of such a move, discussing the technical challenges and potential benefits. This talk isn't just about code; it's about the journey of open source development and the sometimes-messy reality of building great things in parallel. Join us for a fun and informative look at two BGPs, and help us decide if it's time for them to finally become one.
A Nightmare on Kube Street: Slicing Kubernetes Networks like Freddy Krueger |TOP|
Speaker(s): Dave Tucker, Red Hat; Surya Seetharaman, Red Hat
Virtual Private Clouds (VPCs) are the network isolation dream that major cloud providers have perfected—but for those of us running Kubernetes, our networks can feel more like a nightmare. Despite recent advances in network policies and segmentation, we're still stuck in a flat network hellscape where true multi-tenant isolation feels impossibly complex. What if we could slice through this chaos like Freddy himself? What if we could give teams the power to carve out their own isolated network territories while still allowing controlled communication between them? In this talk, we'll unveil a new OVN-Kubernetes feature that brings true VPC functionality to your clusters. Building on the recently added User-Defined Networks (UDNs) foundation—which provides isolated network primitives—our VPC implementation allows you to group these networks together and define communication policies between them. Think of it as NetworkPolicy's bigger, more powerful sibling that operates at the network topology level rather than just the pod level. We'll start with a tour of VPC concepts from AWS and NSX, then dive deep into our OVN implementation. You'll see how we leverage OVN logical switches, routers, and policies to create true network segmentation that goes beyond what traditional Kubernetes networking offers. We'll demonstrate our working prototype in action, showing how development teams can own their network topology while platform teams maintain control over inter-network communication. Join us as we slice through the technical implementation details and show you how OVS and OVN make this VPC nightmare-turned-dream a reality. By the end, you'll understand the architecture, see it working, and know how this feature will transform multi-tenant Kubernetes networking. Sweet dreams are made of properly isolated networks!
OVN and BGP: A Friendship Forged in OpenStack Neutron |TOP|
Speaker(s): Jakub Libosvar, Red Hat
This talk explores the integration of OVN's BGP capabilities within OpenStack Neutron, the networking component of the OpenStack project. We will begin with a concise overview of key Neutron components to provide essential context for attendees unfamiliar with OpenStack. Following this, we will break down the OVN logical components responsible for advertising prefixes throughout the topology. The session will also feature a short live demonstration showing the OVN BGP integration with Neutron in action. The goal of this session is to present how OVN innovations are adopted by other projects.
Accelerating OVN Dynamic Routing: OVN-BGP Agent + OVS-DPDK Fast Data Path Performance on RHOSO 18 |TOP|
Speaker(s): Pradipta Sahoo, Red Hat; Haresh Khandelwal, Red Hat
Achieving both dynamic routing flexibility and high-speed packet forwarding remains a central challenge in cloud networking. In this talk, we present performance results from a proof-of-concept integration of OVN-BGP-Agent with the OVS-DPDK fast data path on Red Hat OpenStack Services on OpenShift (RHOSO) 18. Our evaluation on high-capacity hardware with dual 100 Gbps NICs shows:
- Line-rate throughput at 100 Gbps aggregate with large frames and negligible packet loss.
- 12+ hours of sustained stability under continuous small-frame traffic, with no BGP session flaps.
- Key scalability constraints for small packets, stemming from single Rx queue usage, flow setup overhead, and MAC binding limits in the datapath.
We will examine these trade-offs between routing agility and forwarding performance, and share tuning strategies and architectural considerations to advance OVN dynamic routing toward production-grade readiness. Attendees will gain actionable insights into optimizing OVS/OVN for demanding, real-world dynamic routing deployments.
A native way of integrating OVN into the fabric through BGP-EVPN. |TOP|
Speaker(s): Ales Musil, Red Hat; Dumitru Ceara, Red Hat
Since version 25.03, OVN has had native support for integrating with L3 fabrics through BGP. We now present the work that happened in 25.09 to extend this native support in order to stretch OVN virtual L2 domains into and across the fabric through BGP-EVPN. The talk will give an overview of the targeted use cases for BGP-EVPN in cloud deployments and of how OVN virtual L2 domains are implemented. We'll then show how the new OVN native support can be used for connecting the two. The newly added OVN BGP-EVPN support is marked as experimental in 25.09, therefore we're also seeking feedback from the audience regarding the design decisions we have made when implementing the feature. In order to facilitate that, we'll go into the technical details of the solution and present:
- how and where the BGP control plane is supposed to be configured and is expected to run
- how OVN integrates with the BGP control plane in order to learn remote VTEPs (VXLAN tunnel endpoints)
- how OVN programs its dataplane in order to forward traffic through remote VTEPs according to the BGP control plane information
- how OVN learns and exposes remote and local workloads through EVPN
The talk will also present the plan to extend the implementation in order to further support more ways to integrate with the fabric BGP-EVPN infrastructure.
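For readers new to EVPN, the toy sketch below shows what learning remote VTEPs from the control plane amounts to: EVPN routes bind MAC addresses to the advertising node's VTEP IP, and the dataplane picks the tunnel endpoint accordingly. It is a generic illustration, not OVN's implementation.

    # Generic illustration of EVPN-driven VTEP learning, not the OVN code.
    fdb = {}   # (vni, mac) -> remote VTEP IP, learned from the BGP control plane

    def on_evpn_type2_route(vni, mac, vtep_ip):
        """EVPN type-2 route: MAC (and optionally IP) bound to a remote VTEP."""
        fdb[(vni, mac)] = vtep_ip

    def tunnel_endpoint(vni, dst_mac, flood_list):
        """Pick the VTEP(s) a frame should be encapsulated towards."""
        vtep = fdb.get((vni, dst_mac))
        return [vtep] if vtep else flood_list   # unknown MAC -> flood to all VTEPs

    on_evpn_type2_route(vni=100, mac="50:54:00:00:00:01", vtep_ip="192.0.2.11")
    print(tunnel_endpoint(100, "50:54:00:00:00:01",
                          flood_list=["192.0.2.11", "192.0.2.12"]))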
Virtual Network Function integration in OVN |TOP|
Speaker(s): Sragdhara Datta Chaudhuri, Nutanix; Naveen Yerramneni, Nutanix
This feature addresses a long-standing request in OVN by adding the ability to insert a virtual network function (VNF), such as a third-party firewall, into the traffic path. It supports both VLAN and overlay subnets and follows a "bump-in-the-wire" model. Packets matching an access list are forwarded through the port pair of a VNF. Stateful behavior is maintained by forwarding response packets through the same port pair. For cross-host communication, tunneling is enforced for traffic to and from the node hosting the VNF. Users can group multiple instances of a VNF together. The health of the group members is monitored using data path probe packets. Users have the option to either select one healthy member or load balance across all healthy members. Additionally, there is a configuration option to choose between fail-close and fail-open behavior, which dictates whether traffic will be dropped or forwarded (bypassing the VNF) when all group members are unhealthy. In addition to the inline mode VNF described above, users can alternatively create a VNF in tap mode. In this mode, packets matching an access list, and their associated responses, are cloned and forwarded to the VNF, while the original traffic continues along the normal forwarding path. The solution introduces new pipeline stages in the logical switch, new northbound (NB) entities related to network functions, a new action in access lists, and a new type of health monitoring in ovn-controller. It also makes use of connection tracking labels to support traffic redirection for response packets and to enforce tunneling behavior. Current status of the feature: code review in progress -
https://mail.openvswitch.org/pipermail/ovs-dev/2025-July/424718.html
https://mail.openvswitch.org/pipermail/ovs-dev/2025-July/424719.html
https://mail.openvswitch.org/pipermail/ovs-dev/2025-July/424720.html
https://mail.openvswitch.org/pipermail/ovs-dev/2025-July/424721.html
https://mail.openvswitch.org/pipermail/ovs-dev/2025-July/424722.html
https://mail.openvswitch.org/pipermail/ovs-dev/2025-July/424723.html
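As a rough illustration of the group-member selection and fail-open/fail-close behavior described above (not the proposed OVN code), the sketch below picks a healthy member or falls back to bypass/drop when no member is healthy.

    # Toy sketch of VNF group member selection with fail-open/fail-close.
    import hashlib

    def pick_member(members, health, flow_key, mode="select-one", fail_open=False):
        healthy = [m for m in members if health.get(m)]
        if not healthy:
            # fail-open: bypass the VNF; fail-close: drop the traffic.
            return "bypass" if fail_open else "drop"
        if mode == "select-one":
            return healthy[0]
        # load-balance: hash the flow onto one of the healthy members.
        digest = int(hashlib.sha256(flow_key.encode()).hexdigest(), 16)
        return healthy[digest % len(healthy)]

    members = ["fw-a", "fw-b"]
    print(pick_member(members, {"fw-a": False, "fw-b": True}, "10.0.0.1->10.0.0.2"))
    print(pick_member(members, {"fw-a": False, "fw-b": False}, "10.0.0.1->10.0.0.2",
                      fail_open=True))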
OVN-Kubernetes Meets DPUs: Enabling Service Function Chaining and Shared OVN |TOP|
Speaker(s): Tim Rozet, NVIDIA
Modern cloud-native environments are increasingly adopting DPUs and advanced networking architectures to deliver high-performance, secure, and flexible services. This talk will explore how OVN-Kubernetes integrates with DPUs, and how this integration is driving new feature requirements in OVN itself. One such feature, "Shared OVN", enables a single OVN instance to serve multiple control management systems (CMS), one per DPU and one for the host cluster. We will discuss why this capability is critical for the DPU use case, and how it is designed to provide isolation and scalability. Another requirement, Service Function Chaining (SFC), introduces the ability to steer traffic through a sequence of security, auditing, and accounting functions on the DPU before it leaves the host cluster. We will cover the expected architecture for SFC in OVN, along with the current status of its enhancement proposal. Attendees will gain insight into the DPU use case with OVN-Kubernetes, and new features required in OVN. This talk also serves as a call to action to help move these features forward.
Transit router - The distributed router for OVN interconnect. |TOP|
Speaker(s): Ales Musil, Red Hat; Mairtin O'Loingsigh, Red Hat
OVN interconnect is a way to connect multiple OVN deployments in a managed way without the need to set up the tunnels manually. So far there has only been support for the Transit Switch, which allows Transit Switch Ports to be part of different Availability Zones. However, there wasn't any simple way to set up routing across Availability Zones. The Transit Router aims to solve that issue. Its main goal is to provide a distributed router with ports across Availability Zones, leveraging Logical Router capabilities in an ovn-ic deployment. The aim of this talk is to introduce the Transit Router's capabilities and limitations, with a focus on the internal details and the configuration needed for a working OVN IC deployment that can leverage the Transit Router. At the end we would like to discuss some open-ended questions about potential improvements.
One API to Rule Them All: Reworking the OVS Hardware Offload Layers |TOP|
Speaker(s): Eelco Chaudron, Red Hat
Open vSwitch (OVS) currently supports hardware offload through implementations that are tightly coupled to the dpif-provider layer. While this approach works, it has led to a relatively rigid model where each provider integrates offload capabilities in its own way, making it harder to extend and share functionality across vendors. Today, OVS supports multiple backends for offload: the kernel datapath leverages the TC flower classifier, while the userspace datapath integrates with DPDK using rte_flow. Both paths expose their functionality through the netdev-offload APIs, but the current design results in duplicated logic and limits opportunities for consistency and reuse. For example, vendor-specific capabilities must often be reimplemented in each provider, and adding new offload features requires invasive changes in multiple places. This talk will introduce a proposal for a new dpif-offload API layer in OVS. The goal is to provide a unified and more flexible abstraction for hardware offload that sits above the provider-specific implementations. By centralizing offload handling, we can simplify vendor integration, reduce duplication, and allow for more extensible offload pipelines. Ultimately, this design aims to make it easier for vendors to plug their hardware-specific implementations, while giving operators a more consistent and maintainable offload experience.
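To illustrate the layering idea in a few lines, here is a conceptual Python sketch of a unified offload API fronting provider-specific backends; the class and method names are illustrative, not the proposed C API.

    # Conceptual sketch of the proposed layering, in Python rather than OVS's C:
    # one dpif-offload API in front of provider-specific backends, so datapath
    # code no longer calls TC flower or rte_flow directly.
    from abc import ABC, abstractmethod

    class OffloadProvider(ABC):
        @abstractmethod
        def flow_put(self, match, actions): ...
        @abstractmethod
        def flow_del(self, match): ...

    class TcFlowerProvider(OffloadProvider):
        def flow_put(self, match, actions):
            print(f"tc: offload {match} -> {actions}")
        def flow_del(self, match):
            print(f"tc: remove {match}")

    class RteFlowProvider(OffloadProvider):
        def flow_put(self, match, actions):
            print(f"rte_flow: offload {match} -> {actions}")
        def flow_del(self, match):
            print(f"rte_flow: remove {match}")

    class DpifOffload:
        """The unified layer: datapaths talk to this, providers plug in below."""
        def __init__(self, provider: OffloadProvider):
            self.provider = provider
        def offload(self, match, actions):
            self.provider.flow_put(match, actions)

    DpifOffload(TcFlowerProvider()).offload({"in_port": 1, "eth_type": 0x0800},
                                            ["output:2"])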
OVS-DOCA live-upgrade |TOP|
Speaker(s): Eli Britstein, NVIDIA
In a production environment, the OVS daemon must occasionally be updated to a
new version, mainly for bug fixes.
This update procedure must have minimal impact in terms of both downtime and
workload interruption.
Today, an update requires a full restart of the daemon, which poses mainly the
following challenges:
1. Port initialization, especially with offloads.
2. There are many volatile states that are gone and must be re-learned or
re-configured by a controller.
a. Openflow.
i. Meters.
ii. Groups.
iii. TLV-map.
iv. Rules.
b. Connection tracking connections.
c. FDB (MAC learning).
d. User configured routes.
e. Tunnel neighbour resolution tables.
f. More...
3. Connection tracking TCP sessions are disconnected and must be re-established.
This talk will go through how OVS-DOCA achieves that.
DOCA is capable of initializing the same ports from different userspace
processes with zero-time handover.
The concept in OVS is to enable invoking the new version in client mode, in
parallel to the existing running version (the server). While the two instances
are running, the following is done:
1. The server continues to handle the current workloads.
2. New instances (ports, offloads) initialization is done in parallel.
3. States are migrated to the new instance.
4. Handover.
5. Termination of the server instance.
We’ll also discuss version and resource constraints, failure handling and
rollback.
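A high-level sketch of that client/server sequence is below; the Daemon class and its methods are hypothetical placeholders standing in for the two OVS processes, not OVS-DOCA code.

    # High-level sketch (not OVS-DOCA code) of the client/server handover
    # sequence enumerated above; Daemon and its methods are placeholders.
    class Daemon:
        def __init__(self, version):
            self.version, self.owns_datapath, self.state = version, False, {}
        def init_ports(self):
            print(f"{self.version}: ports and offloads initialized")
        def export_state(self):
            return dict(self.state)      # OpenFlow rules, CT, FDB, routes, ...
        def import_state(self, state):
            self.state = state
        def take_over(self):
            self.owns_datapath = True
            print(f"{self.version}: handover complete")
        def terminate(self):
            self.owns_datapath = False
            print(f"{self.version}: terminated")

    server = Daemon("ovs-old")            # keeps forwarding the whole time
    server.owns_datapath = True
    server.state = {"fdb": {"aa:bb": "p1"}, "routes": ["10.0.0.0/24 via p2"]}

    client = Daemon("ovs-new")            # new version, started in client mode
    client.init_ports()                   # same ports, second userspace process
    client.import_state(server.export_state())
    client.take_over()                    # zero-time handover
    server.terminate()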
OVS-DOCA upstream roadmap |TOP|
Speaker(s): Maor Dickman, NVIDIA
Modern cloud workloads demand extremely high bandwidth and minimal latency. To
meet these requirements, it is crucial for Open vSwitch (OVS) to utilize
hardware acceleration with specialized devices, which enhances scalability,
lowers CPU usage, and enables efficient processing at speeds reaching hundreds
of gigabits per second.
OVS-DOCA, an advanced downstream version of OVS available from NVIDIA, delivers
top-tier acceleration on NVIDIA network interface cards (NICs) through a
userspace implementation and supports extensive offload capabilities, achieving
exceptional packet-per-second (PPS) and connection-per-second (CPS) performance.
This presentation will address the architectural differences between OVS-DOCA
and the upstream OVS solution and will cover the plans to upstream OVS-DOCA
acceleration, building on the existing dpif-netdev userspace framework.
The updated architecture involves enhancing dpif-netdev in the following ways:
1. netdev_class doca_class (netdev_doca) - New netdev
* Responsibilities: Port initialization and SW handling based on
DOCA-ETH (slow path)
2. dpif-offload-doca - New dpif-offload-provider using DOCA-FLOW, based on
the new dpif-provider API.
* Responsibilities: offload megaflows.
3. CT offloads DOCA offload provider - New offload API for userspace
connection tracking, based on the DOCA connection tracking offload.
Baremetal network isolation for cloud providers with OVN on DPUs |TOP|
Speaker(s): Adrian Chiris, NVIDIA; Mael Kimmerlin, NVIDIA; Tarek Abu-Hariri, NVIDIA
Bare-metal nodes in cloud platforms are increasingly used for workloads demanding direct hardware access paired with high performance, yet these nodes present acute challenges for secure multi-tenancy and strong network isolation. Traditional isolation approaches for Virtual Private Clouds (VPCs) often rely on switch configuration management, which introduces operational complexity, exposure to configuration errors, and risks of side-channel or unauthorized cross-tenant communication. As networks scale or tenants rotate frequently, these methods quickly become bottlenecks, reducing agility and potentially compromising security. We present a network isolation architecture for VPCs that leverages the NVIDIA BlueField-3 DPU to offload Open Virtual Network (OVN) and Open vSwitch (OVS) dataplane functionalities. In this model, the DPU, under exclusive control of the cloud administrator, enforces strict network isolation for tenants in a way that is entirely transparent to the host and requires no switch-level modifications when tenants are added or reassigned. With DPUs running in a secure domain, separate from the host operating system, all network traffic enforcement, including tenant segmentation, policy enforcement, packet inspection, and line-rate forwarding, is handled transparently and securely. The DPU administrator, not the node tenant or host OS, manages all network isolation mechanisms, ensuring tenant boundaries cannot be compromised even by privileged host processes. Since the DPU accelerates OVS and OVN, the cloud administrator can offer all the expected features of a VPC (external connectivity, DHCP, floating IP addresses, etc.) at line-rate speed, with maximum bandwidth and minimum latency. Isolation policies and network topologies are exposed through a declarative Kubernetes API, allowing operators to map bare-metal machines to tenants dynamically without operational complexity. The solution implements a specialized OVN topology optimized for high-performance bare-metal communication, sustaining line-rate throughput both between nodes and for external communications. This approach combines the flexibility of software-defined networking with the performance of hardware offload, delivering scalable, secure, and operationally lightweight tenant isolation for bare-metal cloud environments.
Deprecating Code |TOP|
Speaker(s): Simon Horman, Red Hat
OVS is a long-running project. Over time it has accumulated many features contributed by a large number of developers. However, as time has moved on, some features have become unused, while their maintenance burden remains. This presentation aims to explore options for deprecating such code, as well as suggesting some code to which this may apply. It is intended to be interactive, soliciting input from the audience on an appropriate way forward.
Revisiting checksum offloads in OVS |TOP|
Speaker(s): David Marchand, Red Hat
The OVS checksum offloads API and TSO support for the userspace datapath were introduced in multiple steps over several years and have grown more complex with the recent addition of tunnel support. The close ties with the DPDK checksum offloads API resulted in complex operations in OVS packet processing, and here complex means bugs were present. This talk will present the recent revamp of this API: from the initial fixes and the unit test coverage additions, to the new API, its enhanced readability, and the (hopefully) easier maintenance, such as better isolation of the API from the DPDK and Linux kernel offloads APIs. As part of this talk, performance numbers with some popular NICs will be discussed.
You Want to Upgrade WHAT? A Field Guide to Risky Network Changes |TOP|
Speaker(s): Nadia Pinaeva, NVIDIA; Dumitru Ceara, Red Hat; Patryk Diak, Red Hat
Every active project accumulates technical debt. For OVN-Kubernetes, our Layer2 network topology, a result of incremental additions and unfortunate design choices, had hit its limit. Not only did it make supporting different topologies more complicated than it should be, it also imposed feature limitations for EgressIP, Virtual Machine live migration and Network Interconnect. A re-architecture was unavoidable. This session details how we leveraged the OVN Transit Router to redesign our L2 network from the ground up for stability and scale. But the real story isn’t just the new design: having customers already using this feature in production introduced an interesting set of network engineering challenges to solve. If you have never:
* Connected the same switch and router with 2 links
* Copied the same MAC on multiple ports
* Generated dummy IPs just to fix routing
* Hand-crafted Route Advertisements to remove stale routes
Join our session and follow along on our topology upgrade adventure!
Space - building a CMS with OVN Concepts |TOP|
Speaker(s): Felix Huettner, StackIt
When our existing OpenStack Neutron deployment could no longer meet new requirements, we decided to build an additional CMS based on OVN design concepts. In this talk we will share how we adapted these concepts to our implementation of "spaced". We will walk through the core components of our system: a new "spacebound" OVSDB for defining resources and "spaced", which acts as a custom northd. Spaced translates our high-level requirements into the standard OVN northbound database. It thereby integrates with the resources that OpenStack Neutron is still managing in the northbound database. If you were hoping that this talk is actually about running OVN in space, then sorry for making you excited.
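As a toy illustration of the "custom northd" idea, a translation step might look like the sketch below; the spacebound resource shape is invented for the example, while Logical_Switch and Logical_Router are real OVN NB tables.

    # Toy sketch of a northd-like translation: a hypothetical high-level
    # "space" resource is turned into OVN northbound rows.
    def translate_space(space):
        """space: {'name': ..., 'subnets': [...]} -> list of OVN NB rows."""
        nb_rows = []
        for subnet in space["subnets"]:
            nb_rows.append({
                "table": "Logical_Switch",
                "name": f"{space['name']}-{subnet['name']}",
                "other_config": {"subnet": subnet["cidr"]},
            })
        nb_rows.append({"table": "Logical_Router",
                        "name": f"{space['name']}-router"})
        return nb_rows

    space = {"name": "team-a", "subnets": [{"name": "web", "cidr": "10.1.0.0/24"}]}
    for row in translate_space(space):
        print(row)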
Programming OVS bridges using OVN Bridge Controller |TOP|
Speaker(s): Numan Siddique, Crusoe AI
If an external controller wants to program an OVS bridge, it has to either speak the OpenFlow protocol or make use of the ovs-ofctl utility. With the proposed new service, the OVN bridge controller, a user can program an OVS bridge using the OVN logical flow syntax. The new service converts these logical flows to OpenFlow rules and programs the OVS bridges just like ovn-controller programs br-int. This talk gives an overview of how to use this service and shows how it makes programming OVS bridges easier.
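As a purely illustrative example of the difference in syntax, the sketch below pairs an OVN-style logical flow with the kind of OpenFlow rule such a service would ultimately install; the translation is hand-written for the example, not the proposed service's compiler.

    # Illustration only: an OVN logical flow (match/action syntax) and the kind
    # of OpenFlow rule a bridge-controller service would emit for it.
    logical_flow = {
        "table": 0,
        "priority": 100,
        "match": 'inport == "vm1" && ip4.dst == 10.0.0.1',
        "actions": 'outport = "vm2"; output;',
    }

    def naive_translate(lf):
        # A real implementation resolves port names to OpenFlow port numbers and
        # compiles the expression; this sketch just shows the shape of the result.
        return ("priority=%d,ip,in_port=1,nw_dst=10.0.0.1,actions=output:2"
                % lf["priority"])

    print(naive_translate(logical_flow))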
Beam Me Through the Datapath: VDUSE for OpenShift Virtualization |TOP|
Speaker(s): Jakob Meng, Red Hat; Maxime Coquelin, Red Hat
Last year, we replaced the kernel-based networking stack in OpenShift – Red Hat's Kubernetes distribution for hybrid clusters – with OVS-DPDK and VDUSE: Traffic between NICs and containers is handled end-to-end entirely in userspace. This year, we are closing the gap for virtual machines: We extended OVN-Kubernetes and KubeVirt to use userspace datapaths exclusively with VDUSE and vhost-vDPA. In our talk, we propose a new network architecture for KubeVirt VMs, provide a deep dive into OVS-DPDK performance tuning in OpenShift, and once again share preliminary but transparent benchmark results.
OvS and Socket Maps - Beyond ports as endpoints |TOP|
Speaker(s): Aaron Conole, Red Hat
Open vSwitch provides an implementation of a layer 2 switch, with some awareness of both layer 3 and layer 4 networking details. This awareness can be considered a violation of the separation of concerns, but with the trade-off that Open vSwitch can provide greater flexibility when authoring network policies for end users. Here we propose a new layer violation: using application ports as packet endpoints. Existing kernel datapath accelerations, i.e. XDP, can be made aware of application details for endpoints. This, combined with socket enqueue support, allows for much higher throughput by skipping the layer processing details that aren't needed (for example, routing in layer 3, and the subsequent additional layer 3 + 4 processing inside a target netns). By focusing on packet movement rather than strict layer adherence, we show that Open vSwitch can provide an accelerated packet forwarding path, and can do so in cooperation with the existing forwarding path.
User space connection tracking improvements |TOP|
Speaker(s): Thilak Raj Surendra Babu, ZScaler Inc.
OVS connection tracking is a fundamental piece of the dataplane fabric that we use. While its performance has improved since OVS 3.0, it lacks features that kernel connection tracking provides, such as CT logging, counters, and sync. I would like to talk about my experiences and our approach to solving some of these gaps.
Multiple NICs as Encapsulation Endpoints for Large Scale GPU Cloud |TOP|
Speaker(s): Girish Moodalbail, NVIDIA; Han Zhou, NVIDIA
Multi-node AI training and inference is becoming increasingly common due to the growing size of LLMs. Within a single node, GPUs communicate through high-speed interconnects such as PCIe and NVLINK. However, communication across multiple nodes still heavily relies on Ethernet. Each GPU in a node is equipped with its own dedicated NIC, enabling direct communication with GPUs in other nodes. For example, if a node contains four GPUs, it will also have four NICs. This means AI workloads are multi-homed, and all inter-node GPU overlay traffic will be transported over the encapsulation IP associated with each respective NIC. As a result, we end up with multiple encapsulation IPs on the same node. In another scenario, cloud gaming workloads utilize dedicated NICs for different functions: one for storage access and another for streaming GPU-generated frames to end-user devices. These are used in addition to the main interface that handles the gaming control plane traffic. As a result, the storage, streaming, and primary workload traffic must each be transported over the encapsulation IP associated with each respective NIC. In this session, we will demonstrate how multi-homed workloads can be supported on multi-NIC nodes securely using ovn-kubernetes and OVN. We will explore the fundamentals of multi-NIC support, along with the scalability challenges and solutions needed for large-scale multi-tenant GPU cloud environments.
Turbocharge Kubernetes with offloaded OVN-Kubernetes and OVS on DPUs |TOP|
Speaker(s): Alin Serdean, NVIDIA; Amit Zala, NVIDIA; Mael Kimmerlin, NVIDIA; Vasilis Remmas, NVIDIA; Igal Tsoiref, Red Hat; Rom Freiman, Red Hat
Running Kubernetes workloads with hardware-accelerated networking is challenging, as existing CNIs either offer limited features, low performance, or impose heavy resource overhead on hosts. Similarly, Service Function Chaining (SFC) in cloud-native environments commonly relies on software-based forwarding on general-purpose CPUs, which restricts scalability and performance. We present a Kubernetes-native platform that offloads both orchestration and service execution to NVIDIA BlueField-3 (BF3) Data Processing Units (DPUs). Our approach extends Kubernetes APIs with CRDs for declarative configuration, allowing services such as OVS, OVN-Kubernetes, routing, storage, and security functions to be deployed and chained directly on the DPU. By combining the programmability of OVS with BF3 hardware acceleration and DOCA capabilities, traffic steering and service execution are fully offloaded from the host. We ran a performance evaluation on BlueField 3 cards. Our open-source platform integrates seamlessly into DevOps workflows through Kubernetes-native interfaces and Infrastructure-as-Code practices, while also integrating with enterprise-grade platforms such as OpenShift. The result is a production-ready, DPU-accelerated networking solution that combines high performance, scalability, and simplified orchestration for next-generation cloud-native and AI workloads.
To reach the organizers, email ovscon@openvswitch.org. For general discussion of the conference, please use the ovs-discuss mailing list.
