Optimizing DPDK Performance on NUMA Systems

Do you want to squeeze every last drop of performance out of those NUMA systems?

To begin with: what DPDK is and why it matters on NUMA systems. DPDK (the Data Plane Development Kit) is a set of libraries and drivers that let applications bypass the kernel network stack and talk to the NIC directly from user space. This can result in some serious performance gains, especially for high-speed packet-processing workloads.
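
That "bypass" is quite literal: before a DPDK application can drive a NIC, the device is detached from its kernel driver and bound to a userspace-capable one. One common way to do this is DPDK's dpdk-devbind.py tool; the PCI address below is just an example and depends on your hardware.

# List NICs and which driver (kernel or DPDK-compatible) each is bound to.
dpdk-devbind.py --status

# Load the vfio-pci userspace driver and hand the example NIC over to it,
# taking it away from the kernel network stack.
modprobe vfio-pci
dpdk-devbind.py --bind=vfio-pci 0000:03:00.0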

But here’s where things get interesting: a NUMA system has a memory controller per CPU socket, so every socket has memory that is local to it and memory that belongs to the other sockets. Touching remote memory means a trip across the inter-socket interconnect, which costs extra latency and bandwidth, and performance suffers if the workload isn’t laid out with that in mind.
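
Before tuning anything, it helps to look at the topology you are actually dealing with. Assuming the numactl package is installed, two quick ways to see which cores and how much memory belong to each node:

# Per-node CPU lists, memory sizes, and the node-to-node distance matrix.
numactl --hardware

# A condensed summary of the NUMA layout.
lscpu | grep -i numa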

So how do we optimize DPDK for NUMA systems? Well, let’s start with the basics: pinning your processes to specific CPU cores. This keeps the workload on the same set of CPUs, so caches stay warm and the process doesn’t get migrated away from its local memory. Here’s an example:

#!/bin/bash

# Pin the DPDK workload to NUMA node 0: --cpunodebind restricts the process
# to node 0's cores, and --membind restricts its memory allocations to node
# 0's local memory, so packet buffers stay close to the cores that touch them.
#
# "dpdk-app" is a placeholder for your DPDK application. The EAL options
# after it are typical examples: -l gives the core list (pick cores that
# belong to node 0, per "numactl --hardware" above) and -n the number of
# memory channels. Adjust both to your machine.

numactl --cpunodebind=0 --membind=0 ./dpdk-app -l 1-4 -n 4

This command runs the DPDK application with its CPU scheduling confined to the cores of NUMA node 0 and its memory allocations confined to node 0’s memory. Keeping the data and the cores that process it on the same node avoids trips across the inter-socket interconnect, reducing latency and improving throughput.
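
It’s easy to verify that the binding actually took effect. Assuming the application is running under the placeholder name dpdk-app from the example above, you can check its CPU affinity and where its memory landed:

# Which CPUs the running process is allowed to execute on.
taskset -cp "$(pidof dpdk-app)"

# Per-NUMA-node memory usage of the process; with a correct binding,
# nearly all of it should appear under node 0.
numastat -p "$(pidof dpdk-app)"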

We can also make DPDK’s memory allocation “NUMA-aware”. DPDK carves its memory pools out of hugepages, and the EAL lets you control which node those hugepages come from, so the buffers your application allocates sit physically next to the cores that use them. Here’s an example command:

# The EAL option --socket-mem controls how much hugepage memory DPDK takes
# from each NUMA node. The value is a comma-separated list in megabytes,
# one entry per node: here we ask for 1024 MB on node 0 and nothing on
# node 1, so every mbuf pool and ring the application creates is backed
# by node-0 memory.

numactl --cpunodebind=0 --membind=0 ./dpdk-app -l 1-4 -n 4 --socket-mem=1024,0

This launches the same pinned workload as before, but the --socket-mem option additionally tells the EAL to take all of its hugepage memory from node 0. The memory pools DPDK creates are therefore physically local to the CPUs we’re using, which further reduces latency.
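
One practical note: --socket-mem can only hand out hugepage memory that has actually been reserved on that node. A minimal sketch of reserving 2 MB hugepages on node 0 through sysfs (run as root; the counts are just examples) looks like this:

# Reserve 1024 x 2 MB hugepages on NUMA node 0 only.
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

# Confirm how many hugepages each node now holds.
cat /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages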

With these simple tips, you can optimize DPDK for NUMA systems and squeeze every last drop of performance out of your hardware.
