
From a macro perspective, we compared eBPF with the kernel to give a coarse-grained picture of how eBPF works.

Micro Perspective

The software architecture has the following features:

Main Functions

| Feature | First Available In | Function Description | Application Scenarios |
| --- | --- | --- | --- |
| Tc-bpf | 4.1 | eBPF reconstructs kernel traffic classification. | Networking |
| XDP | 4.8 | Network data plane programming technology (for L2/L3 services) | Networking |
| Cgroup socket | 4.10 | The socket in the cgroup allows the eBPF logic to be extended. | Container |
| AF_XDP | 4.18 | Original network packets are directly sent to the user mode (similar to DPDK). | Networking |
| Sockmap | 4.20 | Sockmap supports short circuit processing. | Container |
| Device JIT | 4.20 | JIT/ISA decoupling. The host can compile ISA instructions of a specified device form. | Heterogeneous programming |
| Cgroup sysctl | 5.2 | The system invoking permission can be controlled in a cgroup. | Container |
| Struct ops Prog ext | 5.3 | The kernel logic and eBPF Prog can be dynamically replaced. | Framework basics |
| Bpf trampoline | 5.5 | 1. Replaces K(ret)probe in the kernel for better performance. 2. Used in eBPF Prog to solve the eBPF Prog debugging problem. 3. Implements the eBPF Prog dynamic link function (future function). | Performance tracing |
| KRSI (lsm + eBPF) | 5.7 | Customizable security policies during kernel running | Security |
| Ring buffer | 5.8 | A ring buffer is shared between CPUs to provide cross-CPU event order-preserving recording. It is used to replace buffers such as perf and ftrace. | Tracing/Performance analysis |

Note: The BPF community is still developing rapidly. For details about more advanced features, see the kernel community.
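
To make the ring buffer entry in the table more concrete, the following is a minimal user-space sketch in Python, not kernel code. It contrasts one buffer shared by all CPUs with separate per-CPU buffers. The class names `SharedRingBuffer` and `PerCpuBuffers` are hypothetical and exist only for this illustration; the real BPF ring buffer is a kernel map type with its own helpers.

```python
from collections import deque


class SharedRingBuffer:
    """Toy model: one bounded buffer that every CPU writes into."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = deque()
        self.dropped = 0

    def submit(self, cpu, event):
        # When the buffer is full the record is dropped, not overwritten.
        if len(self.records) >= self.capacity:
            self.dropped += 1
            return False
        self.records.append((cpu, event))
        return True

    def drain(self):
        # The consumer sees records in the order they were submitted.
        out = list(self.records)
        self.records.clear()
        return out


class PerCpuBuffers:
    """Toy model: one independent buffer per CPU."""

    def __init__(self, num_cpus):
        self.buffers = [[] for _ in range(num_cpus)]

    def submit(self, cpu, event):
        self.buffers[cpu].append((cpu, event))

    def drain(self):
        # The consumer reads CPU 0 first, then CPU 1, and so on, so the
        # cross-CPU ordering of events is lost unless it is rebuilt
        # separately (for example from timestamps).
        out = []
        for buf in self.buffers:
            out.extend(buf)
            buf.clear()
        return out


# The same event sequence, produced alternately on CPU 1 and CPU 0.
events = [(1, "open"), (0, "read"), (1, "write"), (0, "close")]

shared = SharedRingBuffer(capacity=8)
per_cpu = PerCpuBuffers(num_cpus=2)
for cpu, ev in events:
    shared.submit(cpu, ev)
    per_cpu.submit(cpu, ev)

shared_order = [ev for _, ev in shared.drain()]
per_cpu_order = [ev for _, ev in per_cpu.drain()]
print(shared_order)
print(per_cpu_order)
```

In this model the shared buffer returns the events in their original order, while the per-CPU buffers return them grouped by CPU. That difference is what "cross-CPU event order-preserving recording" refers to.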

Application Scenarios

Networking

In some network acceleration scenarios, DPDK used to be the only choice. As the kernel eBPF community has developed, XDP has emerged as a new option for vendors. The following lists their differences:

Polycube

VNF scenario example:

Container

In cloud native scenarios, containers have advantages over virtualization technologies, such as low overhead, light weight, and easy management. Containers have become the de facto standard for cloud native applications. The network requirements actually come from the applications; that is, what is needed are application-oriented network services.

Cilium

Cilium is open source software that transparently protects network connections between applications deployed on Linux container management platforms (such as Docker and Kubernetes).

Cilium uses eBPF as its technical basis to provide a high-performance, flexible, and secure container network solution. Example functions:

Function 1: Use Kubernetes labels instead of IP addresses/ports for container micro-isolation.

Function 2: Use sockmap instead of the loopback path to accelerate sidecar communication.
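
Function 1 can be pictured with a small Python sketch. It is a toy model under stated assumptions, not Cilium's actual data structures or API: the names `Endpoint`, `Rule`, and `allowed` are hypothetical, and the label strings are made up. The point it illustrates is that a rule written against labels keeps working when a container is rescheduled and receives a new IP address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    ip: str               # may change whenever the container is rescheduled
    labels: frozenset     # stable identity, e.g. Kubernetes labels


@dataclass(frozen=True)
class Rule:
    from_labels: frozenset  # labels the source must carry
    to_labels: frozenset    # labels the destination must carry


def allowed(src, dst, rules):
    """Allow traffic if any rule's selectors match both endpoints' labels."""
    return any(
        rule.from_labels <= src.labels and rule.to_labels <= dst.labels
        for rule in rules
    )


# One rule: frontends may talk to backends. No IP address or port appears.
rules = [Rule(frozenset({"app=frontend"}), frozenset({"app=backend"}))]

frontend = Endpoint("10.0.0.5", frozenset({"app=frontend", "env=prod"}))
backend = Endpoint("10.0.1.9", frozenset({"app=backend", "env=prod"}))

# The same frontend after rescheduling: new IP, same labels.
rescheduled = Endpoint("10.0.7.42", frontend.labels)

print(allowed(frontend, backend, rules))
print(allowed(rescheduled, backend, rules))
print(allowed(backend, frontend, rules))
```

Because the decision depends only on labels, the rule set does not need to be rewritten when addresses change, which is the motivation for preferring labels over IP addresses and ports.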

Cloud Native O&M

The kernel offers various maintenance and debugging methods, but they are designed from the kernel's perspective and cannot meet the maintenance and debugging requirements of container scenarios. eBPF addresses this by collecting data from the microservice perspective and implementing O&M functions for the container platform. Mature projects in the industry include sysdig and hubble.

Custom Kernel Logic (Customized TCP Congestion-Control Algorithm)

For details, see https://lwn.net/Articles/811631/

BPF can redefine struct xxx_ops structures in the kernel. As of kernel 5.6, BPF supports customizing the TCP congestion-control algorithm.

  1. Use C or Rust to define the TCP congestion-control algorithm. See bpf_cubic.c for an example.
  2. Use clang to compile the algorithm into an ELF file.
  3. Use bpftool to load the ELF file. (See struct_ops.c for an example.)

~ bpftool struct_ops register \ 

Value: CDN scenarios require a large number of custom TCP congestion-control algorithms. The same mechanism could be applied to other ops structures; for example, BPF could provide custom access policies for file ops.
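
The idea of replacing an ops structure at run time can be sketched in plain Python. This is only an analogy under stated assumptions: a real struct_ops program is compiled to BPF and registered with bpftool, and the names `CongestionOps`, `REGISTRY`, `register`, and `Connection`, as well as the two window-adjustment policies, are hypothetical and do not correspond to real TCP algorithms.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CongestionOps:
    """Toy stand-in for an ops structure: a named table of callbacks."""
    name: str
    on_ack: Callable[[int], int]    # returns the new window after an ACK
    on_loss: Callable[[int], int]   # returns the new window after a loss


REGISTRY: Dict[str, CongestionOps] = {}


def register(ops):
    """Make a new algorithm selectable without changing the caller's code."""
    if ops.name in REGISTRY:
        raise ValueError(f"{ops.name} is already registered")
    REGISTRY[ops.name] = ops


class Connection:
    """The 'kernel side': it only ever calls through the ops table."""

    def __init__(self, algo, cwnd=10):
        self.ops = REGISTRY[algo]
        self.cwnd = cwnd

    def ack(self):
        self.cwnd = self.ops.on_ack(self.cwnd)

    def loss(self):
        self.cwnd = self.ops.on_loss(self.cwnd)


# A built-in policy: halve the window on loss.
register(CongestionOps("builtin_halving",
                       on_ack=lambda cwnd: cwnd + 1,
                       on_loss=lambda cwnd: max(cwnd // 2, 1)))

# A custom policy added later: reduce the window by a quarter on loss.
register(CongestionOps("custom_gentle",
                       on_ack=lambda cwnd: cwnd + 1,
                       on_loss=lambda cwnd: max(cwnd * 3 // 4, 1)))

conn = Connection("custom_gentle", cwnd=8)
conn.ack()
conn.loss()
print(conn.cwnd)
```

The `Connection` class never changes when a new algorithm is registered. In the same spirit, the kernel's TCP code keeps calling through its ops structure while the implementation behind it is supplied by a BPF program.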

"The Linux kernel continues its march towards becoming a BPF runtime-powered microkernel" -- Toke Høiland-Jørgensen

Security

The runtime security of a Linux system is always a dynamic balance. It needs to be evaluated from two aspects: signals (indications of abnormal activity in the system) and mitigations (remedial measures taken in response to signals). Configuring signals and mitigations that are scattered across the kernel is time- and labor-consuming. eBPF addresses this by introducing a set of eBPF helpers that provide a unified policy API for signals and mitigations.
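
The split between signals and mitigations can be illustrated with a small Python model. It is a sketch under stated assumptions and not the KRSI interface: the hook name `exec_check`, the event fields, and both policy functions are hypothetical. Each policy is attached to a hook; a policy may record a signal, and a nonzero return value acts as the mitigation by denying the operation.

```python
import errno
from collections import defaultdict

HOOKS = defaultdict(list)   # hook name -> list of attached policy functions
SIGNALS = []                # recorded indications of abnormal activity


def attach(hook, policy):
    """Attach a policy function to a named hook."""
    HOOKS[hook].append(policy)


def call_hook(hook, event):
    """Run every attached policy; the first nonzero result denies the event."""
    for policy in HOOKS[hook]:
        ret = policy(event)
        if ret != 0:
            return ret
    return 0


def flag_ld_preload(event):
    # Signal only: note the suspicious environment, but allow the operation.
    if "LD_PRELOAD" in event.get("env", {}):
        SIGNALS.append(("ld_preload_set", event["path"]))
    return 0


def deny_tmp_exec(event):
    # Mitigation: refuse to execute anything located under /tmp.
    if event["path"].startswith("/tmp/"):
        return -errno.EPERM
    return 0


attach("exec_check", flag_ld_preload)
attach("exec_check", deny_tmp_exec)

ok = call_hook("exec_check", {"path": "/usr/bin/ls", "env": {}})
suspicious = call_hook("exec_check",
                       {"path": "/usr/bin/ls", "env": {"LD_PRELOAD": "/tmp/x.so"}})
blocked = call_hook("exec_check", {"path": "/tmp/payload", "env": {}})
print(ok, suspicious, blocked, SIGNALS)
```

Both kinds of policy go through the same `attach` and `call_hook` entry points, which is the sense in which a single API can cover signals and mitigations.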

Development Trends and the Motivations Behind Them

Application Scenarios

Development Trends

Motivations

Summary

This article can be summed up in two sentences.

"BPF is eating the world." -- Marek Majkowski

"Let's change the world!" -- openEuler and all Geeks

Implementation of openEuler eBPF

The openEuler LTS version uses Linux kernel 4.19, and the openEuler innovative version uses Linux kernel 5.4.

In addition to inheriting eBPF from the upstream community and backporting its bug fixes, openEuler is committed to building an open, high-performance data foundation based on eBPF technology, giving downstream vendors more convenient ways to innovate on their services.

[Copyright] Copyright © 2024 openEuler Community. This article is first released by the openEuler community. Please reproduce it in compliance with the CC-BY-SA 4.0 license. Please note the text and keep the original link and author information when reproducing the article.

[Disclaimer] This article only represents the author's opinions, and is irrelevant to this website. This website is neutral in terms of the statements and opinions in this article, and does not provide any express or implied warranty of accuracy, reliability, or completeness of the contents contained therein. This article is for readers' reference only, and all legal responsibilities arising therefrom are borne by the reader himself.