Introduction to kubernetes-nmstate

kubernetes-nmstate provides declarative node network configuration driven by the Kubernetes API.

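With the rise of hybrid cloud, setting up node networking has become more challenging, because different environments have different networking requirements. The Container Network Interface (CNI) standard enables a range of solutions for Pod communication inside the cluster, such as assigning IPs to Pods and creating routes.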

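In all of these cases, however, the node's own network must be configured before Pods are scheduled onto it. Configuring networking in a dynamic, heterogeneous cluster whose networking needs keep changing is a challenge in itself.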


The kubernetes-nmstate project configures node networking through Kubernetes CRDs, which can simplify network configuration to a certain extent.

Official website: https://nmstate.io/

Project repository: https://github.com/nmstate/kubernetes-nmstate

Deployment environment

This example uses three Kubernetes nodes running Ubuntu 22.04.2 LTS:

root@node40:~# kubectl get nodes -o wide
NAME     STATUS   ROLES           AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
node40   Ready    control-plane   149d   v1.29.3   192.168.72.40   <none>        Ubuntu 22.04.2 LTS   5.15.0-105-generic   containerd://1.7.15
node41   Ready    <none>          149d   v1.29.3   192.168.72.41   <none>        Ubuntu 22.04.2 LTS   5.15.0-76-generic    containerd://1.7.15
node42   Ready    <none>          149d   v1.29.3   192.168.72.42   <none>        Ubuntu 22.04.2 LTS   5.15.0-76-generic    containerd://1.7.15
root@node40:~# 

Prerequisites

nmstate depends on NetworkManager, so not every Linux distribution is supported. NetworkManager must be version 1.20 or later.

Install network-manager on all Ubuntu nodes:

apt update -y
apt install -y network-manager

Check the NetworkManager version as follows:

root@node40:~# /usr/sbin/NetworkManager --version
1.36.6
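To check the 1.20 minimum on many nodes from a script instead of by eye, a minimal Python sketch can parse the --version output. It assumes the output begins with a dotted numeric version such as 1.36.6 (some distributions append a package suffix like -1.el8, which the regex ignores). The names parse_nm_version, nm_version_ok and read_nm_version are hypothetical helpers, not part of NetworkManager or kubernetes-nmstate.

```python
import re
import subprocess

# Minimum NetworkManager version required by kubernetes-nmstate (stated above).
MIN_VERSION = (1, 20)


def parse_nm_version(text: str) -> tuple:
    """Extract the leading dotted version, e.g. '1.36.6\\n' -> (1, 36, 6)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if not match:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def nm_version_ok(text: str, minimum: tuple = MIN_VERSION) -> bool:
    """Tuples compare element by element, so (1, 36, 6) >= (1, 20) is True."""
    return parse_nm_version(text) >= minimum


def read_nm_version(binary: str = "/usr/sbin/NetworkManager") -> str:
    """Run the same command as above on the local node and return its stdout."""
    result = subprocess.run([binary, "--version"], capture_output=True, text=True, check=True)
    return result.stdout
```

On a node, nm_version_ok(read_nm_version()) returns True for the 1.36.6 shown above.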

Ubuntu uses netplan for network configuration, so to hand interfaces over to NetworkManager, set renderer: NetworkManager on every node:

root@node40:~# vim /etc/netplan/00-installer-config.yaml 
# This is the network config written by 'subiquity'
network:
  version: 2
  renderer: NetworkManager
......

Apply the configuration:

netplan generate
netplan apply

Deploying kubernetes-nmstate

Installation reference: https://github.com/nmstate/kubernetes-nmstate/releases

First, install the kubernetes-nmstate operator:

kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/nmstate.io_nmstates.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/namespace.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/service_account.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/role.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/role_binding.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/operator.yaml

Once that is done, create an NMState CR to trigger deployment of the kubernetes-nmstate handler:

cat <<EOF | kubectl create -f -
apiVersion: nmstate.io/v1
kind: NMState
metadata:
  name: nmstate
EOF

Check the Pods that were created:

root@node40:~# kubectl -n nmstate get pods
NAME                                    READY   STATUS    RESTARTS      AGE
nmstate-cert-manager-6dc8846667-r7cvd   1/1     Running   0             23m
nmstate-handler-2t2sf                   1/1     Running   7 (13m ago)   23m
nmstate-handler-47x9g                   1/1     Running   7 (14m ago)   23m
nmstate-handler-hrhzv                   1/1     Running   0             6m25s
nmstate-metrics-7f8b8579cd-6wfzv        2/2     Running   0             23m
nmstate-operator-58dc749498-ltnf2       1/1     Running   0             23m
nmstate-webhook-6d55bff68d-czwzx        1/1     Running   0             23m

Reporting node state

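kubernetes-nmstate periodically reports the state of each node's network interfaces to the API server. These reports are available through the NodeNetworkState object created for each node.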

List the NodeNetworkStates of all nodes:

root@node40:~# kubectl get nodenetworkstates
NAME     AGE
node40   8m50s
node41   11m
node42   10m

The short name nns gives the same result:

root@node40:~# kubectl get nns
NAME     AGE
node40   9m10s
node41   11m
node42   11m

Reading the state of a specific node

With -o yaml you get the full network state of a given node:

root@node40:~# kubectl get nns node40 -o yaml | more
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkState
metadata:
  creationTimestamp: "2024-09-18T01:05:05Z"
  generation: 1
  name: node40
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: node40
    uid: 95774bad-ad3e-4256-b6a3-144b71a9780c
  resourceVersion: "5656"
  uid: c5cd9d8f-d86a-4f97-a853-86209b554b8b
status:
  currentState:
    dns-resolver:
      config:
        search: []
        server:
        - 223.5.5.5
        - 223.6.6.6
      running:
        search: []
        server:
        - 223.5.5.5
        - 223.6.6.6
    interfaces:
    - accept-all-mac-addresses: false
      bridge:
        options:
          group-addr: 01:80:C2:00:00:00
          group-forward-mask: 0
          group-fwd-mask: 0
          hash-max: 4096
          mac-ageing-time: 300
          multicast-last-member-count: 2
          multicast-last-member-interval: 100
          multicast-membership-interval: 26000
          multicast-querier: false
          multicast-querier-interval: 25500
          multicast-query-interval: 12500
          multicast-query-response-interval: 1000
          multicast-query-use-ifaddr: false
          multicast-router: auto
          multicast-snooping: true
          multicast-startup-query-count: 2
          multicast-startup-query-interval: 3124
          stp:
            enabled: false
            forward-delay: 15
            hello-time: 2
            max-age: 20
            priority: 32768
          vlan-default-pvid: 1
          vlan-protocol: 802.1q
        port:
        - name: veth117637f8
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
        - name: veth4381f50e
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
        - name: veth7b175187
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
        - name: veth82b4c0dd
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
        - name: veth8c8368c7
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
        - name: vethaf20332a
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
      ethtool:
        feature:
          highdma: true
          rx-gro: true
          rx-gro-list: false
          rx-udp-gro-forwarding: false
          tx-checksum-ip-generic: true
          tx-esp-segmentation: true
          tx-fcoe-segmentation: false
          tx-generic-segmentation: true
          tx-gre-csum-segmentation: true
          tx-gre-segmentation: true
          tx-gso-list: true
          tx-gso-partial: true
          tx-gso-robust: false
          tx-ipxip4-segmentation: true
          tx-ipxip6-segmentation: true
          tx-nocache-copy: false
          tx-scatter-gather-fraglist: true
          tx-sctp-segmentation: true
          tx-tcp-ecn-segmentation: true
          tx-tcp-mangleid-segmentation: true
          tx-tcp-segmentation: true
          tx-tcp6-segmentation: true
          tx-tunnel-remcsum-segmentation: true
          tx-udp-segmentation: true
          tx-udp_tnl-csum-segmentation: true
          tx-udp_tnl-segmentation: true
          tx-vlan-hw-insert: true
          tx-vlan-stag-hw-insert: true
      ipv4:
        address:
        - ip: 100.64.0.1
          prefix-length: 24
        enabled: true
      ipv6:
        address:
        - ip: fe80::c82e:90ff:fea3:ed6a
          prefix-length: 64
        enabled: true
      mac-address: CA:2E:90:A3:ED:6A
      max-mtu: 65535
      min-mtu: 68
      mptcp:
        address-flags: []
      mtu: 1450
      name: cni0
      state: up
      type: linux-bridge
......

As you can see, the object is cluster-scoped (it does not belong to any namespace), and its name matches the name of the node it represents.

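The main part of the object is status.currentState. It contains the DNS configuration, the list of interfaces observed on the host together with their configuration, and the routes.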
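Because the full YAML is long, it can help to reduce status.currentState to one line per interface. A minimal Python sketch, assuming kubectl access and the field names visible in the output above (name, type, state, ipv4.address with ip and prefix-length); get_nns and summarize_interfaces are hypothetical helper names.

```python
import json
import subprocess


def get_nns(node: str) -> dict:
    """Fetch one NodeNetworkState as JSON (requires kubectl access to the cluster)."""
    out = subprocess.run(["kubectl", "get", "nns", node, "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def summarize_interfaces(nns: dict) -> list:
    """Return (name, type, state, [ipv4 CIDRs]) for every interface in status.currentState."""
    rows = []
    interfaces = nns.get("status", {}).get("currentState", {}).get("interfaces", [])
    for iface in interfaces:
        # ipv4 may be absent or have no "address" list, e.g. when IPv4 is disabled
        ipv4 = iface.get("ipv4") or {}
        cidrs = [f"{a['ip']}/{a['prefix-length']}" for a in ipv4.get("address", [])]
        rows.append((iface["name"], iface.get("type", ""), iface.get("state", ""), cidrs))
    return rows
```

For node40, the cni0 entry shown above would come out as ("cni0", "linux-bridge", "up", ["100.64.0.1/24"]).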

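The last field of the object is status.lastSuccessfulUpdateTime (cut off in the truncated output above). It records when the report was last updated successfully. Because the report is refreshed periodically but is not updated while the node is unreachable (for example during network reconfiguration), this value tells you whether the observed state is recent enough to trust.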
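The freshness check described above can be sketched in Python. This assumes the timestamp uses the usual Kubernetes RFC 3339 UTC form with second precision (for example 2024-09-18T01:05:05Z, as in creationTimestamp above); the max_age threshold is your own choice and not a value from kubernetes-nmstate, and is_fresh is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(nns: dict, max_age: timedelta, now: Optional[datetime] = None) -> bool:
    """True if status.lastSuccessfulUpdateTime is no older than max_age."""
    stamp = nns.get("status", {}).get("lastSuccessfulUpdateTime")
    if not stamp:
        return False  # no successful report yet: treat the state as stale
    # Parse the RFC 3339 UTC timestamp, e.g. 2024-09-18T01:05:05Z
    updated = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return current - updated <= max_age
```

Combined with a fetch such as kubectl get nns node40 -o json, this lets a script refuse to act on a report that is older than, say, a few minutes.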

Policy configuration example

The demo works as follows:

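  • Prepare a 3-node cluster whose nodes have the Kubernetes primary interface (ens33, with an IP in 192.168.72.x) plus an additional interface, ens35, attached to the VLAN1 network.
  • Use the NMState Operator CRD to create a bridge named br1 on the additional interface.
  • Create a Multus NetworkAttachmentDefinition named multus-br1 that is associated with bridge br1.
  • Create 2 Pods, each with an additional interface on the VLAN1 network.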

The overall architecture looks like this:
[Figure: overall architecture]

Prerequisites

  • Install nmstate
  • Add a NIC to the nodes
  • Install the multus-cni plugin

Adding a NIC to the nodes

Add a NIC to node41 and node42:

root@node41:~# ip link show | grep ens
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
7: ens35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

Installing the multus-cni plugin

kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml

Check the Pods that were created:

root@node40:~# kubectl -n kube-system get pods | grep multus
kube-multus-ds-hmd7g             1/1     Running   0               6m43s
kube-multus-ds-p5g8d             1/1     Running   0               6m43s
kube-multus-ds-rzzwf             1/1     Running   0               6m43s
root@node40:~# 

Creating the nmstate policy

Label node41 and node42:

root@node40:~# kubectl label nodes node41 external-network=true
node/node41 labeled
root@node40:~# kubectl label nodes node42 external-network=true
node/node42 labeled

Create a NodeNetworkConfigurationPolicy (saved here as nncp.yaml) that creates a bridge named br1 on node41 and node42:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-ens35
spec:
  nodeSelector:
    external-network: "true"
  desiredState:
    interfaces:
      - name: br1
        description: Linux bridge with ens35 as a port
        type: linux-bridge
        state: up
        ipv4:
          dhcp: true
          enabled: true
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: ens35

Apply the configuration:

root@node40:~# kubectl apply -f nncp.yaml 
nodenetworkconfigurationpolicy.nmstate.io/br1-ens35 created

Check the policy that was created:

root@node40:~# kubectl get nncp
NAME        STATUS      REASON
br1-ens35   Available   SuccessfullyConfigured
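The STATUS and REASON columns come from the policy's status.conditions. In a script, the generic kubectl wait nncp br1-ens35 --for condition=Available --timeout 2m blocks until the condition is met; the equivalent check in Python is sketched below, assuming the condition type is named Available as the column suggests. find_condition, policy_available and get_nncp are hypothetical helpers.

```python
import json
import subprocess
from typing import Optional


def find_condition(obj: dict, cond_type: str) -> Optional[dict]:
    """Return the status.conditions entry with the given type, or None."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def policy_available(nncp: dict) -> bool:
    """Treat the policy as applied when its Available condition has status "True"."""
    cond = find_condition(nncp, "Available")
    return cond is not None and cond.get("status") == "True"


def get_nncp(name: str) -> dict:
    """Fetch a NodeNetworkConfigurationPolicy as JSON (requires cluster access)."""
    out = subprocess.run(["kubectl", "get", "nncp", name, "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)
```

Usage: policy_available(get_nncp("br1-ens35")) should return True once the policy reports SuccessfullyConfigured.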

Check the bridge that was created:

root@node41:~# ip link show | grep br1
7: ens35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br1 state UP mode DEFAULT group default qlen 1000
8: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000

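Create the NetworkAttachmentDefinition: save the manifest below as multus-bridge.yaml and apply it with kubectl apply -f multus-bridge.yaml.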

root@ubuntu:~# cat multus-bridge.yaml 
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: multus-br1
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "bridge",
      "bridge": "br1",
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.72.0/24",
        "rangeStart": "192.168.72.240",
        "rangeEnd": "192.168.72.250"
      }
    }
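The Pod addresses come out of rangeStart..rangeEnd inside the node subnet, so that range must not collide with addresses already in use on 192.168.72.0/24 (such as the node IPs .40 to .42). Also note that host-local IPAM keeps its allocations on each node separately, so Pods on different nodes can be handed the same address; a cluster-wide IPAM plugin such as whereabouts avoids that. A minimal Python sketch of the range check using the standard ipaddress module; check_ipam_range is a hypothetical helper and the reserved list is whatever you know to be in use.

```python
import ipaddress


def check_ipam_range(subnet: str, start: str, end: str, reserved: list) -> int:
    """Validate a host-local range and return how many addresses it provides."""
    net = ipaddress.ip_network(subnet)
    first, last = ipaddress.ip_address(start), ipaddress.ip_address(end)
    if first not in net or last not in net:
        raise ValueError("rangeStart and rangeEnd must lie inside the subnet")
    if first > last:
        raise ValueError("rangeStart must not be greater than rangeEnd")
    # Addresses already used on the segment (node IPs, gateway, ...) must stay outside the range
    clashes = [ip for ip in reserved if first <= ipaddress.ip_address(ip) <= last]
    if clashes:
        raise ValueError(f"range overlaps reserved addresses: {clashes}")
    return int(last) - int(first) + 1
```

For the values above, check_ipam_range("192.168.72.0/24", "192.168.72.240", "192.168.72.250", ["192.168.72.40", "192.168.72.41", "192.168.72.42"]) returns 11, enough for the two demo Pods.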

Check the NetworkAttachmentDefinition:

root@node40:~# kubectl get net-attach-def
NAME         AGE
multus-br1   9s

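Demo application: two Pods that request the multus-br1 network through the k8s.v1.cni.cncf.io/networks annotation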

root@ubuntu:~# cat demo-app.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: net-pod1
  annotations:
    k8s.v1.cni.cncf.io/networks: multus-br1
spec:
  containers:
  - name: netshoot-pod
    image: nicolaka/netshoot
    imagePullPolicy: IfNotPresent
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
---
apiVersion: v1
kind: Pod
metadata:
  name: net-pod2
  annotations:
    k8s.v1.cni.cncf.io/networks: multus-br1
spec:
  containers:
  - name: netshoot-pod
    image: nicolaka/netshoot
    imagePullPolicy: IfNotPresent
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0

Apply the configuration:

kubectl apply -f demo-app.yaml

Check the two Pods:

root@node40:~# kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
net-pod1   1/1     Running   0          44m
net-pod2   1/1     Running   0          44m

Check the interfaces of net-pod1:

root@node40:~# kubectl exec -it net-pod1 -- ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 7e:d4:a4:04:a3:58 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 100.64.1.5/24 brd 100.64.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::7cd4:a4ff:fe04:a358/64 scope link 
       valid_lft forever preferred_lft forever
3: net1@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether a2:7a:47:1b:59:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.72.243/24 brd 192.168.72.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::a07a:47ff:fe1b:5904/64 scope link 
       valid_lft forever preferred_lft forever

Check the interfaces of net-pod2:

root@node40:~# kubectl exec -it net-pod2 -- ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 22:cc:41:f9:6b:ad brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 100.64.2.5/24 brd 100.64.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20cc:41ff:fef9:6bad/64 scope link 
       valid_lft forever preferred_lft forever
3: net1@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 1e:07:5b:d3:0e:77 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.72.241/24 brd 192.168.72.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::1c07:5bff:fed3:e77/64 scope link 
       valid_lft forever preferred_lft forever
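Instead of running ip addr inside each Pod, the secondary addresses can also be read from the Pod object: Multus records the attached networks in the k8s.v1.cni.cncf.io/network-status annotation as a JSON list whose entries carry name (namespace/NAD name), interface and ips. A minimal Python sketch, assuming that annotation layout and a Pod fetched with kubectl get pod net-pod1 -o json; attachment_ips is a hypothetical helper.

```python
import json

STATUS_ANNOTATION = "k8s.v1.cni.cncf.io/network-status"


def attachment_ips(pod: dict, network: str) -> list:
    """Return the IPs Multus reported for the given NetworkAttachmentDefinition name."""
    annotations = pod.get("metadata", {}).get("annotations", {})
    ips = []
    for entry in json.loads(annotations.get(STATUS_ANNOTATION, "[]")):
        # Secondary networks are listed as "<namespace>/<nad-name>", e.g. "default/multus-br1"
        if entry.get("name", "").split("/")[-1] == network:
            ips.extend(entry.get("ips", []))
    return ips
```

For net-pod1 above, attachment_ips(pod, "multus-br1") would be expected to return the net1 address 192.168.72.243.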

Test: ping the Pod's own net1 IP

root@node40:~# kubectl exec -it net-pod1 -- ping -c 3 -I net1 192.168.72.243
PING 192.168.72.243 (192.168.72.243) from 192.168.72.243 net1: 56(84) bytes of data.
64 bytes from 192.168.72.243: icmp_seq=1 ttl=64 time=0.025 ms
64 bytes from 192.168.72.243: icmp_seq=2 ttl=64 time=0.060 ms
64 bytes from 192.168.72.243: icmp_seq=3 ttl=64 time=0.054 ms

--- 192.168.72.243 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2056ms
rtt min/avg/max/mdev = 0.025/0.046/0.060/0.015 ms
root@node40:~# 

Test: ping net-pod2's net1 IP

root@node40:~# kubectl exec -it net-pod1 -- ping -c 3 -I net1 192.168.72.241
PING 192.168.72.241 (192.168.72.241) from 192.168.72.243 net1: 56(84) bytes of data.
64 bytes from 192.168.72.241: icmp_seq=1 ttl=64 time=0.240 ms
64 bytes from 192.168.72.241: icmp_seq=2 ttl=64 time=0.412 ms
64 bytes from 192.168.72.241: icmp_seq=3 ttl=64 time=0.627 ms

--- 192.168.72.241 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2037ms
rtt min/avg/max/mdev = 0.240/0.426/0.627/0.158 ms

Test: ping the host IP

root@node40:~# kubectl exec -it net-pod1 -- ping -c 3 -I net1 192.168.72.40
PING 192.168.72.40 (192.168.72.40) from 192.168.72.243 net1: 56(84) bytes of data.
64 bytes from 192.168.72.40: icmp_seq=1 ttl=64 time=0.626 ms
64 bytes from 192.168.72.40: icmp_seq=2 ttl=64 time=0.348 ms
64 bytes from 192.168.72.40: icmp_seq=3 ttl=64 time=0.451 ms

--- 192.168.72.40 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2041ms
rtt min/avg/max/mdev = 0.348/0.475/0.626/0.114 ms
root@node40:~# 
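To repeat these three checks automatically, the packet-loss figure can be pulled out of the ping summary line. A minimal Python sketch, assuming the iputils summary format shown above ("3 packets transmitted, 3 received, 0% packet loss"); the regex also tolerates the BusyBox wording "3 packets received". packet_loss and ping_from_pod are hypothetical helpers.

```python
import re
import subprocess

# Matches e.g. "3 packets transmitted, 3 received, 0% packet loss"
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss")


def packet_loss(ping_output: str) -> float:
    """Return the packet-loss percentage from ping's summary line."""
    match = SUMMARY.search(ping_output)
    if not match:
        raise ValueError("no ping summary found")
    return float(match.group(3))


def ping_from_pod(pod: str, iface: str, target: str, count: int = 3) -> float:
    """Run the same kubectl exec ... ping -I <iface> command as above and return the loss."""
    # No check=True: ping exits non-zero on loss, but the summary is still worth parsing
    out = subprocess.run(
        ["kubectl", "exec", pod, "--", "ping", "-c", str(count), "-I", iface, target],
        capture_output=True, text=True).stdout
    return packet_loss(out)
```

Usage: ping_from_pod("net-pod1", "net1", "192.168.72.241") == 0.0 corresponds to the second test above.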

As a result, each of the two Pods has an additional net1 interface, connected through the br1 bridge to the host node's NIC.

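Most importantly, we did not have to create the br1 bridge on the hosts by hand: kubernetes-nmstate created it automatically through the Kubernetes API. The same approach can be used to create bond interfaces on the hosts, or to carve out VLAN sub-interfaces and hand them to Pods.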
