How the hash value is calculated for each policy of the xmit_hash_policy bonding parameter


QTM_Gitee · Published 2022-04-25 19:22:52

Original article: https://access.redhat.com/solutions/71883

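Environment

Red Hat Enterprise Linux
Bonding driver providing link aggregation in mode 2 (balance-xor), mode 4 (802.3ad, also known as LACP), mode 5 (balance-tlb), or mode 6 (balance-alb)

Issue

How are the values for the different policies of the xmit_hash_policy bonding parameter calculated?
We need to understand the actual implementation of the logic and math behind the load-balancing algorithm.
How does each of the layer2, layer2+3, layer3+4, encap2+3, encap3+4, and vlan+srcmac policies use its algorithm?
What formula is used to calculate the network bonding hash policy?
What hash policies are available for network bonding, and how are they configured?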

Resolution

Configuration

The xmit_hash_policy load-balancing parameter can be used with mode=2, mode=4, mode=5, and mode=6. In mode=5 and mode=6, however, it is applied only when tlb_dynamic_lb=0 is set.

For example, suppose bondX must be configured as mode=2 (balance-xor) with xmit_hash_policy=layer2+3:

### With the network service, set BONDING_OPTS in ifcfg-bondX to:
BONDING_OPTS="miimon=100 mode=2 xmit_hash_policy=layer2+3"

### With NetworkManager, use:
# nmcli con modify bondX bond.options "miimon=100,mode=2,xmit_hash_policy=layer2+3"
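
After changing the option, it can help to confirm which policy the kernel actually applied. The bonding driver exposes the active value through sysfs as a name followed by its numeric ID (for example `layer2+3 2`). The following is a minimal sketch of reading it; the helper names `parse_policy` and `read_xmit_hash_policy` and the default bond name `bond0` are illustrative assumptions, not part of the original article.

```python
from pathlib import Path


def parse_policy(text: str) -> tuple[str, int]:
    """Split a sysfs value such as 'layer2+3 2' into (name, numeric id)."""
    name, value = text.split()
    return name, int(value)


def read_xmit_hash_policy(bond: str = "bond0",
                          sysfs_root: str = "/sys/class/net") -> tuple[str, int]:
    """Read the active policy of a bond device (hypothetical helper)."""
    path = Path(sysfs_root) / bond / "bonding" / "xmit_hash_policy"
    return parse_policy(path.read_text())


# Parsing works without a bond device present:
print(parse_policy("layer2+3 2\n"))  # ('layer2+3', 2)
```

On a host with a configured bond, `read_xmit_hash_policy("bondX")` would return the same kind of tuple for the live device.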

The complete configuration of a bonding device is discussed in:

How do I configure a bonding device on Red Hat Enterprise Linux (RHEL)?

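layer2

The layer2 policy uses the XOR of the source and destination MAC addresses and the Ethernet protocol type.

It is calculated as follows:

  hash = source MAC XOR destination MAC XOR packet type ID
  slave number = hash modulo slave count

This algorithm places all traffic to a particular network peer on the same slave.

It is a good algorithm when network traffic flows between this system and multiple other systems in the same broadcast domain.

If traffic between this system and multiple other systems passes through a default gateway, consider another algorithm: every such frame carries the gateway's MAC address as its destination, so all of it hashes to the same slave.

This algorithm is 802.3ad compliant.

This is the default policy when none is configured.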
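
The formula above can be sketched in a few lines. This is an illustration only: it follows the `bond_eth_hash()` excerpt shown under the diagnostic steps, where only the last byte of each MAC address enters the hash, and it treats the protocol type as a plain integer, ignoring the network byte order the kernel works with. The function names and the sample addresses are chosen for the example.

```python
def mac_bytes(mac: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' into its six raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def layer2_hash(src_mac: str, dst_mac: str, eth_type: int) -> int:
    # Only the last MAC byte of each side is used, as in bond_eth_hash().
    return mac_bytes(src_mac)[5] ^ mac_bytes(dst_mac)[5] ^ eth_type


def pick_slave(hash_value: int, slave_count: int) -> int:
    """Map a hash onto a slave index."""
    return hash_value % slave_count


h = layer2_hash("00:1b:21:74:b6:39", "00:1a:22:12:34:59", 0x0800)
print(hex(h), pick_slave(h, 2))  # 0x860 0
```

Because the destination MAC is the only per-peer input, traffic routed through a gateway always produces the same hash, whatever the final IP destination is.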

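layer2+3

The layer2+3 policy uses the XOR of the source and destination MAC addresses and IP addresses.

It is calculated as follows:

  hash = source MAC XOR destination MAC XOR packet type ID
  hash = hash XOR source IP XOR destination IP
  hash = hash XOR (hash RSHIFT 16)
  hash = hash XOR (hash RSHIFT 8)
  hash = hash RSHIFT 1
  slave number = hash modulo slave count

This algorithm places all traffic to a particular IP address on the same slave.

It is a good algorithm when network traffic between this system and multiple other systems passes through a default gateway.

If network traffic flows mainly between this system and one other system, consider another algorithm.

For non-IP traffic, the formula is the same as for the layer2 transmit policy.

This algorithm is 802.3ad compliant.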

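layer3+4

The layer3+4 policy uses the XOR of the source and destination ports and IP addresses.

It is calculated as follows:

  hash = source port , destination port (as in the header)
  hash = hash XOR source IP XOR destination IP
  hash = hash XOR (hash RSHIFT 16)
  hash = hash XOR (hash RSHIFT 8)
  hash = hash RSHIFT 1
  slave number = hash modulo slave count

This algorithm is a good choice when network traffic between this system and another system uses the same IP addresses but multiple ports.

For non-IP traffic, the formula is the same as for the layer2 transmit policy.

This algorithm is not 802.3ad compliant.

For fragmented TCP or UDP packets and all other IP protocol traffic, the source and destination port information is omitted. This policy is intended to mimic the behavior of certain switches, notably Cisco switches with PFC2 as well as some Foundry and IBM products.

A single TCP or UDP conversation containing both fragmented and unfragmented packets may see its packets striped across two interfaces, which can result in out-of-order delivery. Most traffic types will not meet this criterion, because TCP rarely fragments traffic and most UDP traffic is not involved in extended conversations. Other implementations of 802.3ad may or may not tolerate this noncompliance.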
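
The following sketch illustrates both points: flows that differ only in their source port can land on different slaves, and a fragment (which contributes no ports) hashes differently from an unfragmented packet of the same conversation. It assumes IPv4 and places the source port in the upper 16 bits and the destination port in the lower 16 bits; that layout, the function names, and the sample addresses are assumptions for the illustration.

```python
import ipaddress


def ip_fold(seed: int, src_ip: str, dst_ip: str) -> int:
    """XOR both IPv4 addresses into the seed, fold, and drop the lowest bit."""
    h = seed ^ int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h ^= h >> 8
    return h >> 1


def layer34_hash(src_ip: str, dst_ip: str, src_port: int = 0,
                 dst_port: int = 0, fragmented: bool = False) -> int:
    # Fragments and non-TCP/UDP IP traffic contribute no port information.
    ports = 0 if fragmented else ((src_port << 16) | dst_port)
    return ip_fold(ports, src_ip, dst_ip)


# One peer, one hundred source ports: the flows spread over both slaves.
slaves_used = {
    layer34_hash("169.254.92.64", "192.168.1.10", sport, 42424) % 2
    for sport in range(40000, 40100)
}
print(sorted(slaves_used))  # [0, 1]
```

With only IP addresses as input (the fragmented case), every packet between the same two hosts produces one hash value, which is why a conversation mixing fragments and whole packets can be split across slaves.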

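encap2+3

This policy uses the same formula as layer2+3, but it relies on skb_flow_dissect to obtain the header fields, which may result in the use of inner headers if an encapsulation protocol is used.

This improves performance for tunnel users because packets are distributed according to the encapsulated flows.

encap3+4

This policy uses the same formula as layer3+4, but it relies on skb_flow_dissect to obtain the header fields, which may result in the use of inner headers if an encapsulation protocol is used.

This improves performance for tunnel users because packets are distributed according to the encapsulated flows.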

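vlan+srcmac

The vlan+srcmac policy uses the XOR of the VLAN ID, the source MAC vendor (the first three bytes of the source MAC address), and the source MAC dev (the last three bytes).

It is calculated as follows:

  hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
  slave number = hash modulo slave count

This policy uses a very rudimentary VLAN ID and source MAC hash to load-balance traffic per VLAN, with failover should one leg fail.

The intended use case is a bond shared by multiple virtual machines, each configured to use its own VLAN, to provide LACP-like functionality without requiring LACP-capable switching hardware.

This feature is available starting with RHEL 8.4 or kernel-4.18.0-305.el8.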

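Single Stream

Traffic that consists mainly of a single large layer 4 flow, such as a single NFS mount, a single iSCSI target/initiator pair, or any other persistent single TCP/UDP connection, cannot be load balanced: every packet of the flow produces the same hash, so it always uses the same slave.

If a single persistent flow must be faster, faster network interfaces and network infrastructure are required.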

Diagnostic Steps

The relevant code that handles the hash policies is:

5.14.0-63.el9/drivers/net/bonding/bond_main.c

The following xmit policies are available:
#define BOND_XMIT_POLICY_LAYER2         0 /* layer 2 (MAC only), default */
#define BOND_XMIT_POLICY_LAYER34        1 /* layer 3+4 (IP ^ (TCP || UDP)) */
#define BOND_XMIT_POLICY_LAYER23        2 /* layer 2+3 (IP ^ MAC) */
#define BOND_XMIT_POLICY_ENCAP23        3 /* encapsulated layer 2+3 */
#define BOND_XMIT_POLICY_ENCAP34        4 /* encapsulated layer 3+4 */
#define BOND_XMIT_POLICY_VLAN_SRCMAC    5 /* vlan + source MAC */

/**
 * bond_xmit_hash - generate a hash value based on the xmit policy
 * @bond: bonding device
 * @skb: buffer to use for headers
 *
 * This function will extract the necessary headers from the skb buffer and use
 * them to generate a hash based on the xmit_policy set in the bonding device
 */
u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
{
        if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
            skb->l4_hash)
                return skb->hash;

        return __bond_xmit_hash(bond, skb, skb->data, skb->protocol,
                                skb_mac_offset(skb), skb_network_offset(skb),
                                skb_headlen(skb));
}

/* Generate hash based on xmit policy. If @skb is given it is used to linearize
 * the data as required, but this function can be used without it if the data is
 * known to be linear (e.g. with xdp_buff).
 */
static u32 __bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, const void *data,
                            __be16 l2_proto, int mhoff, int nhoff, int hlen)
{
        struct flow_keys flow;
        u32 hash;

        if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
                return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);

        /* If the xmit_policy is set to BOND_XMIT_POLICY_LAYER2 or when packets 
           types are not identified (ie when bond_flow_dissect returns false),
           BOND_XMIT_POLICY_LAYER2 will be used 
         */

        if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
            !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
                return bond_eth_hash(skb, data, mhoff, hlen);

        if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
            bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
                hash = bond_eth_hash(skb, data, mhoff, hlen);
        } else {
                if (flow.icmp.id)
                        memcpy(&hash, &flow.icmp, sizeof(hash));
                else
                        memcpy(&hash, &flow.ports.ports, sizeof(hash));
        }

        return bond_ip_hash(hash, &flow);
}

/* L2 hash helper */
static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
{
        struct ethhdr *ep;

        data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
        if (!data)
                return 0;

        ep = (struct ethhdr *)(data + mhoff);
        return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
}

static u32 bond_ip_hash(u32 hash, struct flow_keys *flow)
{
        hash ^= (__force u32)flow_get_u32_dst(flow) ^
                (__force u32)flow_get_u32_src(flow);
        hash ^= (hash >> 16);
        hash ^= (hash >> 8);
        /* discard lowest hash bit to deal with the common even ports pattern */
        return hash >> 1;
}

static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
{
        struct ethhdr *mac_hdr;
        u32 srcmac_vendor = 0, srcmac_dev = 0;
        u16 vlan;
        int i;

        data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
        if (!data)
                return 0;
        mac_hdr = (struct ethhdr *)(data + mhoff);

        for (i = 0; i < 3; i++)
                srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];

        for (i = 3; i < ETH_ALEN; i++)
                srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i];

        if (!skb_vlan_tag_present(skb))
                return srcmac_vendor ^ srcmac_dev;

        vlan = skb_vlan_tag_get(skb);

        return vlan ^ srcmac_vendor ^ srcmac_dev;
}

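Here, the flow is used to find the actual packet header information, such as the IP and port details.

BOND_XMIT_POLICY_ENCAP23 and BOND_XMIT_POLICY_ENCAP34 work like the regular layer23 and layer34 xmit policies, but they additionally parse encapsulated packets and read the IP and network headers from them for hashing.

The following shows, for each hash policy, the hash calculation that selects the transmit interface: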

Assumed topology
----------------
Server

    bond0
    MAC: 00:1b:21:74:b6:39
    IP : 169.254.92.64 = 0xA9FE5C40
    UDP: 12243         = 0x2FD3
    packet type ID     = 0x0800   (assuming IPv4)

    NIC_Count = 2
    NIC0 assigned # value: 0
    NIC1 assigned # value: 1

Destinations

    Client1
        MAC: 00:1a:22:12:34:59
        IP : 192.168.1.10  = 0xC0A8010A
        UDP: 42424         = 0xA5B8

    Client2
        MAC: 00:1e:c1:07:45:1A
        IP : 192.168.100.24 = 0xC0A86418
        UDP: dst port 42424 = 0xA5B8


Policy behavior
---------------
1. layer2:

        Hash = ( SRC_MAC[5] ^ DST_MAC[5] ^ packet type ID ) % NIC_Count

        Server --> Client1
        Hash = ((0x0039 ^ 0x0059) ^ 0x0800) % 2 = 0 ---> packet is sent through NIC0

        Server --> Client2
        Hash = ((0x0039 ^ 0x001A) ^ 0x0800) % 2 = 1 ---> packet is sent through NIC1

2. layer2+3:

              hash = source MAC XOR destination MAC XOR packet type ID
              hash = hash XOR source IP XOR destination IP
              hash = hash XOR (hash RSHIFT 16)
              hash = hash XOR (hash RSHIFT 8)
              hash  = hash RSHIFT 1
              slave number = hash modulo slave count

              Server --> Client1

              hash = ((0x0039 ^ 0x0059) ^ 0x0800) = 0x0860
              hash =  0x0860 ^ (0xA9FE5C40 ^ 0xC0A8010A) = 0x6956552A
              hash =  0x6956552A ^ (0x6956552A >> 16) = 0x69563C7C
              hash =  0x69563C7C ^ (0x69563C7C >> 8)  = 0x693F6A40
              hash =  0x693F6A40 >> 1 = 0x349FB520
              slave number = 0x349FB520 % 2 = 0 ---> packet is sent through NIC0

              Server --> Client2

              hash = ((0x0039 ^ 0x001A) ^ 0x0800) = 0x0823
              hash =  0x0823 ^ (0xA9FE5C40 ^ 0xC0A86418) = 0x6956307B
              hash =  0x6956307B ^ (0x6956307B >> 16) = 0x6956592D
              hash =  0x6956592D ^ (0x6956592D >> 8)  = 0x693F0F74
              hash =  0x693F0F74 >> 1 = 0x349F87BA
              slave number = 0x349F87BA % 2 = 0 ---> packet is sent through NIC0

3. layer3+4:

              hash = source port , destination port (as in the header)
              hash = hash XOR source IP XOR destination IP
              hash = hash XOR (hash RSHIFT 16)
              hash = hash XOR (hash RSHIFT 8)
              hash = hash RSHIFT 1
              slave number = hash modulo slave count

              Server --> Client1

              hash = (0x2FD3 , 0xA5B8) = 0x2FD3A5B8
              hash = 0x2FD3A5B8 ^ (0xA9FE5C40 ^ 0xC0A8010A) = 0x4685F8F2
              hash = 0x4685F8F2 ^ (0x4685F8F2 >> 16) = 0x4685BE77
              hash = 0x4685BE77 ^ (0x4685BE77 >> 8)  = 0x46C33BC9
              hash = 0x46C33BC9 >> 1 = 0x23619DE4
              slave number = 0x23619DE4 % 2 = 0 ---> packet is sent through NIC0

              Server --> Client2

              hash = (0x2FD3 , 0xA5B8) = 0x2FD3A5B8
              hash = 0x2FD3A5B8 ^ (0xA9FE5C40 ^ 0xC0A86418) = 0x46859DE0
              hash = 0x46859DE0 ^ (0x46859DE0 >> 16) = 0x4685DB65
              hash = 0x4685DB65 ^ (0x4685DB65 >> 8)  = 0x46C35EBE
              hash = 0x46C35EBE >> 1 = 0x2361AF5F
              slave number = 0x2361AF5F % 2 = 1 ---> packet is sent through NIC1


4. vlan+srcmac:

    Assume the bond has VLAN interfaces with VLAN IDs 100 and 101.

              hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
              slave number = hash modulo slave count

              Server with VLAN 100 (0x64) --> Client1

              hash = 0x64 ^ 0x001B21 ^ 0x74B639 = 0x74AD7C
              slave number = 0x74AD7C % 2 = 0 ---> packet is sent through NIC0

              Server with VLAN 101 (0x65) --> Client2

              hash = 0x65 ^ 0x001B21 ^ 0x74B639 = 0x74AD7D
              slave number = 0x74AD7D % 2 = 1 ---> packet is sent through NIC1
