一、业务背景与需求描述

1.1 业务场景

系统需要查询每日行情数据,针对每个交易品种(asset_id),获取两个时间段的最新行情快照:

  • 全日快照:在 update_time <= '15:00:00' 范围内,取 seq_no 最大的记录(当日收盘前最新数据)
  • 早盘快照:在 update_time <= '09:00:00' 范围内,取 seq_no 最大的记录(开盘前最新数据)

需要返回的字段包括交易日期、品种ID、成交额、成交量、持仓量等。

1.2 数据规模

  • data_ops.t_market_data_detail(行情深度明细表)
  • 全表数据量:约 15,198,348 行
  • 过滤后数据量:约 15,189,573 行(特定交易日 + 时间范围 + 指定频道)
  • 最终输出行数:约 2,355 行(按品种分组后的结果)

1.3 查询条件

trade_date = '20250627' update_time <= '15:00:00' channel_id IN (100, 500, 1000, 5000)

1.4 原始SQL执行耗时对比

ORACLE 19.10 GaussDB 506.0 SPC0500 集中式
30s 123s

二、初始 SQL 分析(方案 0)

2.1 原始 SQL

SELECT MIN(trade_date) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS trade_date, MIN(asset_id) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS asset_id, MIN(amount) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS amount, MIN(qty) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS qty, MIN(oi_value) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS oi_value, MIN(qty) KEEP(dense_rank FIRST ORDER BY t2_seq_no DESC NULLS LAST) AS t2_qty, MIN(amount) KEEP(dense_rank FIRST ORDER BY t2_seq_no DESC NULLS LAST) AS t2_amount FROM (SELECT seq_no, CASE WHEN update_time > '09:00:00' THEN NULL ELSE seq_no END AS t2_seq_no, trade_date, asset_id, amount, qty, oi_value FROM data_ops.t_market_data_detail s WHERE trade_date = '20250627' AND update_time <= '15:00:00' AND (channel_id = 1000 OR channel_id = 5000 OR channel_id = 100 OR channel_id = 500)) GROUP BY asset_id;

2.2 完整执行计划

id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs ----+-----------------------------------------------+------------+--------------+--------------+-------------+---------+---------+------------------------ 1 | -> GroupAggregate | 123462.280 | 2355 | 2354 | 71kB | | 273 | 3241514.039..3659250.836 2 | -> Sort | 68854.676 | 15189573 | 15189573 | 1048966kB | | 49 | 3241514.039..3279487.971 3 | -> Seq Scan on t_market_data_detail s | 16408.833 | 15189573 | 15189573 | 53kB | | 49 | 0.000..730253.225 (3 rows) Predicate Information (identified by plan id) --------------------------------------------------------------------------------------------------------------------------- 3 --Seq Scan on t_market_data_detail s Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND ((channel_id = 100) OR (channel_id = 500) OR (channel_id = 1000) OR (channel_id = 5000))), (Expression Flatten optimized), (Predicates Reordered) Rows Removed by Filter: 8775 (3 rows) Memory Information (identified by plan id) --------------------------------------------------------------------------------------------------------------------------- 2 --Sort Sort Method: external merge Disk: 938544kB (2 rows) ================ Query Summary ================ ----------------------------------------------- Datanode executor start time: 0.179 ms Datanode executor run time: 123468.607 ms Datanode executor end time: 74.734 ms Planner runtime: 0.873 ms Query Id: 1946962414165264683 Total runtime: 123543.548 ms (6 rows)

总耗时:123,543 ms(约 2 分钟)

执行计划摘要
节点 操作 实际耗时 行数
1 GroupAggregate 123,462 ms 2,355
2 Sort 68,854 ms 15,189,573
3 Seq Scan 16,408 ms 15,189,573

2.3 核心问题诊断

  1. 全表顺序扫描:无任何索引可用,对近 1500 万行数据进行全表扫描
  2. 单次大排序:Sort 节点耗时 68.8s,使用外部归并排序,磁盘 I/O 达 938 MB
  3. KEEP 聚合函数开销:每个 KEEP 子句都需要在分组时维护有序状态,7 个 KEEP 表达式叠加计算成本极高
  4. 串行执行:未启用并行查询,单线程处理全部数据

三、优化方案演进

方案 1:启用并行查询(query_dop = 4)

3.1 优化措施

SET query_dop = 4; -- SQL 逻辑不变

3.2 完整执行计划

id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs ----+------------------------------------------------+-------------------------------+--------------+--------------+-----------------+-----------+---------+------------------------ 1 | -> Streaming(type: LOCAL GATHER dop: 1/4) | [43143.728,43143.728] | 2355 | 2354 | [136kB,136kB] | | 273 | 2837486.031..3255350.306 2 | -> GroupAggregate | [42060.014,43116.415] | 2355 | 2354 | [63kB,63kB] | | 273 | 2837486.031..3255222.828 3 | -> Sort | [28888.504,29384.295] | 15189573 | 15189573 | [261MB,263MB] | [68,68] | 49 | 2837486.031..2875459.963 4 | -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4) | [18407.666,18551.991] | 15189573 | 15189573 | [134kB,134kB] | | 49 | 0.000..850777.167 5 | -> Seq Scan on t_market_data_detail s | [4196.041,4228.979] | 15189573 | 15189573 | [46kB,46kB] | | 49 | 0.000..183563.306 (5 rows) Predicate Information (identified by plan id) --------------------------------------------------------------------------------------------------------------------------- 5 --Seq Scan on t_market_data_detail s Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND ((channel_id = 100) OR (channel_id = 500) OR (channel_id = 1000) OR (channel_id = 5000))), (Expression Flatten Optimized), (Predicates Reordered) Rows Removed by Filter: 8775 (3 rows) Memory Information (identified by plan id) --------------------------------------------------------------------------------------------------------------------------- Sort Method: external merge Memory: 1048576kB ~ 0kB 3 --Sort Sort Method: external merge Disk: 231288kB ~ 231288kB (3 rows) User Define Profiling --------------------------------------------------------------------------------------------------------------------------- Segment Id: 1 Track name: Datanode build connection (actual time=[0.000, 0.000], calls=[1, 1]) Plan Node id: 1 Track name: Datanode start up stream thread (actual time=[0.347, 0.347], calls=[1, 1])) (4 rows) ================ Query Summary ================ ----------------------------------------------- Datanode executor start time: 1.041 ms Datanode executor run time: 43144.837 ms Datanode executor end time: 0.065 ms Planner runtime: 34.099 ms Query Id: 1946962414165247362 Total runtime: 43145.962 ms (6 rows)

总耗时:43,145 ms(约 43 秒)

执行计划摘要
节点 操作 实际耗时 说明
1 Streaming (LOCAL GATHER) 43,143 ms 并行结果汇聚
2 GroupAggregate 42,060~43,116 ms 4 个线程并行聚合
3 Sort 28,888~29,384 ms 每个线程排序约 1/4 数据
4 Streaming (LOCAL REDISTRIBUTE) 18,407~18,551 ms 数据重分布
5 Seq Scan 4,196~4,228 ms 并行扫描,每个线程约 4s
3.3 效果分析
  • 性能提升:65%(从 123s → 43s)
  • 磁盘排序量从 938MB 降至各线程约 231MB
  • 扫描时间从 16.4s 降至约 4.2s(接近理想的 4 倍加速比)
  • 局限性:仍然需要全表扫描 + 大排序,KEEP 聚合函数的本质开销未解决

方案 2:DISTINCT ON + 覆盖索引重构

3.4 优化思路

将 KEEP 聚合改写为 PostgreSQL 特有的 DISTINCT ON 语法,同时创建覆盖索引避免回表。

3.5 DDL 变更

-- 创建测试表(数据副本) CREATE TABLE t_test_staging AS SELECT * FROM data_ops.t_market_data_detail; -- 创建覆盖索引:排序键 + 包含所有查询字段 CREATE INDEX idx_opt_001 ON t_test_staging (asset_id, seq_no DESC) INCLUDE (trade_date, amount, qty, oi_value, update_time, channel_id);

3.6 SQL 改写

SELECT a.asset_id, a.trade_date, a.amount, a.qty, a.oi_value, n.t2_qty, n.t2_amount FROM (SELECT DISTINCT ON (asset_id) asset_id, trade_date, amount, qty, oi_value FROM t_test_staging WHERE trade_date = '20250627' AND update_time <= '15:00:00' AND channel_id IN (100, 500, 1000, 5000) ORDER BY asset_id, seq_no DESC NULLS LAST) a LEFT JOIN (SELECT DISTINCT ON (asset_id) asset_id, qty AS t2_qty, amount AS t2_amount FROM t_test_staging WHERE trade_date = '20250627' AND update_time <= '09:00:00' AND channel_id IN (100, 500, 1000, 5000) ORDER BY asset_id, seq_no DESC NULLS LAST) n USING (asset_id);

3.7 完整执行计划

id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs ----+--------------------------------------------------------------------------------------+------------+--------------+--------+---------------+---------+---------+------------------------ 1 | -> Hash Left Join (2, 5) | 40774.030| 2355 | 1 | 24kB | 204 | | 1142568.006..1142570.563 2 | -> Unique | 33574.824| 2355 | 1 | 10kB | 182 | | 571282.729..571285.264 3 | -> Sort | 30952.123| 15189573| 507 | 104896kB | 182 | | 571282.729..571283.997 4 | -> Index Only Scan using idx_opt_001 on t_test_staging | 12362.086| 15189573| 507 | 112kB | 182 | | 0.000..571259.950 5 | -> Hash | 7125.826| 2355 | 1 | 427kB | 114 | | 571285.274..571285.274 6 | -> Subquery Scan on n | 7124.697| 2355 | 1 | 13kB | 114 | | 571282.729..571285.274 7 | -> Unique | 7123.976| 2355 | 1 | 10kB | 128 | | 571282.729..571285.264 8 | -> Sort | 7123.138| 2373 | 507 | 3kB | 128 | | 571282.729..571283.997 9 | -> Index Only Scan using idx_opt_001 on t_test_staging | 7112.291| 2373 | 507 | 112kB | 128 | | 0.000..571259.950 (9 rows) Predicate Information (identified by plan id) --------------------------------------------------------------------------------------------------------------------------- 1 --Hash Left Join (2, 5) Hash Cond: ((public.t_test_staging.asset_id)::text = (n.asset_id)::text), (Expression Flatten Optimized) 4 --Index Only Scan using idx_opt_001 on t_test_staging Filter: (((trade_date)::text = '20250627'::text) AND ((update_time)::text <= '15:00:00'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered) Rows Removed by Filter: 8775 9 --Index Only Scan using idx_opt_001 on t_test_staging Filter: (((trade_date)::text = '20250627'::text) AND ((update_time)::text <= '09:00:00'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered) Rows Removed by Filter: 15195975 (8 rows) Memory Information (identified by plan id) --------------------------------------------------------------------------------------------------------------------------- 3 --Sort Sort Method: external merge Disk: 804784kB Buckets: 32768 Batches: 1 Memory Usage: 124kB 8 --Sort Sort Method: quicksort Memory: 319kB (5 rows) ================ Query Summary ================ ----------------------------------------------- Datanode executor start time: 0.114 ms Datanode executor run time: 40775.241 ms Datanode executor end time: 0.047 ms Planner runtime: 0.770 ms Query Id: 1946962414165250608 Total runtime: 40775.440 ms (6 rows)

总耗时:40,775 ms(约 41 秒)

执行计划摘要
节点 操作 实际耗时 说明
1 Hash Left Join 40,774 ms 合并两个子查询结果
2-4 Unique/Sort/Index Only Scan(全日) 33,574 ms 覆盖索引扫描
5-9 Hash/Subquery/Unique/Sort/Index Only Scan(早盘) 7,125 ms 覆盖索引扫描
3.8 效果分析
  • 相比方案 0:提升 67%(从 123s → 41s)
  • Index Only Scan:完全通过索引获取数据,无需回表
  • 逻辑拆分:将两个时间段的查询拆为独立子查询,早盘查询过滤后仅 2,373 行
  • 局限性:仍然需要对全量数据进行 Sort(804MB 磁盘排序),未启用并行

方案 3:窗口函数 + 条件排名(探索性尝试)

3.9 SQL改写

SELECT asset_id, MAX(CASE WHEN rn_all = 1 THEN trade_date END) AS trade_date, MAX(CASE WHEN rn_all = 1 THEN amount END) AS amount, MAX(CASE WHEN rn_all = 1 THEN qty END) AS qty, MAX(CASE WHEN rn_all = 1 THEN oi_value END) AS oi_value, MAX(CASE WHEN rn_nine = 1 THEN qty END) AS t2_qty, MAX(CASE WHEN rn_nine = 1 THEN amount END) AS t2_amount FROM ( SELECT asset_id, trade_date, amount, qty, oi_value, -- 1 update_time<=15:00 seq_no KEEP ROW_NUMBER() OVER (PARTITION BY asset_id ORDER BY seq_no DESC NULLS LAST) AS rn_all, -- 2 update_time<=09:00 seq_no KEEPCASE ROW_NUMBER() OVER (PARTITION BY asset_id ORDER BY CASE WHEN update_time <= '09:00:00' THEN seq_no END DESC NULLS LAST) AS rn_nine FROM ( SELECT asset_id, trade_date, amount, qty, oi_value, seq_no, update_time FROM data_ops.t_market_data_detail WHERE trade_date = '20250627' AND update_time <= '15:00:00' AND channel_id IN (100, 500, 1000, 5000) ) base ) ranked GROUP BY asset_id;

3.10 完整执行计划

id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs ----+------------------------+------------+--------------+--------------+-------------+---------+---------+------------------------ 1 | -> GroupAggregate | 163316.589| 2355 | 2356 | 60kB | 243 | | 6003628.088..6988339.898 2 | -> windowAgg | 155457.946| 15189573 | 15149050 | 32kB | 49 | | 6003628.088..6344481.713 3 | -> Sort | 144296.753| 15189573 | 15149050 | 6557kB | 49 | | 6003628.088..6041500.713 4 | -> windowAgg | 119080.339| 15189573 | 15149050 | 32kB | 49 | | 3158506.822..3499360.447 5 | -> Sort | 107443.791| 15189573 | 15149050 | 6557kB | 49 | | 3158506.822..3196379.447 6 | -> Seq Scan on t_market_data_detail | 21274.101| 15189573 | 15149050 | 4kB | 49 | | 0.000..654239.180 (6 rows) Predicate Information (identified by plan id) --------------------------------------------------------------------------------------------------------------------------- 6 --Seq Scan on t_market_data_detail Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered) Rows Removed by Filter: 8775 (3 rows) Memory Information (identified by plan id) --------------------------------------------------------------------------------------------------------------------------- 3 --Sort Sort Method: external merge Disk: 965808kB 5 --Sort Sort Method: external merge Disk: 804920kB (4 rows) ================ Query Summary ================ ----------------------------------------------- Datanode executor start time: 0.170 ms Datanode executor run time: 163319.509 ms Datanode executor end time: 60.965 ms Planner runtime: 0.911 ms Query Id: 1946962414165494361 Total runtime: 163380.667 ms (6 rows)

总耗时:163,380 ms(约 163 秒) ⚠️ 性能最差

执行计划摘要
节点 操作 实际耗时
1 GroupAggregate 163,316 ms
2 WindowAgg 155,457 ms
3 Sort 144,296 ms
4 WindowAgg 119,080 ms
5 Sort 107,443 ms
6 Seq Scan 21,274 ms
3.11 失败原因分析
  • 双重窗口函数:需要两次独立排序,分别产生 965MB 和 804MB 的磁盘 I/O
  • 全量数据排名:对 1500 万行逐行计算 ROW_NUMBER,然后外层再聚合
  • 无并行、无索引:退化为最差的串行全表扫描模式
  • 结论:此方案不适用于大数据量场景,窗口函数在数据量过大时开销显著

方案 4:两阶段聚合 + 反向查找(最终最优方案)⭐

3.12 核心思路

"先找序列号,再查明细" —— 将问题分解为两个阶段:

  1. 第一阶段(聚合):按品种分组,直接计算出每个品种在两个时间段的最大 seq_no
  2. 第二阶段(回查):用计算出的 seq_no 精确匹配原表,获取完整行数据
3.13 SQL 实现

SELECT /*+set(query_dop 4)*/ asset_id, MAX(CASE WHEN data_type = 'GROUP_09' THEN amount END) AS amount_09, MAX(CASE WHEN data_type = 'GROUP_09' THEN qty END) AS qty_09, MAX(CASE WHEN data_type = 'GROUP_15' THEN trade_date END) AS trade_date_15, MAX(CASE WHEN data_type = 'GROUP_15' THEN amount END) AS amount_15, MAX(CASE WHEN data_type = 'GROUP_15' THEN qty END) AS qty_15 FROM ( SELECT o.asset_id, o.seq_no, o.trade_date, o.amount, o.qty, o.oi_value, CASE WHEN o.seq_no = kp.max_seq_no_09 THEN 'GROUP_09' WHEN o.seq_no = kp.max_seq_no_15 THEN 'GROUP_15' END AS data_type FROM ( -- 第一阶段:聚合出每个品种的最大 seq_no SELECT asset_id, MAX(CASE WHEN update_time <= '09:00:00' THEN seq_no END) AS max_seq_no_09, MAX(seq_no) AS max_seq_no_15 FROM data_ops.t_market_data_detail WHERE trade_date = '20250627' AND update_time <= '15:00:00' AND channel_id IN (1000, 5000, 100, 500) GROUP BY asset_id ) kp JOIN data_ops.t_market_data_detail o ON o.asset_id = kp.asset_id AND o.trade_date = '20250627' AND o.channel_id IN (1000, 5000, 100, 500) AND o.seq_no IN (kp.max_seq_no_09, kp.max_seq_no_15) ) TargetRows GROUP BY asset_id;

3.14 完整执行计划

id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs ----+----------------------------------------------------------------+-------------------------------+-------------+---------+-----------------+-----------+---------+------------------------ 1 | -> Streaming(type: LOCAL GATHER dop: 1/4) | [18111.000,18111.000] | 2355 | 3 | [134kB,134kB] | | 297 | 482084.094..482084.278 2 | -> HashAggregate | [18099.159,18104.285] | 2355 | 3 | [377kB,393kB] | [27,27] | 297 | 482084.094..482084.102 3 | -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4) | [18093.913,18099.183] | 4710 | 3 | [136kB,136kB] | | 105 | 203747.245..482084.047 4 | -> Hash Join (5,6) | [17880.987,18086.743] | 4710 | 3 | [24kB,24kB] | | 105 | 203747.245..482083.833 5 | -> Seq Scan on t_market_data_detail o | [3808.056,3887.799] | 15191946 | 15196559| [41kB,41kB] | | 41 | 0.000..155058.039 6 | -> Hash | [8385.531,8404.325] | 9420 | 9392 | [419kB,419kB] | [39,39] | 75 | 203717.895..203717.895 7 | -> Streaming(type: BROADCAST dop: 4/4) | [8384.467,8403.213] | 9420 | 9392 | [123kB,123kB] | | 75 | 202593.157..203717.895 8 | -> Subquery Scan on kp | [8379.329,8386.988] | 2355 | 2348 | [5kB,5kB] | | 75 | 202593.157..202622.507 9 | -> HashAggregate | [8379.154,8386.834] | 2355 | 2348 | [178kB,182kB] | [27,27] | 89 | 202593.157..202599.027 10 | -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4) | [8376.756,8384.340] | 9420 | 2348 | [114kB,114kB] | | 89 | 202432.420..202587.287 11 | -> HashAggregate | [7870.510,8379.085] | 9420 | 2348 | [539kB,622kB] | [27,27] | 89 | 202432.420..202438.290 12 | -> Seq Scan on t_market_data_detail | [5167.157,5347.326] | 15189573 | 15149050| [37kB,37kB] | | 25 | 0.000..164559.795 (12 rows) Predicate Information (identified by plan id) --------------------------------------------------------------------------------------------------------------------------- 4 --Hash Join (5,6) Hash Cond: ((o.asset_id)::text = (kp.asset_id)::text), (Expression Flatten Optimized) Join Filter: (((o.seq_no = kp.max_seq_no_09) OR (o.seq_no = kp.max_seq_no_15)), (Expression Flatten Optimized) Rows Removed by Join Filter: 15187236 5 --Seq Scan on t_market_data_detail o Filter: (((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{1000,5000,100,500}'::numeric[]))), (Expression Flatten Optimized), (Predicates Reordered) Rows Removed by Filter: 6402 12 --Seq Scan on t_market_data_detail Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{1000,5000,100,500}'::numeric[]))), (Expression Flatten Optimiz d), (Predicates Reordered) Rows Removed by Filter: 8775 (10 rows) Memory Information (identified by plan id) --------------------------------------------------------------------------------------------------------------------------- 6 --Hash Max Buckets: 32768 Max Batches: 1 Max Memory Usage: 129kB Min Buckets: 32768 Min Batches: 1 Min Memory Usage: 129kB (3 rows) User Define Profiling --------------------------------------------------------------------------------------------------------------------------- Segment Id: 1 Track name: Datanode build connection (actual time=[0.000, 0.000], calls=[1, 1]) Plan Node id: 1 Track name: Datanode start up stream thread (actual time=[1.292, 1.292], calls=[1, 1])) (4 rows) ================ Query Summary ================ ----------------------------------------------- Datanode executor start time: 2.436 ms Datanode executor run time: 18111.228 ms Datanode executor end time: 0.118 ms Planner runtime: 5.341 ms Query Id: 194696241416526033 Total runtime: 18113.822 ms (6 rows)

总耗时:18,113 ms(约 18 秒)

执行计划摘要
节点 操作 实际耗时 说明
1 Streaming (LOCAL GATHER) 18,111 ms 并行汇聚
2 HashAggregate 18,099~18,104 ms 最终聚合
3 Streaming (LOCAL REDISTRIBUTE) 18,093~18,099 ms 数据重分布
4 Hash Join 17,880~18,086 ms 两阶段 JOIN
5 Seq Scan (o) 3,808~3,887 ms 明细表并行扫描
6-12 Hash/Broadcast/Subquery/HashAggregate 5,167~8,404 ms 聚合阶段并行执行
3.15 效果分析
对比项 方案 0 方案 1 方案 2 方案 3 方案 4
总耗时 123,543 ms 43,145 ms 40,775 ms 163,380 ms 18,113 ms
相对方案 0 100% 34.9% 33.0% 132.3% 14.7%
性能提升 - 65.1% 67.0% -32.3% 85.3%
并行度 DOP=4 DOP=4
排序方式 外部归并(938MB) 外部归并(231MB×4) 外部归并(804MB) 外部归并(1.7GB) HashAggregate
索引使用 Index Only 无需索引

四、优化策略总结

4.1 关键优化点

(1)避免大排序 —— 用 HashAggregate 替代 Sort
方案 0/1/2/3:扫描 → 排序(磁盘) → 聚合    【瓶颈在排序】
方案 4:      扫描 → HashAggregate → JOIN  【无大排序】

方案 4 的核心突破在于:第一阶段直接用 MAX() 聚合找到目标 seq_no,完全避免了排序操作。HashAggregate 的内存开销仅为几百 KB,远小于磁盘排序的数百 MB。

(2)减少数据流转量
方案 排序/聚合的数据量 最终输出
0/1/3 15,189,573 行(全量) 2,355 行
2 15,189,573 + 2,373 行 2,355 行
4 2,355 行(聚合后)+ 最多 4,710 行(回查) 2,355 行

方案 4 在第一阶段就将 1500 万行压缩为 2,355 个品种的最大 seq_no,第二阶段仅用这些 seq_no 精确回查,JOIN 后最多返回约 4,710 行。

(3)并行计算加速

方案 4 充分利用 GaussDB 的SMP能力:

  • LOCAL REDISTRIBUTE:将数据均匀分发到 4 个线程
  • BROADCAST:将小的聚合结果广播到所有线程
  • LOCAL GATHER:汇聚各线程的局部结果
(4)单次表扫描的设计哲学

虽然方案 4 在逻辑上"读了两遍表"(一次聚合 + 一次回查),但两遍扫描是并行分布在不同线程执行的,且第二遍扫描通过 Hash Join 的过滤条件大幅减少了有效数据量。相比方案 0/1 中排序带来的磁盘 I/O,两次内存友好的扫描反而更高效。

4.2 优化路径总结


初始状态:KEEP 聚合 + 全表扫描 + 大排序                      
 耗时: 123s                                                  
                          ↓                                 
 第一步:启用并行 (query_dop=4)                              
 效果: 123s → 43s(65% 提升)                                
 分析: 并行有效但未解决排序瓶颈                               
                         ↓                                  
第二步:DISTINCT ON + 覆盖索引(探索方向)                   
效果: 123s → 41s(67% 提升)                                
分析: Index Only Scan 有效,但仍需大排序                     
                         ↓                                  
第三步:窗口函数(反面教材)                                 
效果: 123s → 163s(性能下降!)                              
分析: 双重窗口 + 双重排序 = 灾难                             
                        ↓                                 
第四步:两阶段聚合 + 反向查找 ⭐                             
效果: 123s → 18s(85% 提升)                                 
分析: HashAggregate 替代排序 + 并行 = 最优解                

4.3 通用优化建议

  1. 优先消除排序:对于 “取每组最大值对应行” 的场景,先用 MAX()/MIN() 聚合出目标键值,再回查明细,比直接排序更高效
  2. KEEP/FIRST_VALUE 类函数的替代方案:当数据量大时,考虑用"先聚合后 JOIN"的模式替代
  3. 并行是基础优化:在 GaussDB 全表扫描性能相比ORACLE差很多, query_dop 是最直接的优化手段
  4. 覆盖索引有前提:Index Only Scan 虽好,但如果查询仍需大排序,收益有限;且维护索引有额外成本
  5. 窗口函数需谨慎:窗口函数在数据量超过内存容量时会产生大量磁盘 I/O,不适合作为大数据量的首选方案
  6. 通过执行计划验证假设:每次改写后务必用 EXPLAIN ANALYZE 验证实际效果,避免"理论上更好"但实际更差的情况
Logo

华为开发者空间,是为全球开发者打造的专属开发空间,汇聚了华为优质开发资源及工具,致力于让每一位开发者拥有一台云主机,基于华为根生态开发、创新。

更多推荐