status状态字段是否要创建索引？

前段时间在公司工作中我建议创建的某个表字段索引在内部评审的时候被删除了，理由是”状态枚举值少的索引效果不好，甚至会引发全表扫描，例如性别男女“。但我觉得，只查未发送的记录，取结果较少，索引效率会高。虽然不是什么大事，直到现在也没有什么明显影响，但这个事情给我心底留下了疑问，“状态枚举字段是否应该建立索引？”

豁达风筝

8425人浏览 · 2021-04-23 17:44:30

豁达风筝 · 2021-04-23 17:44:30 发布

status状态字段是否要创建索引？

背景

前段时间在公司工作中我建议创建的某个表字段索引在内部评审的时候被删除了，理由是”状态枚举值少的索引效果不好，甚至会引发全表扫描，例如性别男女“。虽然不是什么大事，直到现在也没有什么明显影响，但这个事情给我心底留下了疑问，“状态枚举字段是否应该建立索引？”。

业务场景：商品报名业务，报名以后保存了其中一个表，状态为“待发送”。另外有个界面可查询此状态数据进行手动操作调用下游接口完成相应业务操作，或者由定时任务定期扫描待发送数据自动完成。send_flag字段0-未发送，1-已发送。send_flag=0的数据每天小于500条，表中记录约30万条记录，表在以每个月5万条记录增长。（数据库是mysql，使用的是Innodb引擎）

相信这种类似业务在平时工作中经常会遇到。比如说状态status(0\1\2\3)、性别sex(0未知\1男\2女)等。你在遇到这类场景的时候，是否会给"send_flag"创建索引？为什么很多人会说可能会引发全表扫描？

准备数据

为了测试，我在本地创建了一个表test_product，插入了150万条记录，开启了慢查询时间为10毫秒。除自增ID以外，无索引。

验证以下几个问题

加索引以后是否会提升查询效率？

为了模拟业务场景，在150万条记录中我设置500条记录的send_flag=0，设置send_flag的记录是随机选的。

CREATE PROCEDURE test.test()
begin
	DECLARE i INT;# 申明变量
SET i = 0;  # 变量赋值
WHILE i<500 DO # 结束循环的条件
update test_product set send_flag = 0 where id = (select ceiling(rand()*1500000) from dual);
SET i = i+1;    # 循环一次,i加1
END WHILE;  # 结束while循环
#结束循环执行语句
end

因为自己PC配置问题，执行时间较长。

send_flag未添加索引的查询情况

执行语句

mysql> select count(1) from test_product where send_flag=0;
+----------+
| count(1) |
+----------+
|      503 |
+----------+
1 row in set (2.26 sec)

mysql>

通过上述可以看到，大约在2秒。同样的语句我执行5次后看下慢查询

mysql> select start_time, query_time ,sql_text from mysql.slow_log order by start_time desc limit 5;
+----------------------------+-----------------+-----------------------------------------------------+
| start_time                 | query_time      | sql_text                                            |
+----------------------------+-----------------+-----------------------------------------------------+
| 2021-04-23 16:08:54.139661 | 00:00:02.260723 | select count(1) from test_product where send_flag=0 |
| 2021-04-23 16:08:51.221102 | 00:00:02.257314 | select count(1) from test_product where send_flag=0 |
| 2021-04-23 16:08:48.203821 | 00:00:02.361179 | select count(1) from test_product where send_flag=0 |
| 2021-04-23 16:08:45.247554 | 00:00:02.311678 | select count(1) from test_product where send_flag=0 |
| 2021-04-23 16:08:42.210513 | 00:00:02.402186 | select count(1) from test_product where send_flag=0 |
+----------------------------+-----------------+-----------------------------------------------------+
5 rows in set (0.00 sec)

mysql>

可以看到每次的执行时间都在2秒+。

为了看出效果，此处我执行一个无效语句用于分隔查询结果select sleep(5)

send_flag添加索引的查询情况

添加索引语句

CREATE INDEX test_product_send_flag_IDX USING BTREE ON test.test_product (send_flag);

同样执行语句

mysql> select count(1) from test_product where send_flag=0;
+----------+
| count(1) |
+----------+
|      503 |
+----------+
1 row in set (0.00 sec)

mysql>

执行了5次，结果都一样查询时间都是0秒。

再看下慢查询

mysql> select start_time, query_time ,sql_text from mysql.slow_log order by start_time desc limit 5;
+----------------------------+-----------------+-----------------------------------------------------+
| start_time                 | query_time      | sql_text                                            |
+----------------------------+-----------------+-----------------------------------------------------+
| 2021-04-23 16:13:39.746433 | 00:00:05.001184 | select sleep(5)                                     |
| 2021-04-23 16:08:54.139661 | 00:00:02.260723 | select count(1) from test_product where send_flag=0 |
| 2021-04-23 16:08:51.221102 | 00:00:02.257314 | select count(1) from test_product where send_flag=0 |
| 2021-04-23 16:08:48.203821 | 00:00:02.361179 | select count(1) from test_product where send_flag=0 |
| 2021-04-23 16:08:45.247554 | 00:00:02.311678 | select count(1) from test_product where send_flag=0 |
+----------------------------+-----------------+-----------------------------------------------------+
5 rows in set (0.00 sec)

mysql>

可以看到，慢查询日志最后一条语句还是我们刚才执行的sleep(5)分隔，并没有刚才执行的几条语句。

没加索引的查询耗时2s，加上索引后查询0s。由此可以说明：为send_flag创建索引能有效的提升查询效率！

但是，结论下的有点早！！！，我们接着往下看：

枚举值较少时是否会引发全表扫描？

以我的理解，应该分3种情况来验证：1）枚举值对应的数据量有较大差别，查询较少的这部分数据；2）枚举值对应的数据量有较大差别，查询较多的这部分数据；3）各类枚举值对应的数据量比较接近时查询；

接下来，我们先验证

数据量有较大差别，查询较少的这部分数据

验证仍然使用上面的测试数据，总记录150万，send_flag=0有500条，剩下全部是send_flag=1。

执行explain分析

mysql> select count(1) from test_product where send_flag=0;
+----------+
| count(1) |
+----------+
|      503 |
+----------+
1 row in set (0.00 sec)

mysql> explain select count(1) from test_product where send_flag=0;
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------------+
| id | select_type | table        | partitions | type | possible_keys              | key                        | key_len | ref   | rows | filtered | Extra       |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test_product | NULL       | ref  | test_product_send_flag_IDX | test_product_send_flag_IDX | 5       | const |  503 |   100.00 | Using index |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql>

不出意外，此时肯定是在使用索引test_product_send_flag_IDX且查询类型type=ref。

因此，各枚举值对应的数据量有较大差别，查询较少的这部分数据是会使用索引的。

数据量有较大差别，查询较多的这部分数据

验证仍然使用上面的测试数据，总记录150万，send_flag=0有500条，剩下全部是send_flag=1。

执行explain分析

mysql> select count(1) from test_product where send_flag=1;
+----------+
| count(1) |
+----------+
|  1521367 |
+----------+
1 row in set (0.37 sec)

mysql> explain select count(1) from test_product where send_flag=1;
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
| id | select_type | table        | partitions | type | possible_keys              | key                        | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
|  1 | SIMPLE      | test_product | NULL       | ref  | test_product_send_flag_IDX | test_product_send_flag_IDX | 5       | const | 866953 |   100.00 | Using index |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql>

此时可以看到查询执行相对比上一个示例长了一些，但是仍然在使用索引test_product_send_flag_IDX查询类型是type=ref。

因此，各枚举值对应的数据量有较大差别，查询较多的这部分数据也是会使用索引的。

各类枚举值对应的数据量比较接近时查询

此时我们需要修改表中的数据，使send_flag=0和send_flag=1的数据差不多。

执行explain分析

mysql> select count(1) from test_product where send_flag=1;
+----------+
| count(1) |
+----------+
|   760672 |
+----------+
1 row in set (0.20 sec)

mysql> select count(1) from test_product where send_flag=0;
+----------+
| count(1) |
+----------+
|   761198 |
+----------+
1 row in set (0.20 sec)

mysql> explain select count(1) from test_product where send_flag=1;
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
| id | select_type | table        | partitions | type | possible_keys              | key                        | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
|  1 | SIMPLE      | test_product | NULL       | ref  | test_product_send_flag_IDX | test_product_send_flag_IDX | 5       | const | 752050 |   100.00 | Using index |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql>

可以不管是查询send_flag=0还是1，查询耗时都差不多，也都使用了索引。

因此，当各类枚举值对应的数据量比较接近时，查询也是会使用索引的。

为什么反对在status这类字段上加索引呢？

通过上面几个实验，我们可以看到索引对查询send_flag的效率有非常明显的提升。那为什么会有很多人反对在status这类字段上增加索引呢？我们再来做几个试验证。

索引各类枚举值对应的数据量比较接近时，带条件查询

此时我们接着上面的数据（send_flag=0和1的数据相差不大）测试，但这次不同的是，除了索引字段我加入了其他过滤条件。

mysql> select count(1) from test_product where send_flag=1 and commodity_code = '1018223171';
+----------+
| count(1) |
+----------+
|        1 |
+----------+
1 row in set (4.00 sec)

慢查询

mysql>  select start_time, query_time ,sql_text from mysql.slow_log order by start_time desc limit 5;
+----------------------------+-----------------+---------------------------------------------------------------------------------------+
| start_time                 | query_time      | sql_text                                                                              |
+----------------------------+-----------------+---------------------------------------------------------------------------------------+
| 2021-04-25 11:10:56.584538 | 00:00:03.861784 | select count(1) from test_product where send_flag=1 and commodity_code = '1018223171' |
| 2021-04-25 11:10:44.473629 | 00:00:03.987070 | select count(1) from test_product where send_flag=1 and commodity_code = '1018223171' |
| 2021-04-25 11:10:38.429774 | 00:00:03.932597 | select count(1) from test_product where send_flag=1 and commodity_code = '1018223171' |
| 2021-04-25 11:10:31.988441 | 00:00:03.855316 | select count(1) from test_product where send_flag=1 and commodity_code = '1018223171' |
| 2021-04-25 11:10:26.854730 | 00:00:04.032600 | select count(1) from test_product where send_flag=1 and commodity_code = '1018223171' |
+----------------------------+-----------------+---------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

mysql>

我们看到执行已经为4秒左右，这样的效果已经很差了，我们来看看是否使用了索引？

explain

mysql> explain select count(1) from test_product where send_flag=1 and commodity_code = '1018223171';
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
| id | select_type | table        | partitions | type | possible_keys              | key                        | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
|  1 | SIMPLE      | test_product | NULL       | ref  | test_product_send_flag_IDX | test_product_send_flag_IDX | 5       | const | 752050 |    10.00 | Using where |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql>

我们可以看到这里的这里仍然使用到了索引test_product_send_flag_IDX查询类型type=ref。但是查询耗时却很多。

无索引各类枚举值对应的数据量比较接近时，带条件查询

删除掉索引字段后，同样的语句我们再来执行一下。

mysql> select count(1) from test_product where send_flag=1 and commodity_code = '1018223171';
+----------+
| count(1) |
+----------+
|        1 |
+----------+
1 row in set (2.82 sec)

慢查询（执行5次）

mysql>  select start_time, query_time ,sql_text from mysql.slow_log order by start_time desc limit 5;
+----------------------------+-----------------+---------------------------------------------------------------------------------------+
| start_time                 | query_time      | sql_text                                                                              |
+----------------------------+-----------------+---------------------------------------------------------------------------------------+
| 2021-04-25 14:26:14.341608 | 00:00:02.635658 | select count(1) from test_product where send_flag=1 and commodity_code = '1018223171' |
| 2021-04-25 14:26:10.999191 | 00:00:02.816881 | select count(1) from test_product where send_flag=1 and commodity_code = '1018223171' |
| 2021-04-25 14:26:07.321773 | 00:00:02.844335 | select count(1) from test_product where send_flag=1 and commodity_code = '1018223171' |
| 2021-04-25 14:25:49.832506 | 00:00:02.821655 | select count(1) from test_product where send_flag=1 and commodity_code = '1018223171' |
| 2021-04-25 14:24:51.352390 | 00:00:02.982645 | select count(1) from test_product where send_flag=1 and commodity_code = '1018223171' |
+----------------------------+-----------------+---------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

有没有发现很奇怪的现象，这次执行是2.8秒，同样的语句删除掉索引以后反而比加了索引查询时间少了很多。

explain

mysql> explain select count(1) from test_product where send_flag=1 and commodity_code = '1018223171';
+----+-------------+--------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table        | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+--------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | test_product | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1504101 |     1.00 | Using where |
+----+-------------+--------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

可以看到执行语句并没有任何的索引，使用的是全表扫描type=ALL。但事实情况就是这样，没有索引反而比有索引的时候要快1.2秒左右。

我们再看下数据相差较大时的情况。

有索引枚举字段相差较大时查询带条件

此时我对数据进行了修改，使send_flag=0只有343条，而send_flag=1有150万。send_flag有索引。

执行查询较多部分数据（send_flag=1）

mysql> select count(1) from test_product where send_flag=1 and commodity_code = '1018223171';
+----------+
| count(1) |
+----------+
|        1 |
+----------+
1 row in set (10.54 sec)

查询耗时10秒

explain

mysql> explain select count(1) from test_product where send_flag=1 and commodity_code = '1018223171';
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
| id | select_type | table        | partitions | type | possible_keys              | key                        | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
|  1 | SIMPLE      | test_product | NULL       | ref  | test_product_send_flag_IDX | test_product_send_flag_IDX | 5       | const | 752050 |    10.00 | Using where |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

查询较多的这部分数据也是使用了索引的，但耗时长达10秒。

执行查询较多部分数据（send_flag=0）

mysql> select count(1) from test_product where send_flag=0 and commodity_code = '108023319';
+----------+
| count(1) |
+----------+
|        1 |
+----------+
1 row in set (0.36 sec)

查询耗时0.3秒左右

explain

mysql> explain select count(1) from test_product where send_flag=0 and commodity_code = '108023319';
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------------+
| id | select_type | table        | partitions | type | possible_keys              | key                        | key_len | ref   | rows | filtered | Extra       |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test_product | NULL       | ref  | test_product_send_flag_IDX | test_product_send_flag_IDX | 5       | const |  343 |    10.00 | Using where |
+----+-------------+--------------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

可以看到查询较少的这部分数据时孔在用索引，但耗时较少。

而如果没有索引时不管是查询send_flag=1或者send_flag=0查询耗时都比较平均

mysql> select count(1) from test_product where send_flag=1 and commodity_code = '1018223171';
+----------+
| count(1) |
+----------+
|        1 |
+----------+
1 row in set (2.93 sec)

mysql> select count(1) from test_product where send_flag=0 and commodity_code = '108023319';
+----------+
| count(1) |
+----------+
|        1 |
+----------+
1 row in set (2.75 sec)

实验总结

我们将上述的实验数据整理出来

只查询索引字段条件

send_flag索引	查询条件	send_flag数量	耗时/秒	索引/全表
无	send_flag=0	503	2.3	全表
有	send_flag=1	503	0	索引
有	send_flag=1	150万	0.37	索引
有	send_flag=1	75万	0.2	索引

查询索引字段+其他非索引条件

send_flag索引	查询条件	send_flag数量	耗时/秒	索引/全表
有	send_flag=1 and commodity_code = ‘1018223171’	75万	4	索引
无	send_flag=1 and commodity_code = ‘1018223171’	75万	2.8	全表
有	send_flag=1 and commodity_code = ‘1018223171’	150万	10	索引
有	send_flag=0 and commodity_code = ‘108023319’	343	<0.5	索引
无	send_flag=1 and commodity_code = ‘1018223171’	150万	2.9	全表
无	send_flag=0 and commodity_code = ‘108023319’	343	2.7	全表