group by分组排序(组内排序)
一、项目场景:有张调用记录表有多个客户端的调用记录,每条调用记录有当次调用记录的详细信息分页列表需要查出 每个客户端的调用次数、以及最新一次调用的信息。这里举个栗子来模拟一下业务,创建一张评论表,里面存了每个用户的多个评论信息。SQL如下CREATE TABLE `user_comment` (`id` int(10) unsigned NOT NULL AUTO_INCREMENT,`user_
一、项目场景:
有张调用记录表有多个客户端的调用记录,每条调用记录有当次调用记录的详细信息
分页列表需要查出 每个客户端的调用次数、以及最新一次调用的信息。
这里举个栗子来模拟一下业务,创建一张评论表,里面存了每个用户的多个评论信息。SQL如下
CREATE TABLE `user_comment` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL,
`contents` varchar(200) NOT NULL,
`create_time` datetime NOT NULL,
`update_time` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `addtime` (`create_time`),
KEY `uid_addtime` (`user_id`,`create_time`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=utf8;
插入数据
INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (1, 1, '评论1', '2017-05-17 00:00:00', '2017-05-17 00:00:00');
INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (2, 1, '评论2', '2017-05-17 00:00:01', '2017-05-17 00:00:01');
INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (3, 2, '评论1', '2017-05-17 00:00:02', '2017-05-17 00:00:02');
INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (4, 2, '评论2', '2017-05-17 00:00:03', '2017-05-17 00:00:03');
INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (5, 3, '评论1', '2017-05-17 00:00:04', '2017-05-17 00:00:04');
INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (6, 1, '评论3', '2017-05-17 00:00:05', '2017-05-17 00:00:05');
INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (7, 4, '评论1', '2017-05-17 00:00:06', '2017-05-17 00:00:06');
INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (8, 4, '评论2', '2017-05-17 00:00:07', '2017-05-17 00:00:07');
INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (9, 4, '评论3', '2017-05-17 00:00:08', '2017-05-17 00:00:08');
INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (10, 4, '评论4', '2017-05-17 00:00:09', '2017-05-17 00:00:09');
INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (11, 3, '评论2', '2017-05-17 00:00:10', '2017-05-17 00:00:10');
查看数据结果
mysql> select * from user_comment;
+----+---------+----------+---------------------+---------------------+
| id | user_id | contents | create_time | update_time |
+----+---------+----------+---------------------+---------------------+
| 1 | 1 | 评论1 | 2017-05-17 00:00:00 | 2017-05-17 00:00:00 |
| 2 | 1 | 评论2 | 2017-05-17 00:00:01 | 2017-05-17 00:00:01 |
| 3 | 2 | 评论1 | 2017-05-17 00:00:02 | 2017-05-17 00:00:02 |
| 4 | 2 | 评论2 | 2017-05-17 00:00:03 | 2017-05-17 00:00:03 |
| 5 | 3 | 评论1 | 2017-05-17 00:00:04 | 2017-05-17 00:00:04 |
| 6 | 1 | 评论3 | 2017-05-17 00:00:05 | 2017-05-17 00:00:05 |
| 7 | 4 | 评论1 | 2017-05-17 00:00:06 | 2017-05-17 00:00:06 |
| 8 | 4 | 评论2 | 2017-05-17 00:00:07 | 2017-05-17 00:00:07 |
| 9 | 4 | 评论3 | 2017-05-17 00:00:08 | 2017-05-17 00:00:08 |
| 10 | 4 | 评论4 | 2017-05-17 00:00:09 | 2017-05-17 00:00:09 |
| 11 | 3 | 评论2 | 2017-05-17 00:00:10 | 2017-05-17 00:00:10 |
+----+---------+----------+---------------------+---------------------+
11 rows in set (0.05 sec)
二、问题描述:
现在要查询的是,每个用户的最新一条评论信息、以及统计用户总共评论了多少次。
这里很容易想到用
group by
去对user_id
进行分组,得出总共有多少个用户评论,但是此时,非group by
字段的选择是怎么样的呢?
我的理解是,group by 对非分组的列,选择是最先遇到的一条记录。即每个分组里 id 最小的。所以,如果我们要拿到每个分组里最新的一条评论信息,则在分组前对评论进行排序即可,
SQL如下
SELECT
c.*,
count( c.user_id ) AS comment_count
FROM
( SELECT * FROM user_comment ORDER BY create_time DESC ) c
GROUP BY
c.user_id;
在
MYSQL 5.6
版本下执行的结果
mysql> select c.*, count(c.user_id) as comment_count from (select * from user_comment order by create_time desc )c group by c.user_id;
+----+---------+----------+---------------------+---------------------+---------------+
| id | user_id | contents | create_time | update_time | comment_count |
+----+---------+----------+---------------------+---------------------+---------------+
| 6 | 1 | 评论3 | 2017-05-17 00:00:05 | 2017-05-17 00:00:05 | 3 |
| 4 | 2 | 评论2 | 2017-05-17 00:00:03 | 2017-05-17 00:00:03 | 2 |
| 11 | 3 | 评论2 | 2017-05-17 00:00:10 | 2017-05-17 00:00:10 | 2 |
| 10 | 4 | 评论4 | 2017-05-17 00:00:09 | 2017-05-17 00:00:09 | 4 |
+----+---------+----------+---------------------+---------------------+---------------+
4 rows in set (0.08 sec)
在
MYSQL 5.7
版本下执行的结果
mysql> select c.*, count(c.user_id) as comment_count from (select * from user_comment order by create_time desc )c group by c.user_id;
+----+---------+----------+---------------------+---------------------+---------------+
| id | user_id | contents | create_time | update_time | comment_count |
+----+---------+----------+---------------------+---------------------+---------------+
| 1 | 1 | 评论1 | 2017-05-17 00:00:00 | 2017-05-17 00:00:00 | 3 |
| 3 | 2 | 评论1 | 2017-05-17 00:00:02 | 2017-05-17 00:00:02 | 2 |
| 5 | 3 | 评论1 | 2017-05-17 00:00:04 | 2017-05-17 00:00:04 | 2 |
| 7 | 4 | 评论1 | 2017-05-17 00:00:06 | 2017-05-17 00:00:06 | 4 |
+----+---------+----------+---------------------+---------------------+---------------+
4 rows in set (0.03 sec)
这里就踩了个坑,在 GROUP BY
实现组内排序的时候,发现同一条SQL 在不同的 MYSQL版本下会查询出不同的数据。本次测试版本为 MYSQL 5.6
和 MYSQL 5 .7
,结果刚好相反。在5.6下是可以拿到正确结果的,而更高版本不行,所以要换一种实现方式。
三、原因分析:
MYSQL认为 group by是不稳定的,所以要消除这种不稳定性,要明确聚合分组的列。在MYSQL 5.7 版本对 group by 做了调整。详情这里不阐述,移步百度自行搜索。
四、解决方案:
注:网上有网友提出,在上面sql的基础上,在内循环加上 limit 即可解决,
我个人认为这样有点鸡肋,因为不知道表的数据有多少。
网友这种方式实现如下:
SELECT
c.*,
count( c.user_id ) AS comment_count
FROM
( SELECT * FROM user_comment ORDER BY create_time DESC LIMIT 1000 ) c
GROUP BY
c.user_id;
mysql> select c.*, count(c.user_id) as comment_count from (select * from user_comment order by create_time desc limit 1000 )c group by c.user_id;
+----+---------+----------+---------------------+---------------------+---------------+
| id | user_id | contents | create_time | update_time | comment_count |
+----+---------+----------+---------------------+---------------------+---------------+
| 6 | 1 | 评论3 | 2017-05-17 00:00:05 | 2017-05-17 00:00:05 | 3 |
| 4 | 2 | 评论2 | 2017-05-17 00:00:03 | 2017-05-17 00:00:03 | 2 |
| 11 | 3 | 评论2 | 2017-05-17 00:00:10 | 2017-05-17 00:00:10 | 2 |
| 10 | 4 | 评论4 | 2017-05-17 00:00:09 | 2017-05-17 00:00:09 | 4 |
+----+---------+----------+---------------------+---------------------+---------------+
4 rows in set (0.04 sec)
如上可以看到得到了我们想要的结果,也和MYSQL 5.7执行的一致。但是这种方式不太行,我们另寻途径。
1)组内排序的另一种实现方式
如果是自增id的话,id最大,则记录最新。如果不是自增,则可以用创建时间来区分。巧用 Max(xxx)函数来实现组内排序。
SQL如下
SELECT
*
FROM
user_comment
WHERE
id IN ( SELECT MAX( id ) FROM user_comment GROUP BY user_id )
ORDER BY
user_id
效果和上述方式一致,也可以达到组内排序的效果。
+----+---------+----------+---------------------+---------------------+
| id | user_id | contents | create_time | update_time |
+----+---------+----------+---------------------+---------------------+
| 6 | 1 | 评论3 | 2017-05-17 00:00:05 | 2017-05-17 00:00:05 |
| 4 | 2 | 评论2 | 2017-05-17 00:00:03 | 2017-05-17 00:00:03 |
| 11 | 3 | 评论2 | 2017-05-17 00:00:10 | 2017-05-17 00:00:10 |
| 10 | 4 | 评论4 | 2017-05-17 00:00:09 | 2017-05-17 00:00:09 |
+----+---------+----------+---------------------+---------------------+
4 rows in set (0.04 sec)
注意
: 写到这里就达到组内排序的目标了。但是还没达到我的要求。因为就是这样处理后,没法再做统计汇总了,所以要达到目的还是要改写。
最终SQL如下:
SELECT
c2.*,
COUNT( c1.user_id ) AS comment_count
FROM
user_comment c1
RIGHT JOIN ( SELECT * FROM user_comment WHERE id IN ( SELECT MAX( id ) FROM user_comment GROUP BY user_id ) ) c2 ON c1.user_id = c2.user_id
GROUP BY
c2.user_id
查询结果
mysql> SELECT
c2.*,
COUNT( c1.user_id ) AS comment_count
FROM
user_comment c1
RIGHT JOIN ( SELECT * FROM user_comment WHERE id IN ( SELECT MAX( id ) FROM user_comment GROUP BY user_id ) ) c2 ON c1.user_id = c2.user_id
GROUP BY
c2.user_id;
+----+---------+----------+---------------------+---------------------+---------------+
| id | user_id | contents | create_time | update_time | comment_count |
+----+---------+----------+---------------------+---------------------+---------------+
| 6 | 1 | 评论3 | 2017-05-17 00:00:05 | 2017-05-17 00:00:05 | 3 |
| 4 | 2 | 评论2 | 2017-05-17 00:00:03 | 2017-05-17 00:00:03 | 2 |
| 11 | 3 | 评论2 | 2017-05-17 00:00:10 | 2017-05-17 00:00:10 | 2 |
| 10 | 4 | 评论4 | 2017-05-17 00:00:09 | 2017-05-17 00:00:09 | 4 |
+----+---------+----------+---------------------+---------------------+---------------+
4 rows in set (0.07 sec)
码字不易,如有帮助点个赞趴~~
更多推荐
所有评论(0)