My（Your）SQL

数据类型

数值

类型	大小	范围（有符号）	范围（无符号）	用途
TINYINT	1 字节	(-128，127)	(0，255)	小整数值
SMALLINT	2 字节	(-32 768，32 767)	(0，65 535)	大整数值
MEDIUMINT	3 字节	(-8 388 608，8 388 607)	(0，16 777 215)	大整数值
INT或INTEGER	4 字节	(-2 147 483 648，2 147 483 647)	(0，4 294 967 295)	大整数值
BIGINT	8 字节	(-9 233 372 036 854 775 808，9 223 372 036 854 775 807)	(0，18 446 744 073 709 551 615)	极大整数值
FLOAT	4 字节	(-3.402 823 466 E+38，-1.175 494 351 E-38)，0，(1.175 494 351 E-38，3.402 823 466 351 E+38)	0，(1.175 494 351 E-38，3.402 823 466 E+38)	单精度浮点数值
DOUBLE	8 字节	(-1.797 693 134 862 315 7 E+308，-2.225 073 858 507 201 4 E-308)，0，(2.225 073 858 507 201 4 E-308，1.797 693 134 862 315 7 E+308)	0，(2.225 073 858 507 201 4 E-308，1.797 693 134 862 315 7 E+308)	双精度浮点数值
DECIMAL	对DECIMAL(M,D) ，如果M>D，为M+2否则为D+2	依赖于M和D的值	依赖于M和D的值	小数值

日期

类型	大小 (字节)	范围	格式	用途
DATE	3	1000-01-01/9999-12-31	YYYY-MM-DD	日期值
TIME	3	‘-838:59:59’/‘838:59:59’	HH:MM:SS	时间值或持续时间
YEAR	1	1901/2155	YYYY	年份值
DATETIME	8	1000-01-01 00:00:00/9999-12-31 23:59:59	YYYY-MM-DD HH:MM:SS	混合日期和时间值
TIMESTAMP	4	1970-01-01 00:00:00/2038 结束时间是第 2147483647 秒，北京时间 2038-1-19 11:14:07，格林尼治时间 2038-1-19 03:14:07	YYYYMMDD HHMMSS	混合日期和时间值，时间戳

字符串

类型	大小	用途
CHAR	0-255字节	定长字符串
VARCHAR	0-65535 字节	变长字符串
TINYBLOB	0-255字节	不超过 255 个字符的二进制字符串
TINYTEXT	0-255字节	短文本字符串
BLOB	0-65 535字节	二进制形式的长文本数据
TEXT	0-65 535字节	长文本数据
MEDIUMBLOB	0-16 777 215字节	二进制形式的中等长度文本数据
MEDIUMTEXT	0-16 777 215字节	中等长度文本数据
LONGBLOB	0-4 294 967 295字节	二进制形式的极大文本数据
LONGTEXT	0-4 294 967 295字节	极大文本数据

CHAR 和 VARCHAR:

CHAR在定义时会限定存储空间大小，比如一开始规定5个字符。后续如果只给定2个字符，CHAR会自动用空格补齐。

CHAR浪费存储空间，但效率高，VARCHAR则反之

使用数据库的基（只因）本语法

数据库、表（增删改查）略

插入：insert into…values
更新：update…set column_name = …
删除(不完全删除）：delete from…[where…]
删除（完全删除）：truncate…

with rollup

WITH ROLLUP 可以实现在分组统计数据基础上再进行相同的统计（SUM,AVG,COUNT…）

coalesce函数

参数说明：如果a==null,则选择b；如果b==null,则选择c；如果a!=null,则选择a；如果a b c 都为null ，则返回为null（没意义）。

MySQL 连接的使用

外连接：

左外连接【left join】两表连接，返回满足连接条件的行，并且返回左表中不满足的条件的行
右外连接【right join】
满外连接【full outer join】

事务—多条数据操作语句构成的集合

A（原子性—要么完成要么撤回，如果出错会回滚到事务开始前的状态
C（一致性—事务开始和结束后，数据库的完整性都没被破坏（前后数据不会矛盾
I（隔离性—允许多个事务并发执行，对数据进行读写修改
D（持久性—事务处理结束后对数据的修改是永久的

存储过程（stored procedure

一组经过预先编译的SQL语句的封装

优：

简化操作，提高重用性
减少网络传输量（因为语句都保存在MySQL服务器上
提高查询安全性，客户端只需要发送查询请求

存储引擎innoDB

索引

用于快速查找数据记录的数据结构

优

减少磁盘io次数，加快查询效率
加快表与表之间的连接
减少分组和排序的时间

劣

创建索引和维护需要时间
会降低表的更新速度（因为每次进行数据操作时，除了更新数据，还要更新索引）

一个简单的索引结构

B+树 = 数据页（叶子节点） + 目录页（内节点）

B+树层数一般不超过4
B+树层数越少，IO次数越少

索引分类

聚簇索引

也叫簇类索引，是一种对磁盘上实际数据重新组织以按指定的一个或多个列的值排序。由于聚簇索引的索引页面指针指向数据页面，所以使用聚簇索引查找数据几乎总是比使用非聚簇索引快。一张表只能有一个聚簇索引，一般由主键构成

索引即数据，数据即索引

叶子节点存储了全部数据列
二级索引（非聚簇索引）

叶子节点存储了【一个主键以外的列+主键值】，因此使用二级索引查找数据时，最后会进行回表操作（再通过主键值进行一次聚簇索引查找
联合索引

叶子节点存储了【多个主键以外的列+主键值】

关于索引的操作

create [索引类型] [索引名称] on [表名] (列名)
drop index [索引名称] on [表名]

SQL题（由易到难）

1. 题目：

现在运营需要查看用户来自于哪些学校，请从用户信息表中取出学校的去重数据。

distinct关键字的使用

2. 题目：

现在运营只需要查看前2个用户明细设备ID数据，请你从用户信息表 user_profile 中取出相应结果。

limit关键字的使用

3. 题目：

现在你需要查看前2个用户明细设备ID数据，并将列名改为 ‘user_infos_example’,，请你从用户信息表取出相应结果。

as关键字的使用

4. 题目：

现在运营想要取出用户信息表中对应的数据，并先按照gpa、年龄降（升）序排序输出，请取出相应数据。

asc和desc的使用

5. 题目：

现在运营想要对用户的年龄分布开展分析，在分析时想要剔除没有获取到年龄的用户，请你取出所有年龄值不为空的用户的设备ID，性别，年龄，学校的信息。

where过滤空值，使用is来判断是否为空

6. 题目：

现在运营想要对每个学校不同性别的用户活跃情况和发帖数量进行分析，请分别计算出每个学校每种性别的用户数、30天内平均活跃天数和平均发帖数量。**

id	device_id	gender	age	university	gpa	active_days_within_30	question_cnt	answer_cnt
1	2138	male	21	北京大学	3.4	7	2	12
2	3214	male		复旦大学	4.0	15	5	25
3	6543	female	20	北京大学	3.2	12	3	30
4	2315	female	23	浙江大学	3.6	5	1	2
5	5432	male	25	山东大学	3.8	20	15	70
6	2131	male	28	山东大学	3.3	15	7	13
7	4321	male	26	复旦大学	3.6	9	6	52

select
  gender,
  university,
  count(device_id) as user_num,
  avg(active_days_within_30) as avg_active_days,
  avg(question_cnt) as avg_question_cnt
from
  user_profile
group by
  gender,
  university

7. 题目：

现在运营想查看每个学校用户的平均发贴和回帖情况，寻找低活跃度学校进行重点运营，请取出平均发贴数低于5的学校或平均回帖数小于20的学校。

   select
       university, avg(question_cnt) as avg_question_cnt, avg(answer_cnt) as avg_answer_cnt
   from
       user_profile
   group by
       university
   having
       avg(question_cnt) < 5 or avg(answer_cnt) < 20

我的错误点：一开始使用了where

university having

使用HAVING
HAVING子句用于对分组后的结果再进行过滤，
它的功能有点像WHERE子句，但它用于组而不是单个记录。
在HAVING子句中可以使用统计函数，但在WHERE子句中则不能。
HAVING通常与GROUP BY子句一起使用。

8. 题目：

运营想要计算一些参加了答题的不同学校、不同难度的用户平均答题量，请你写SQL取出相应数据

用户信息表：user_profile

id	device_id	gender	age	university	gpa	active_days_within_30	question_cnt	answer_cnt
1	2138	male	21	北京大学	3.4	7	2	12
2	3214	male	NULL	复旦大学	4	15	5	25
3	6543	female	20	北京大学	3.2	12	3	30
4	2315	female	23	浙江大学	3.6	5	1	2
5	5432	male	25	山东大学	3.8	20	15	70
6	2131	male	28	山东大学	3.3	15	7	13
7	4321	male	28	复旦大学	3.6	9	6	52

题库练习明细表：question_practice_detail

id	device_id	question_id	result
1	2138	111	wrong
2	3214	112	wrong
3	3214	113	wrong
4	6534	111	right
5	2315	115	right
6	2315	116	right
7	2315	117	wrong
8	5432	117	wrong
9	5432	112	wrong
10	2131	113	right
11	5432	113	wrong
12	2315	115	right
13	2315	116	right
14	2315	117	wrong
15	5432	117	wrong
16	5432	112	wrong
17	2131	113	right
18	5432	113	wrong
19	2315	117	wrong
20	5432	117	wrong
21	5432	112	wrong
22	2131	113	right
23	5432	113	wrong

问题细节表：question_detail

id	question_id	difficult_level
1	111	hard
2	112	medium
3	113	easy
4	115	easy
5	116	medium
6	117	easy

应该返回的结果：

university	difficult_level	avg_answer_cnt
北京大学	hard	1.0000
复旦大学	easy	1.0000
复旦大学	medium	1.0000
山东大学	easy	4.5000
山东大学	medium	3.0000
浙江大学	easy	5.0000
浙江大学	medium	2.0000

select
  up.university,
  qd.difficult_level,
  count(qd.question_id) / count(distinct up.device_id) as avg_answer_cnt
from
  user_profile as up
  inner join question_practice_detail as qpd on up.device_id = qpd.device_id
  inner join question_detail as qd on qpd.question_id = qd.question_id
group by
  up.university, qd.difficult_level

9. 题目：

运营想要查看参加了答题的山东大学的用户在不同难度下的平均答题题目数，请取出相应数据

山东大学的用户
不同难度
平均答题题目数

select 
    "山东大学" as university,
    difficult_level,
    count(qpd.question_id) / count(distinct qpd.device_id) as avg_answer_cnt
from question_practice_detail as qpd

inner join user_profile as up
on up.device_id=qpd.device_id and up.university="山东大学"

inner join question_detail as qd
on qd.question_id=qpd.question_id

10. 题目：

现在运营想要分别查看学校为山东大学或者性别为男性的用户的device_id、gender、age和gpa数据，请取出相应结果，结果不去重。

结果不去重，需要使用到union all

union 和union all的区别
union是合并两个查询语句的结果集，并排除重复项
union all是不排除重复项的，（符合题目要求）

union使用前提

使用union合并两个表时，需要两个表的结果集字段完全一样；
表一（SELECT device_id,gender,age,gpa ）;
表二（SELECT device_id,gender,age,gpa）

11. 题目：

现在运营想要查看用户在某天刷题后第二天还会再来刷题的平均概率。请你取出相应数据。

id	device_id	quest_id	result	date
1	2138	111	wrong	2021-05-03
2	3214	112	wrong	2021-05-09
3	3214	113	wrong	2021-06-15
4	6543	111	right	2021-08-13
5	2315	115	right	2021-08-13
6	2315	116	right	2021-08-14
7	2315	117	wrong	2021-08-15

思路：

原表a，复制一份表b
先去重（有可能一个用户一天刷多次题
表b的时间全部+1（day
- 需要使用的方法——date_add(date,interval 1 day)
如果第二天还会来刷题，那么两个表左外连接的结果就是：连续两天刷题的行数据没有null值出现
某天刷题后第二天还会再来刷题的平均概率：统计非null的行/所有行

select avg(if(b.device_id is not null,1,0)) as avg_ret
from 
(select distinct device_id,date
from question_practice_detail
)a
left join 
(
select distinct device_id,date_add(date,interval 1 day) as date 
from question_practice_detail
)b
on a.device_id = b.device_id and a.date = b.date

12. 题目：

现在运营举办了一场比赛，收到了一些参赛申请，表数据记录形式如下所示，现在运营想要统计每个性别的用户分别有多少参赛者，请取出相应结果

device_id	profile	blog_url
2138	180cm,75kg,27,male	http:/url/bigboy777
3214	165cm,45kg,26,female	http:/url/kittycc
6543	178cm,65kg,25,male	http:/url/tiger
4321	171cm,55kg,23,female	http:/url/uhksd
2131	168cm,45kg,22,female	http:/urlsydney

观察：

很明显profile字段的存储类型是字符串，里面有多条信息。需要用到某些函数来实现字符串的分割

函数：

substring_index(str, delim, count)

str：要处理的字符串
delim：分隔符
count：计数
- 如果count为正数n，就从左往右数，截取第n个分隔符的左边全部内容，n为负数的话，则反之

select
  substring_index(profile, ',', -1) as gender,
  count(*) as number
from user_submit
group by
  gender

13.题目：

现在运营想要找到每个学校gpa最低的同学来做调研，请你取出每个学校的最低gpa。

select
  device_id,
  university,
  min(gpa) as gpa
from
  user_profile
group by
  university
order by
  university

以上是错误的语句，此时得到的结果中，最低的gpa对应的device_id有可能不一致

解决方法：

先根据原表（表a）拿到每个学校最低的gpa（表b
ab两表内连接

select
  a.device_id,
  b.university,
  b.gpa
from
  user_profile as a
  inner join (
    select
      university,
      min(gpa) as gpa
    from
      user_profile
    GROUP BY
      university
  ) as b
  on a.university = b.university and a.gpa = b.gpa
order by
a.university

14.题目：

现在运营想要了解复旦大学的每个用户在8月份练习的总题目数和回答正确的题目数情况，请取出相应明细数据，对于在8月份没有练习过的用户，答题数结果返回0.

分析：

学校限制：复旦大学—‘复旦大学’ as university
八月份的练习—month(b.date) = 8
8月份没有练习过的用户，答题数结果返回0—左连接
8月份练习的总题目数—count(b.question_id) as question_cnt
回答正确的题目数—sum(if(b.result = ‘right’, 1, 0)) as right_question_cnt

select
  a.device_id,
  '复旦大学' as university,
  count(b.question_id) as question_cnt,
  sum(if(b.result = 'right', 1, 0)) as right_question_cnt
from
  user_profile as a
  left join question_practice_detail as b on a.device_id = b.device_id
  and month(b.date) = 8
where
  a.university = '复旦大学'
group by
  a.device_id

15.题目：

现在运营想要了解浙江大学的用户在不同难度题目下答题的正确率情况，请取出相应数据，并按照准确率升序输出。

分析：

答题正确率
不同难度下的题目
浙江大学
准确率升序输出

select
  qd.difficult_level,
  sum(if(qpd.result = 'right', 1, 0)) / count(qpd.question_id) as correct_rate
from
  question_practice_detail as qpd
  inner join question_detail as qd on qpd.question_id = qd.question_id
where
  qpd.device_id in (
    select
      up.device_id
    from
      user_profile as up
    where
      up.university = '浙江大学'
  )
group by
  qd.difficult_level
order by
  correct_rate

16.题目：

现有一张试卷作答记录表exam_record，其中包含多年来的用户作答试卷记录，结构如下表：

作答记录表exam_record：

start_time是试卷开始时间

submit_time 是交卷，即结束时间

Filed	Type	Null	Key	Extra	Default	Comment
id	int(11)	NO	PRI	auto_increment	(NULL)	自增ID
uid	int(11)	NO			(NULL)	用户ID
exam_id	int(11)	NO			(NULL)	试卷ID
start_time	datetime	NO			(NULL)	开始时间
submit_time	datetime	YES			(NULL)	提交时间
score	tinyint(4)	YES			(NULL)	得分

请删除exam_record表中作答时间小于5分钟整且分数不及格（及格线为60分）的记录

分析：

计算时间差，需要用到的函数

TIMESTAMPDIFF(interval, time_start, time_end)可计算time_start-time_end的时间差，单位以指定的interval为准，常用可选：
- SECOND 秒
- MINUTE 分钟（返回秒数差除以60的整数部分）
- HOUR 小时（返回秒数差除以3600的整数部分）
- DAY 天数（返回秒数差除以3600*24的整数部分）
- MONTH 月数
- YEAR 年数

delete from
  exam_record
where
  timestampdiff(minute, start_time, submit_time) < 5 and score < 60

17.题目：

牛客的运营同学想要查看大家在SQL类别中高难度试卷的得分情况。

请你帮她从exam_record数据表中计算所有用户完成SQL类别高难度试卷得分的截断平均值（去掉一个最大值和一个最小值后的平均值）。

示例数据：examination_info（exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间）

id	exam_id	tag	difficulty	duration	release_time
1	9001	SQL	hard	60	2020-01-01 10:00:00
2	9002	算法	medium	80	2020-08-02 10:00:00

id	uid	exam_id	start_time	submit_time	score
1	1001	9001	2020-01-02 09:01:01	2020-01-02 09:21:01	80
2	1001	9001	2021-05-02 10:01:01	2021-05-02 10:30:01	81
3	1001	9001	2021-06-02 19:01:01	2021-06-02 19:31:01	84
4	1001	9002	2021-09-05 19:01:01	2021-09-05 19:40:01	89
5	1001	9001	2021-09-02 12:01:01	(NULL)	(NULL)
6	1001	9002	2021-09-01 12:01:01	(NULL)	(NULL)
7	1002	9002	2021-02-02 19:01:01	2021-02-02 19:30:01	87
8	1002	9001	2021-05-05 18:01:01	2021-05-05 18:59:02	90
9	1003	9001	2021-09-07 12:01:01	2021-09-07 10:31:01	50
10	1004	9001	2021-09-06 10:01:01	(NULL)	(NULL)

分析：

SQL类别—where
高难度—where
截断平均值— round((sum(score) - max(score) - min(score)) / (count(score) - 2),1)

select
  'SQL' as tag,
  ei.difficulty,
  round(
    (sum(score) - max(score) - min(score)) / (count(score) - 2),
    1
  ) as clip_avg_score
from
  examination_info as ei
  inner join exam_record as er on ei.exam_id = er.exam_id
where
  ei.difficulty = 'hard'
  and ei.tag = 'SQL'
group by
  er.exam_id

18.题目：

请从试卷作答记录表中找到SQL试卷得分不小于该类试卷平均得分的用户最低得分。

示例数据 exam_record表（uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分）：

id	uid	exam_id	start_time	submit_time	score
1	1001	9001	2020-01-02 09:01:01	2020-01-02 09:21:01	80
2	1002	9001	2021-09-05 19:01:01	2021-09-05 19:40:01	89
3	1002	9002	2021-09-02 12:01:01	(NULL)	(NULL)
4	1002	9003	2021-09-01 12:01:01	(NULL)	(NULL)
5	1002	9001	2021-02-02 19:01:01	2021-02-02 19:30:01	87
6	1002	9002	2021-05-05 18:01:01	2021-05-05 18:59:02	90
7	1003	9002	2021-02-06 12:01:01	(NULL)	(NULL)
8	1003	9003	2021-09-07 10:01:01	2021-09-07 10:31:01	86
9	1004	9003	2021-09-06 12:01:01	(NULL)	(NULL)

id	exam_id	tag	difficulty	duration	release_time
1	9001	SQL	hard	60	2020-01-01 10:00:00
2	9002	SQL	easy	60	2020-02-01 10:00:00
3	9003	算法	medium	80	2020-08-02 10:00:00

select
  er.score as min_score_over_avg
from
  exam_record as er
where
  er.exam_id in (
    select
      ei.exam_id
    from
      examination_info as ei
    where
      tag = 'SQL'
  )
  and er.score >= (
    select
      avg(er.score)
    from
      exam_record as er
    where
      er.exam_id in (
        select
          ei.exam_id
        from
          examination_info as ei
        where
          tag = 'SQL'
      )
  )
order by
  er.score
limit 1

19.题目：

用户在牛客试卷作答区作答记录存储在表exam_record中，内容如下：

exam_record表（uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分）

id	uid	exam_id	start_time	submit_time	score
1	1001	9001	2021-07-02 09:01:01	2021-07-02 09:21:01	80
2	1002	9001	2021-09-05 19:01:01	2021-09-05 19:40:01	81
3	1002	9002	2021-09-02 12:01:01	(NULL)	(NULL)
4	1002	9003	2021-09-01 12:01:01	(NULL)	(NULL)
5	1002	9001	2021-07-02 19:01:01	2021-07-02 19:30:01	82
6	1002	9002	2021-07-05 18:01:01	2021-07-05 18:59:02	90

请计算2021年每个月里试卷作答区用户平均月活跃天数avg_active_days和月度活跃人数mau，上面数据的示例输出如下：

month	avg_active_days	mau
202107	1.50	2
202109	1.25	4

解释：2021年7月有2人活跃，共活跃了3天（1001活跃1天，1002活跃2天），平均活跃天数1.5；2021年9月有4人活跃，共活跃了5天，平均活跃天数1.25，结果保留2位小数。

注：此处活跃指有交卷行为。

分析：

2021年每个月
用户平均月活跃天数—round((count(distinct uid, date_format(submit_time, ‘%y%m%d’))) / count(distinct uid),2)
- 思路：
  
  用户平均月活跃天数 = 该月活跃总天数 / 月度活跃总人数
  
  该月活跃总天数统计时，不能统计一天内有重复的uid（用户）且要求submit_time不能为null
  
  因此要distinct去重
月度活跃人数—count(distinct er.uid)

select
  date_format(submit_time, '%Y%m') as month,
  round(
    (
      count(distinct uid, date_format(submit_time, '%y%m%d'))
    ) / count(distinct uid),
    2
  ) as avg_active_days,
  count(distinct er.uid) as mau
from
  exam_record as er
where
  submit_time is not null
  and year(er.submit_time) = 2021
group by
  date_format(submit_time, '%Y%m')

注意：

round()函数的使用

date_format函数的使用

count函数的使用

count(distinct col) 计算该列除 NULL 之外的不重复行数,注意 count(distinct col1, col2) 如果其中一列全为 NULL,那么即使另一列有不同的值,也返回为 0

20.题目：

现有一张题目练习记录表practice_record，示例内容如下：

id	uid	question_id	submit_time	score
1	1001	8001	2021-08-02 11:41:01	60
2	1002	8001	2021-09-02 19:30:01	50
3	1002	8001	2021-09-02 19:20:01	70
4	1002	8002	2021-09-02 19:38:01	70
5	1003	8002	2021-08-01 19:38:01	80

请从中统计出2021年每个月里用户的月总刷题数month_q_cnt 和日均刷题数avg_day_q_cnt（按月份升序排序）以及该年的总体情况，示例数据输出如下：

submit_month	month_q_cnt	avg_day_q_cnt
202108	2	0.065
202109	3	0.100
2021汇总	5	0.161

解释：2021年8月共有2次刷题记录，日均刷题数为2/31=0.065（保留3位小数）；2021年9月共有3次刷题记录，日均刷题数为3/30=0.100；2021年共有5次刷题记录（年度汇总平均无实际意义，这里我们按照31天来算5/31=0.161）

参考答案

select
  coalesce(year_mon, '2021汇总') as submit_month,
  count(question_id) as month_q_cnt,
  round(count(question_id) / max(t.days_month), 3) as avg_day_cnt
from
  (
    select
      date_format(submit_time, '%Y%m') as year_mon
      question_id,
      dayofmonth(last_day(submit_time)) as days_month,
    from
      practice_record
    where
      year(submit_time) = 2021
  ) as t
group by
  t.year_mon with rollup;

dayofmonth函数的使用

21. 题目：

现有试卷作答记录表exam_record（uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分），示例数据如下：

id	uid	exam_id	start_time	submit_time	score
1	1001	9001	2021-07-02 09:01:01	2021-07-02 09:21:01	80
2	1002	9001	2021-09-05 19:01:01	2021-09-05 19:40:01	81
3	1002	9002	2021-09-02 12:01:01	(NULL)	(NULL)
4	1002	9003	2021-09-01 12:01:01	(NULL)	(NULL)
5	1002	9001	2021-07-02 19:01:01	2021-07-02 19:30:01	82
6	1002	9002	2021-07-05 18:01:01	2021-07-05 18:59:02	90
7	1003	9002	2021-07-06 12:01:01	(NULL)	(NULL)
8	1003	9003	2021-09-07 10:01:01	2021-09-07 10:31:01	86
9	1004	9003	2021-09-06 12:01:01	(NULL)	(NULL)
10	1002	9003	2021-09-01 12:01:01	2021-09-01 12:31:01	81
11	1005	9001	2021-09-01 12:01:01	2021-09-01 12:31:01	88
12	1005	9002	2021-09-01 12:01:01	2021-09-01 12:31:01	88
13	1006	9002	2021-09-02 12:11:01	2021-09-02 12:31:01	89

还有一张试卷信息表examination_info（exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间），示例数据如下：

id	exam_id	tag	difficulty	duration	release_time
1	9001	SQL	hard	60	2020-01-01 10:00:00
2	9002	SQL	easy	60	2020-02-01 10:00:00
3	9003	算法	medium	80	2020-08-02 10:00:00

请统计2021年每个未完成试卷作答数大于1的有效用户的数据（有效用户指完成试卷作答数至少为1且未完成数小于5），输出用户ID、未完成试卷作答数、完成试卷作答数、作答过的试卷tag集合，按未完成试卷数量由多到少排序。示例数据的输出结果如下：

uid	incomplete_cnt	complete_cnt	detail
1002	2	4	2021-09-01:算法;2021-07-02:SQL;2021-09-02:SQL;2021-09-05:SQL;2021-07-05:SQL

解释：2021年的作答记录中，除了1004，其他用户均满足有效用户定义，但只有1002未完成试卷数大于1，因此只输出1002，detail中是1002作答过的试卷{日期:tag}集合，日期和tag间用:连接，多元素间用;连接。

分析：

未完成试卷作答数大于1—incomplete_cnt > 1
有效用户指完成试卷作答数至少为1且未完成数小于5—complete_cnt >= 1—incomplete_cnt < 5
作答过的试卷tag集合—group_concat(distinct concat_ws(‘:’, date(start_time), tag) SEPARATOR ‘;’)
按未完成试卷数量由多到少排序—order by incomplete_cnt DESC

select
  uid,
  count(incomplete) as incomplete_cnt,
  count(complete) as complete_cnt,
  group_concat(
    distinct concat_ws(':', date(start_time), tag) SEPARATOR ';'
  ) as detail
from
  (
    select
      uid,
      tag,
      start_time,
      if(submit_time is null, 1, null) as incomplete,
      if(submit_time is null, null, 1) as complete
    from
      exam_record
      left join examination_info using(exam_id)
    where
      year(start_time) = 2021
  ) as exam_complete_rec
group by
  uid
having
  complete_cnt >= 1
  and incomplete_cnt between 2 and 4
order by
  incomplete_cnt desc

注意：

group_concat（xxx）函数的使用

是将分组中括号里对应的字符串进行连接.如果分组中括号里的参数xxx有多行，那么就会将这多行的字符串连接，每个字符串之间会有特定的符号进行分隔。

22.题目：

现有试卷作答记录表exam_record（uid：用户ID, exam_id：试卷ID, start_time：开始作答时间, submit_time：交卷时间，没提交的话为NULL, score：得分），示例数据如下：

id	uid	exam_id	start_time	submit_time	score
1	1001	9001	2021-07-02 09:01:01	(NULL)	(NULL)
2	1002	9003	2021-09-01 12:01:01	2021-09-01 12:21:01	60
3	1002	9002	2021-09-02 12:01:01	2021-09-02 12:31:01	70
4	1002	9001	2021-09-05 19:01:01	2021-09-05 19:40:01	81
5	1002	9002	2021-07-06 12:01:01	(NULL)	(NULL)
6	1003	9003	2021-09-07 10:01:01	2021-09-07 10:31:01	86
7	1003	9003	2021-09-08 12:01:01	2021-09-08 12:11:01	40
8	1003	9001	2021-09-08 13:01:01	(NULL)	(NULL)
9	1003	9002	2021-09-08 14:01:01	(NULL)	(NULL)
10	1003	9003	2021-09-08 15:01:01	(NULL)	(NULL)
11	1005	9001	2021-09-01 12:01:01	2021-09-01 12:31:01	88
12	1005	9002	2021-09-01 12:01:01	2021-09-01 12:31:01	88
13	1005	9002	2021-09-02 12:11:01	2021-09-02 12:31:01	89

试卷信息表examination_info（exam_id：试卷ID, tag：试卷类别, difficulty：试卷难度, duration：考试时长, release_time：发布时间），示例数据如下：

id	exam_id	tag	difficulty	duration	release_time
1	9001	SQL	hard	60	2020-01-01 10:00:00
2	9002	C++	easy	60	2020-02-01 10:00:00
3	9003	算法	medium	80	2020-08-02 10:00:00

请从表中统计出 “当月均完成试卷数”不小于3的用户们爱作答的类别及作答次数，按次数降序输出，示例输出如下：

tag	tag_cnt
C++	4
SQL	2
算法	1

解释：用户1002和1005在2021年09月的完成试卷数目均为3，其他用户均小于3；然后用户1002和1005作答过的试卷tag分布结果按作答次数降序排序依次为C++、SQL、算法。

分析：

“当月均完成试卷数”不小于3—count(exam_id) / count(distinct DATE_FORMAT(submit_time, “%Y%m”)) >= 3

select
  tag,
  count(tag) as tag_cnt
from
  exam_record
  join examination_info using(exam_id)
where
  uid in (
    select
      uid
    from
      exam_record
    where
      submit_time is not null
    group by
      uid
    having
      count(exam_id) / count(distinct DATE_FORMAT(start_time, "%Y%m")) >= 3
  )
group by
  tag
order by
  tag_cnt desc

23.题目：

现有用户信息表user_info（uid用户ID，nick_name昵称, achievement成就值, level等级, job职业方向, register_time注册时间），示例数据如下：

id	uid	nick_name	achievement	level	job	register_time
1	1001	牛客1号	3100	7	算法	2020-01-01 10:00:00
2	1002	牛客2号	2100	6	算法	2020-01-01 10:00:00
3	1003	牛客3号	1500	5	算法	2020-01-01 10:00:00
4	1004	牛客4号	1100	4	算法	2020-01-01 10:00:00
5	1005	牛客5号	1600	6	C++	2020-01-01 10:00:00
6	1006	牛客6号	3000	6	C++	2020-01-01 10:00:00

释义：用户1001昵称为牛客1号，成就值为3100，用户等级是7级，职业方向为算法，注册时间2020-01-01 10:00:00

试卷信息表examination_info（exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间）示例数据如下：

id	exam_id	tag	difficulty	duration	release_time
1	9001	SQL	hard	60	2021-09-01 06:00:00
2	9002	C++	easy	60	2020-02-01 10:00:00
3	9003	算法	medium	80	2020-08-02 10:00:00

试卷作答记录表exam_record（uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分）示例数据如下：

id	uid	exam_id	start_time	submit_time	score
1	1001	9001	2021-09-01 09:01:01	2021-09-01 09:41:01	70
2	1002	9003	2021-09-01 12:01:01	2021-09-01 12:21:01	60
3	1002	9002	2021-09-02 12:01:01	2021-09-02 12:31:01	70
4	1002	9001	2021-09-01 19:01:01	2021-09-01 19:40:01	80
5	1002	9003	2021-08-01 12:01:01	2021-08-01 12:21:01	60
6	1002	9002	2021-08-02 12:01:01	2021-08-02 12:31:01	70
7	1002	9001	2021-09-01 19:01:01	2021-09-01 19:40:01	85
8	1002	9002	2021-07-06 12:01:01	(NULL)	(NULL)
9	1003	9002	2021-09-07 10:01:01	2021-09-07 10:31:01	86
10	1003	9003	2021-09-08 12:01:01	2021-09-08 12:11:01	40
11	1003	9003	2021-09-01 13:01:01	2021-09-01 13:41:01	70
12	1003	9001	2021-09-08 14:01:01	(NULL)	(NULL)
13	1003	9002	2021-09-08 15:01:01	(NULL)	(NULL)
14	1005	9001	2021-09-01 12:01:01	2021-09-01 12:31:01	90
15	1005	9002	2021-09-01 12:01:01	2021-09-01 12:31:01	88
16	1005	9002	2021-09-02 12:11:01	2021-09-02 12:31:01	89

请计算每张类别试卷发布后，当天5级以上的用户作答的人数uv和平均分avg_score，按人数降序，相同人数的按平均分升序，示例数据结果输出如下：

exam_id	uv	avg_score
9001	3	81.3

解释：只有一张SQL类别的试卷，试卷ID为9001，发布当天（2021-09-01）有1001、1002、1003、1005作答过，但是1003是5级用户，其他3位为5级以上，他们三的得分有[70,80,85,90]，平均分为81.3（保留1位小数）。

分析：

当天5级以上的用户作答的人数
平均分avg_score—round(avg(temp_table.score), 1) as avg_score
按人数降序，按平均分升序—order by uv desc, avg_score asc

select
  temp_table.exam_id as exam_id,
  count(distinct temp_table.uid) as uv,
  round(avg(temp_table.score), 1) as avg_score
from
  (
    select
      uid,
      exam_id,
      score
    from
      exam_record as er
      left join examination_info as ei using(exam_id)
    where
      er.submit_time is not null
      and date_format(ei.release_time, '%Y%m') = date_format(er.submit_time, '%Y%m')
  ) as temp_table
where
  temp_table.uid in (
    select
      ui.uid
    from
      user_info as ui
    where
      ui.level > 5
  )
group by
  temp_table.exam_id
order by
  uv desc,
  avg_score asc;

思考一下：

确定主表（数据的主要获取源
确定辅助表（可能用于子查询
描述中出现“每”，带有each意味的词，一般要使用group by