多表查询(left join)时简单高效去掉重复列的尝试

需求背景：数据库中有主表A,有其他表B、C、D、E、F、G…主表A与其他表使用字段report_id进行关联，现将各个表连接起来做成宽表。基本做法：A表左连接left join其余表。以三个表连接举例，代码仅选择部分字段：SELECT *FROM(SELECT distinctt1.report_id,t1.credit_type,t1.name,t1.certificate_no,t1.crea

锵锵锵锵~蒋

16983人浏览 · 2021-07-27 17:03:30

锵锵锵锵~蒋 · 2021-07-27 17:03:30 发布

需求背景：数据库中有主表A,有其他表B、C、D、E、F、G…主表A与其他表使用字段report_id进行关联，现将各个表连接起来做成宽表。

基本做法：A表左连接left join其余表。

以三个表连接举例，代码仅选择部分字段：

SELECT *
FROM
	(
SELECT distinct 
	t1.report_id,
    t1.credit_type,
    t1.name,
    t1.certificate_no,
	t1.create_time,
    t1.file_path
FROM
	gears.credit_base_info t1
where
    t1.report_id is not null
	) base  # 投影的别名
left JOIN ( 
SELECT distinct 
    t2.report_id,
	t2.pay_status,
    t2.month_fee_amt,
    t2.personal_pay_rate
FROM
	gears.pcredit_house_fund_record t2
where
    t2.info_update_date is not null ) house_found 
    ON base.report_id = house_found.report_id
left JOIN ( 
SELECT distinct
    t3.report_id,
    t3.court_name,
    t3.cause,
    t3.register_date,
    t3.result_type,
    IF(t3.case_status = '列入失信名单', '1','0') dishonest_status,
    t3.case_end_date,
    t3.apply_execution_object,
    t3.executed_object
FROM
	gears.pcredit_force_execution_record t3
    ) force_execution
    ON base.report_id = force_execution.report_id 
where
    base.report_id is not null

因为使用SELECT ，查询出来的表中有三个一模一样的report_id列*，如何去除重复列？

目前未找到可以直接删除重复的列的方法，因此“曲线救国”探索些方法加快写列名的速度。

1. 显示指定select的列名，即只要base.report_id，其他两个投影的report_id不选:

SELECT
	base.report_id,base.credit_type,base.name,base.certificate_no,base.create_time,
	base.file_path,house_found.pay_status,house_found.month_fee_amt,house_found.personal_pay_rate,house_found.company_pay_rate,
	force_execution.register_date,force_execution.result_type,force_execution.dishonest_status,force_execution.case_end_date,force_execution.apply_execution_object,force_execution.executed_object
FROM
.......

当列名特别多时，写出所有列名的过程要耗费很多时间，大大降低写代码的效率，因此尝试将这个写列名过程变简单快速一些。
即想办法快速写出Select表中除了某个字段的所有字段（方法在2中）
然后对于不重复的列名不需指定表名，即：

SELECT
	base.report_id,credit_type,name,certificate_no,create_time,
	file_path,pay_status,month_fee_amt,personal_pay_rate,company_pay_rate,
	register_date,result_type,dishonest_status,case_end_date,apply_execution_object,executed_object
FROM
.......

2. 关于简单写出查询除了某一列的其他列的SQL语句

 SELECT
	CONCAT(' select ',GROUP_CONCAT(COLUMN_NAME),' from ', TABLE_NAME,' ;')
 FROM information_schema.COLUMNS
 WHERE table_name = 'xxxx'
 AND TABLE_SCHEMA = 'ccms_xxx' and COLUMN_NAME != 'id';

执行上面sql得出一条新的sql语句。如图中快速写出了选择除了id这一列的其他列的语句。
在这里插入图片描述
如此一来，我们select一个表的多个字段的代码就快了很多，生成上述语句后，可去掉其他不要的字段，仅对与其他表重复的字段使用<别名.字段>的写法即可。

用此方法左连接的某一个表的代码可快速写出:

left JOIN ( 
select 
	report_id,sex,birthday,marriage_status,mobile_no,work_tel,home_tel,education,jhi_degree,communication_address,
	residence_address,spouse_name,spouse_certificate_type,spouse_certificate_no,spouse_work_unit 
from person_info 
) person #投影的别名

最后代码：

#稍微简单的一版
product_db=pymysql.connect(host='trans.magfin.cn', port=6690,user="pd_query", password="8dytjn3s", database="gears", charset='utf8' )
sql1='''

SELECT
	base.report_id,credit_type,name,certificate_no,create_time,
	file_path,pay_status,month_fee_amt,personal_pay_rate,
	court_name,cause,register_date,result_type,dishonest_status,case_end_date,apply_execution_object,executed_object
FROM
	(
SELECT distinct 
	report_id,credit_type,name,certificate_no,create_time,file_path
FROM
	gears.credit_base_info 
where
   report_id is not null
	) base  # 投影的别名
left JOIN ( 
SELECT distinct 
    report_id,pay_status,month_fee_amt,personal_pay_rate 
FROM
	gears.pcredit_house_fund_record 
where
    info_update_date is not null ) house_found 
    ON base.report_id = house_found.report_id
left JOIN ( 
SELECT distinct
    report_id,court_name,cause,register_date,result_type,
    IF(case_status = '列入失信人名单', '1','0') dishonest_status,
    case_end_date,apply_execution_object,executed_object
FROM
	gears.pcredit_force_execution_record 
    ) force_execution
    ON base.report_id = force_execution.report_id 
where
    base.report_id is not null

'''

credit_model1_df=pd.read_sql(sql1,product_db)

credit_model1_df.shape #(7307, 20)
credit_model1_df.head(20)
credit_model1_df.describe()
credit_model1_df.info()