多表查询(left join)时简单高效去掉重复列的尝试
需求背景:数据库中有主表A,有其他表B、C、D、E、F、G…主表A与其他表使用字段report_id进行关联,现将各个表连接起来做成宽表。基本做法:A表左连接left join其余表。以三个表连接举例,代码仅选择部分字段:SELECT *FROM(SELECT distinctt1.report_id,t1.credit_type,t1.name,t1.certificate_no,t1.crea
需求背景:数据库中有主表A,有其他表B、C、D、E、F、G…主表A与其他表使用字段report_id进行关联,现将各个表连接起来做成宽表。
基本做法:A表左连接left join其余表。
以三个表连接举例,代码仅选择部分字段:
SELECT *
FROM
(
SELECT distinct
t1.report_id,
t1.credit_type,
t1.name,
t1.certificate_no,
t1.create_time,
t1.file_path
FROM
gears.credit_base_info t1
where
t1.report_id is not null
) base # 投影的别名
left JOIN (
SELECT distinct
t2.report_id,
t2.pay_status,
t2.month_fee_amt,
t2.personal_pay_rate
FROM
gears.pcredit_house_fund_record t2
where
t2.info_update_date is not null ) house_found
ON base.report_id = house_found.report_id
left JOIN (
SELECT distinct
t3.report_id,
t3.court_name,
t3.cause,
t3.register_date,
t3.result_type,
IF(t3.case_status = '列入失信名单', '1','0') dishonest_status,
t3.case_end_date,
t3.apply_execution_object,
t3.executed_object
FROM
gears.pcredit_force_execution_record t3
) force_execution
ON base.report_id = force_execution.report_id
where
base.report_id is not null
因为使用SELECT ,查询出来的表中有三个一模一样的report_id列*,如何去除重复列?
目前未找到可以直接删除重复的列的方法,因此“曲线救国”探索些方法加快写列名的速度。
1. 显示指定select的列名,即只要base.report_id,其他两个投影的report_id不选:
SELECT
base.report_id,base.credit_type,base.name,base.certificate_no,base.create_time,
base.file_path,house_found.pay_status,house_found.month_fee_amt,house_found.personal_pay_rate,house_found.company_pay_rate,
force_execution.register_date,force_execution.result_type,force_execution.dishonest_status,force_execution.case_end_date,force_execution.apply_execution_object,force_execution.executed_object
FROM
.......
当列名特别多时,写出所有列名的过程要耗费很多时间,大大降低写代码的效率,因此尝试将这个写列名过程变简单快速一些。
即想办法快速写出Select表中除了某个字段的所有字段(方法在2中)
然后对于不重复的列名不需指定表名,即:
SELECT
base.report_id,credit_type,name,certificate_no,create_time,
file_path,pay_status,month_fee_amt,personal_pay_rate,company_pay_rate,
register_date,result_type,dishonest_status,case_end_date,apply_execution_object,executed_object
FROM
.......
2. 关于简单写出查询除了某一列的其他列的SQL语句
SELECT
CONCAT(' select ',GROUP_CONCAT(COLUMN_NAME),' from ', TABLE_NAME,' ;')
FROM information_schema.COLUMNS
WHERE table_name = 'xxxx'
AND TABLE_SCHEMA = 'ccms_xxx' and COLUMN_NAME != 'id';
执行上面sql得出一条新的sql语句。如图中快速写出了选择除了id这一列的其他列的语句。
如此一来,我们select一个表的多个字段的代码就快了很多,生成上述语句后,可去掉其他不要的字段,仅对与其他表重复的字段使用<别名.字段>的写法即可。
用此方法左连接的某一个表的代码可快速写出:
left JOIN (
select
report_id,sex,birthday,marriage_status,mobile_no,work_tel,home_tel,education,jhi_degree,communication_address,
residence_address,spouse_name,spouse_certificate_type,spouse_certificate_no,spouse_work_unit
from person_info
) person #投影的别名
最后代码:
#稍微简单的一版
product_db=pymysql.connect(host='trans.magfin.cn', port=6690,user="pd_query", password="8dytjn3s", database="gears", charset='utf8' )
sql1='''
SELECT
base.report_id,credit_type,name,certificate_no,create_time,
file_path,pay_status,month_fee_amt,personal_pay_rate,
court_name,cause,register_date,result_type,dishonest_status,case_end_date,apply_execution_object,executed_object
FROM
(
SELECT distinct
report_id,credit_type,name,certificate_no,create_time,file_path
FROM
gears.credit_base_info
where
report_id is not null
) base # 投影的别名
left JOIN (
SELECT distinct
report_id,pay_status,month_fee_amt,personal_pay_rate
FROM
gears.pcredit_house_fund_record
where
info_update_date is not null ) house_found
ON base.report_id = house_found.report_id
left JOIN (
SELECT distinct
report_id,court_name,cause,register_date,result_type,
IF(case_status = '列入失信人名单', '1','0') dishonest_status,
case_end_date,apply_execution_object,executed_object
FROM
gears.pcredit_force_execution_record
) force_execution
ON base.report_id = force_execution.report_id
where
base.report_id is not null
'''
credit_model1_df=pd.read_sql(sql1,product_db)
credit_model1_df.shape #(7307, 20)
credit_model1_df.head(20)
credit_model1_df.describe()
credit_model1_df.info()
更多推荐
所有评论(0)