Oracle中数据迁移到hive平台

Oracle中数据迁移到hive平台序言：数据从Oracle平台导出为csv格式，上传至hdfs上，在hive上load至临时表，最后insert到目标表，完成数据Oracle中数据迁移到hive平台。（表名及字段等信息属于公司敏感信息，用别的信息代替展示）1、将数据从oracle导出为csv文件格式oracle导出数据采用sqluldr 工具sqluldr user/pasword query

章鱼哥TuNan&Z

3626人浏览 · 2021-09-01 23:16:33

章鱼哥TuNan&Z · 2021-09-01 23:16:33 发布

Oracle中数据迁移到hive平台

序言：数据从Oracle平台导出为csv格式，上传至hdfs上，在hive上load至临时表，最后insert到目标表，完成数据Oracle中数据迁移到hive平台。（表名及字段等信息属于公司敏感信息，用别的信息代替展示）

1、将数据从oracle导出为csv文件格式

oracle导出数据采用sqluldr 工具

sqluldr user/pasword query='select * from student  where age = 10' file='/home/db01/tunan/student.csv' field=',' charset='ZHS16GBK' safe=yes
-- user:用户名
-- pasword：密码
-- query：sql语句（要导出那张表的哪些字段、过滤条件等信息）
-- file:指定路径及文件名称
-- field：指定字段信息的分割符
-- charset：指定字符集

当命令行显示output file /home/db01/tunan/student.csv closed at 100000000rows,size 20000MB.证明导出数据成功。

导出之前请检查空间是否够用。df (这里已采坑空间不足也不报错，知道存储下位置此时数据是不完整的。注意呀~)

当然导入少量数据的话也可以使用PL/SQL Developer 的Export Query Results… （这种速度是真的慢！）

2、将csv文件上传至hdfs

csv文件小于4G可以将数据从linux sz到本地，拖动本地文件直接上传至 hdfs上

也可将数据通 -put 到hdfs上

hdfs dfs -put -f student.csv /user/hive/..
-- /user/hive/.. :hdfs上的路径

3、在hive中创建临时表

CREATE TABLE 'tmp.student_tmp'(
'id' sting comment '主键',
    ...
)
COMMENT '学生表'
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde' -- 剔除引号
WITH SERDEPROPERTIES(
'field.delim'='\t',
'serialization.format'='\t'
)
stored as textfile -- 存格式为textfile 也可以指定为orc 格式

alter table tmp.student_tmp set tablproperties('skip.header.line.count'='1') --剔除首行（检查自己导出的csv文件是否有首行字段名，如果没有就不要加这一行，否则结果会少一行数据）

在hive中执行以上语句；

4、将数据load到临时表中

load data inpath '/user/hive/....../student.csv' overwrite into table tmp.student_tmp

load结束检查数据是否load成功。打开PL/SQL Developer 执行SQL语句对照字段名对照数据，总条数等

select * from student -- 在PL/SQL Developer 执行
select * from tmp.student_tmp -- 在hive上执行
-- 对照字段对应情况，查看数据已经插入
select count(*) from student --在PL/SQL Developer 执行
select count(*) from tmp.student_tmp -- 在hive上执行
-- 对照两个平台上的数据总数是否一致，如果hive上少一条，检查首行是否重复剔除

5、在hive平台中创建目标表

create table 'ods.student'( --临时表对应，如果目标表为分区表指定分区字段
'id' sting comment '主键',
    ...
)
COMMENT '学生表'
-- PARTITIONED BY ('age' string) -- 指定分区字段
row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
stored as textfile -- 存格式为textfile 也可以指定为orc 格式
LOCATION 
'hdfs://ns1/user/hive/warehouse/ods.db/student'

执行该shell

6、insert into 目标表 select 字段 from 临时表验证

--开启动态分区
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;

-- 将临时表的中数据插入到目标表中
insert into table ods.student  partition(age) -- 如果目标表是分区表，加上开区动态分区字段，加上partition（分区字段）
select 
id ,
....
from tmp.student_tmp

再次验证数据准确性

select * from student -- 在PL/SQL Developer 执行
select * from ods.student -- 在hive上执行
-- 对照字段对应情况，查看数据已经插入
select count(*) from student --在PL/SQL Developer 执行
select count(*) from ods.student -- 在hive上执行
-- 对照两个平台上的数据总数是否一致