【Sqoop】Sqoop的使用(连接MySQL、Hive)
Sqoop的使用使用sqoop前需要启动Hadoop,检查Hadoop的运行状态,我们这里使用的是伪分布式[root@hr conf]# jps1538 DataNode2114 NodeManager1689 SecondaryNameNode1834 ResourceManager1421 NameNode2831 JpsSqoop连接MySQL需要MySQL的jar包,吧jar包放置Sqoo
Sqoop的使用
使用sqoop
前需要启动Hadoop
,检查Hadoop
的运行状态,我们这里使用的是伪分布式
[root@hr conf]# jps
1538 DataNode
2114 NodeManager
1689 SecondaryNameNode
1834 ResourceManager
1421 NameNode
2831 Jps
Sqoop
连接MySQL
需要MySQL
的jar
包,吧jar
包放置Sqoop
的lib
目录下
先测试Sqoop
能否连接MySQL
查看存在的数据库
sqoop list-databases --connect jdbc:mysql://192.168.0.161:3306/ --username root --password 123456
结果中的警告是因为没有配置Hbase
,忽略即可
Warning: /home/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
22/03/16 04:27:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/03/16 04:27:14 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/03/16 04:27:14 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Wed Mar 16 04:27:14 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sys
为了连接HIve
我们需要将Hive
组件lib
目录下的hive-common-3.1.2.jar
放到Sqoop
组件的lib
目录下
cp /home/hive/lib/hive-common-3.1.2.jar /home/sqoop/lib/
在MySQL中创建一个数据库和数据表,并随机插入一些数据
创建数据库和表
create database sample;
create table student(number char(9) primary key, name varchar(10));
添加数据
insert into student values('01','zhangsan');
insert into student values('02','lisi');
insert into student values('03','wangwu');
查看数据
mysql> select * from student;
+--------+----------+
| number | name |
+--------+----------+
| 01 | zhangsan |
| 02 | lisi |
| 03 | wangwu |
+--------+----------+
3 rows in set (0.00 sec)
mysql>
在Hive
中创建数据库和数据表
hive> create database sample;
OK
Time taken: 0.566 seconds
hive> use sample;
OK
Time taken: 0.015 seconds
hive> create table student(number STRING, name STRING) row format delimited fields terminated by ',';
Time taken: 0.499 seconds
从 MySQL
导出数据,导入Hive
。
sqoop
参数说明
参数 | 意义 |
---|---|
–connect | MySQL 数据库连接 URL。 |
–username 和–password | MySQL 数据库的用户名和密码。 |
–table | 导出的数据表名。 |
–fields-terminated-by | Hive 中字段分隔符。 |
–delete-target-dir | 删除导出目的目录。 |
–num-mappers | Hadoop 执行 Sqoop 导入导出启动的 map 任务数。 |
–hive-import --hive-database | 导出到 Hive 的数据库名。 |
–hive-table | 导出到 Hive 的表名。 |
sqoop import --connect jdbc:mysql://192.168.0.161:3306/sample --username root --password 123456 --table student --fields-terminated-by ',' --delete-target-dir --num-mappers 1 --hive-import --hive-database sample --hive-table student
运行日志
Warning: /home/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
22/03/16 05:03:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/03/16 05:03:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/03/16 05:03:24 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/03/16 05:03:24 INFO tool.CodeGenTool: Beginning code generation
Wed Mar 16 05:03:24 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/03/16 05:03:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/03/16 05:03:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/03/16 05:03:25 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop
Note: /tmp/sqoop-root/compile/05aad1252e73aa2f62b414a7722fe98d/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/03/16 05:03:27 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/05aad1252e73aa2f62b414a7722fe98d/student.jar
22/03/16 05:03:28 INFO tool.ImportTool: Destination directory student is not present, hence not deleting.
22/03/16 05:03:28 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/03/16 05:03:28 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/03/16 05:03:28 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/03/16 05:03:28 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/03/16 05:03:28 INFO mapreduce.ImportJobBase: Beginning import of student
22/03/16 05:03:28 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/03/16 05:03:28 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/03/16 05:03:28 INFO client.RMProxy: Connecting to ResourceManager at hr/192.168.0.161:8032
Wed Mar 16 05:03:30 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/03/16 05:03:30 INFO db.DBInputFormat: Using read commited transaction isolation
22/03/16 05:03:30 INFO mapreduce.JobSubmitter: number of splits:1
22/03/16 05:03:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1647416961901_0001
22/03/16 05:03:31 INFO impl.YarnClientImpl: Submitted application application_1647416961901_0001
22/03/16 05:03:31 INFO mapreduce.Job: The url to track the job: http://hr:8088/proxy/application_1647416961901_0001/
22/03/16 05:03:31 INFO mapreduce.Job: Running job: job_1647416961901_0001
22/03/16 05:03:40 INFO mapreduce.Job: Job job_1647416961901_0001 running in uber mode : false
22/03/16 05:03:40 INFO mapreduce.Job: map 0% reduce 0%
22/03/16 05:03:47 INFO mapreduce.Job: map 100% reduce 0%
22/03/16 05:03:48 INFO mapreduce.Job: Job job_1647416961901_0001 completed successfully
22/03/16 05:03:48 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=133750
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=30
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=3708
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=3708
Total vcore-seconds taken by all map tasks=3708
Total megabyte-seconds taken by all map tasks=3796992
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=68
CPU time spent (ms)=1020
Physical memory (bytes) snapshot=110944256
Virtual memory (bytes) snapshot=2081656832
Total committed heap usage (bytes)=30474240
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=30
22/03/16 05:03:48 INFO mapreduce.ImportJobBase: Transferred 30 bytes in 20.1651 seconds (1.4877 bytes/sec)
22/03/16 05:03:48 INFO mapreduce.ImportJobBase: Retrieved 3 records.
22/03/16 05:03:48 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table student
Wed Mar 16 05:03:48 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/03/16 05:03:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/03/16 05:03:48 INFO hive.HiveImport: Loading uploaded data into Hive
22/03/16 05:03:48 INFO conf.HiveConf: Found configuration file file:/home/hive/conf/hive-site.xml
22/03/16 05:03:49 INFO hive.HiveImport: which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/home/jdk/bin:/home/jdk/bin:/home/sqoop/bin)
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/home/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/home/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
22/03/16 05:03:52 INFO hive.HiveImport: Hive Session ID = 4b655330-3a4e-4ed4-b40f-599c1bff9652
22/03/16 05:03:52 INFO hive.HiveImport:
22/03/16 05:03:52 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
22/03/16 05:03:59 INFO hive.HiveImport: Hive Session ID = 57f33b0b-6433-4c3e-8277-4f020467d73b
22/03/16 05:04:00 INFO hive.HiveImport: OK
22/03/16 05:04:00 INFO hive.HiveImport: Time taken: 1.046 seconds
22/03/16 05:04:00 INFO hive.HiveImport: Loading data to table sample.student
22/03/16 05:04:01 INFO hive.HiveImport: OK
22/03/16 05:04:01 INFO hive.HiveImport: Time taken: 0.805 seconds
22/03/16 05:04:02 INFO hive.HiveImport: Hive import complete.
22/03/16 05:04:02 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
去查看HIve
上的表
hive> select * from sample.student;
OK
01 zhangsan
02 lisi
03 wangwu
Time taken: 2.154 seconds, Fetched: 3 row(s)
从 Hive
导出数据,导入到 MySQL
。
参数 | 意义 |
---|---|
–connect | MySQL 数据库连接 URL。 |
–username 和–password | MySQL 数据库的用户名和密码。 |
–table | 导出的数据表名。 |
–fields-terminated-by | Hive 中字段分隔符。 |
–export-dir | Hive 数据表在 HDFS 中的存储路径。 |
sqoop export --connect "jdbc:mysql://192.168.0.161:3306/sample?useUnicode=true&characterEncoding=utf-8" --username root --password 123456 --table student --input-fields-terminated-by ',' --export-dir /user/hive/warehouse/sample.db/student/*
Sqoop
其他的命令
列出 MySQL
数据库中的所有数据库。
sqoop list-databases -connect jdbc:mysql://localhost:3306/ -username root -password 123456
连接MySQL
并列出 sample
数据库中的表。
sqoop list-tables -connect jdbc:mysql://localhost:3306/sample -username root -password 123456
将关系型数据的表结构复制到hive
中,只是复制表的结构,表中的内容没有复制过去。
sqoop create-hive-table -connect jdbc:mysql://localhost:3306/sample -table student -username root -password 123456 -hive-table test
其中–table student
为 MySQL
中的数据库 sample
中的表–hive-table test
为 hive
中新建的表名称。
从关系数据库导入文件到Hive
中。
sqoop import --connect jdbc:mysql://localhost:3306/sample --username root --password 123456 --table student --delete-target-dir --num-mappers 1 --hive-import --hive-database default --hive-table test
将 Hive
中的表数据导入到 MySQL
中,在进行导入之前,MySQL
中的表hive_test
表必须已经提前创建好。
sqoop export -connect jdbc:mysql://localhost:3306/sample -username root -password 123456 -table student --input-fields-terminatedby '\001' -export-dir /user/hive/warehouse/test
从数据库导出表的数据到 HDFS
上文件。
sqoop import -connect jdbc:mysql://localhost:3306/sample -username root -password 123456 -table student --num-mappers 1 -targetdir /user/test
从数据库增量导入表数据到 HDFS
中。
sqoop import -connect jdbc:mysql://localhost:3306/sample -username root -password 123456 -table student --num-mappers 1 -targetdir /user/test -check-column number -incremental append -last-value 0
更多推荐
所有评论(0)