Sqoop的使用

使用sqoop前需要启动Hadoop,检查Hadoop的运行状态,我们这里使用的是伪分布式

[root@hr conf]# jps
1538 DataNode
2114 NodeManager
1689 SecondaryNameNode
1834 ResourceManager
1421 NameNode
2831 Jps

Sqoop连接MySQL需要MySQLjar包,吧jar包放置Sqooplib目录下

先测试Sqoop能否连接MySQL查看存在的数据库

sqoop list-databases --connect jdbc:mysql://192.168.0.161:3306/ --username root --password 123456

结果中的警告是因为没有配置Hbase,忽略即可

Warning: /home/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
22/03/16 04:27:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/03/16 04:27:14 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/03/16 04:27:14 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Wed Mar 16 04:27:14 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sys

为了连接HIve我们需要将Hive组件lib目录下的hive-common-3.1.2.jar放到Sqoop组件的lib目录下

cp /home/hive/lib/hive-common-3.1.2.jar /home/sqoop/lib/

在MySQL中创建一个数据库和数据表,并随机插入一些数据

创建数据库和表

create database sample;
create table student(number char(9) primary key, name varchar(10));

添加数据

insert into student values('01','zhangsan');
insert into student values('02','lisi');
insert into student values('03','wangwu');

查看数据

mysql> select * from student;
+--------+----------+
| number | name     |
+--------+----------+
| 01     | zhangsan |
| 02     | lisi     |
| 03     | wangwu   |
+--------+----------+
3 rows in set (0.00 sec)

mysql> 

Hive中创建数据库和数据表

hive> create database sample; 
OK
Time taken: 0.566 seconds
hive> use sample;
OK
Time taken: 0.015 seconds
hive>  create table student(number STRING, name STRING) row format delimited fields terminated by ',';
Time taken: 0.499 seconds

MySQL 导出数据,导入Hive

sqoop参数说明

参数意义
–connectMySQL 数据库连接 URL。
–username 和–passwordMySQL 数据库的用户名和密码。
–table导出的数据表名。
–fields-terminated-byHive 中字段分隔符。
–delete-target-dir删除导出目的目录。
–num-mappersHadoop 执行 Sqoop 导入导出启动的 map 任务数。
–hive-import --hive-database导出到 Hive 的数据库名。
–hive-table导出到 Hive 的表名。
sqoop import --connect jdbc:mysql://192.168.0.161:3306/sample --username root --password 123456 --table student --fields-terminated-by ',' --delete-target-dir --num-mappers 1 --hive-import --hive-database sample --hive-table student

运行日志

Warning: /home/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
22/03/16 05:03:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/03/16 05:03:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/03/16 05:03:24 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/03/16 05:03:24 INFO tool.CodeGenTool: Beginning code generation
Wed Mar 16 05:03:24 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/03/16 05:03:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/03/16 05:03:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/03/16 05:03:25 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop
Note: /tmp/sqoop-root/compile/05aad1252e73aa2f62b414a7722fe98d/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/03/16 05:03:27 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/05aad1252e73aa2f62b414a7722fe98d/student.jar
22/03/16 05:03:28 INFO tool.ImportTool: Destination directory student is not present, hence not deleting.
22/03/16 05:03:28 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/03/16 05:03:28 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/03/16 05:03:28 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/03/16 05:03:28 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/03/16 05:03:28 INFO mapreduce.ImportJobBase: Beginning import of student
22/03/16 05:03:28 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/03/16 05:03:28 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/03/16 05:03:28 INFO client.RMProxy: Connecting to ResourceManager at hr/192.168.0.161:8032
Wed Mar 16 05:03:30 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/03/16 05:03:30 INFO db.DBInputFormat: Using read commited transaction isolation
22/03/16 05:03:30 INFO mapreduce.JobSubmitter: number of splits:1
22/03/16 05:03:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1647416961901_0001
22/03/16 05:03:31 INFO impl.YarnClientImpl: Submitted application application_1647416961901_0001
22/03/16 05:03:31 INFO mapreduce.Job: The url to track the job: http://hr:8088/proxy/application_1647416961901_0001/
22/03/16 05:03:31 INFO mapreduce.Job: Running job: job_1647416961901_0001
22/03/16 05:03:40 INFO mapreduce.Job: Job job_1647416961901_0001 running in uber mode : false
22/03/16 05:03:40 INFO mapreduce.Job:  map 0% reduce 0%
22/03/16 05:03:47 INFO mapreduce.Job:  map 100% reduce 0%
22/03/16 05:03:48 INFO mapreduce.Job: Job job_1647416961901_0001 completed successfully
22/03/16 05:03:48 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=133750
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=30
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3708
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=3708
		Total vcore-seconds taken by all map tasks=3708
		Total megabyte-seconds taken by all map tasks=3796992
	Map-Reduce Framework
		Map input records=3
		Map output records=3
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=68
		CPU time spent (ms)=1020
		Physical memory (bytes) snapshot=110944256
		Virtual memory (bytes) snapshot=2081656832
		Total committed heap usage (bytes)=30474240
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=30
22/03/16 05:03:48 INFO mapreduce.ImportJobBase: Transferred 30 bytes in 20.1651 seconds (1.4877 bytes/sec)
22/03/16 05:03:48 INFO mapreduce.ImportJobBase: Retrieved 3 records.
22/03/16 05:03:48 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table student
Wed Mar 16 05:03:48 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/03/16 05:03:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/03/16 05:03:48 INFO hive.HiveImport: Loading uploaded data into Hive
22/03/16 05:03:48 INFO conf.HiveConf: Found configuration file file:/home/hive/conf/hive-site.xml
22/03/16 05:03:49 INFO hive.HiveImport: which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/home/jdk/bin:/home/jdk/bin:/home/sqoop/bin)
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/home/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/home/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
22/03/16 05:03:52 INFO hive.HiveImport: Hive Session ID = 4b655330-3a4e-4ed4-b40f-599c1bff9652
22/03/16 05:03:52 INFO hive.HiveImport: 
22/03/16 05:03:52 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
22/03/16 05:03:59 INFO hive.HiveImport: Hive Session ID = 57f33b0b-6433-4c3e-8277-4f020467d73b
22/03/16 05:04:00 INFO hive.HiveImport: OK
22/03/16 05:04:00 INFO hive.HiveImport: Time taken: 1.046 seconds
22/03/16 05:04:00 INFO hive.HiveImport: Loading data to table sample.student
22/03/16 05:04:01 INFO hive.HiveImport: OK
22/03/16 05:04:01 INFO hive.HiveImport: Time taken: 0.805 seconds
22/03/16 05:04:02 INFO hive.HiveImport: Hive import complete.
22/03/16 05:04:02 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.

去查看HIve上的表

hive> select * from sample.student;
OK
01	zhangsan
02	lisi
03	wangwu
Time taken: 2.154 seconds, Fetched: 3 row(s)

Hive 导出数据,导入到 MySQL

参数意义
–connectMySQL 数据库连接 URL。
–username 和–passwordMySQL 数据库的用户名和密码。
–table导出的数据表名。
–fields-terminated-byHive 中字段分隔符。
–export-dirHive 数据表在 HDFS 中的存储路径。
sqoop export --connect "jdbc:mysql://192.168.0.161:3306/sample?useUnicode=true&characterEncoding=utf-8" --username root --password 123456 --table student --input-fields-terminated-by ',' --export-dir /user/hive/warehouse/sample.db/student/*

Sqoop其他的命令
列出 MySQL 数据库中的所有数据库。

sqoop list-databases -connect jdbc:mysql://localhost:3306/ -username root -password 123456

连接MySQL并列出 sample数据库中的表。

sqoop list-tables -connect jdbc:mysql://localhost:3306/sample -username root -password 123456

将关系型数据的表结构复制到hive中,只是复制表的结构,表中的内容没有复制过去。

sqoop create-hive-table -connect jdbc:mysql://localhost:3306/sample -table student -username root -password 123456 -hive-table test

其中–table studentMySQL 中的数据库 sample中的表–hive-table testhive中新建的表名称。


从关系数据库导入文件到Hive中。

sqoop import --connect jdbc:mysql://localhost:3306/sample --username root --password 123456 --table student --delete-target-dir --num-mappers 1 --hive-import --hive-database default --hive-table test

Hive中的表数据导入到 MySQL 中,在进行导入之前,MySQL中的表hive_test表必须已经提前创建好。

sqoop export -connect jdbc:mysql://localhost:3306/sample -username root -password 123456 -table student --input-fields-terminatedby '\001' -export-dir /user/hive/warehouse/test

从数据库导出表的数据到 HDFS 上文件。

sqoop import -connect jdbc:mysql://localhost:3306/sample -username root -password 123456 -table student --num-mappers 1 -targetdir /user/test

从数据库增量导入表数据到 HDFS中。

sqoop import -connect jdbc:mysql://localhost:3306/sample -username root -password 123456 -table student --num-mappers 1 -targetdir /user/test -check-column number -incremental append -last-value 0
Logo

为开发者提供学习成长、分享交流、生态实践、资源工具等服务,帮助开发者快速成长。

更多推荐