Introduction to Hive

Hive is a data warehouse tool built on top of Hadoop, used for data extraction, transformation, and loading (ETL); it provides a mechanism to store, query, and analyze large-scale data kept in Hadoop. Hive maps structured data files to database tables and exposes a SQL query interface, translating SQL statements into MapReduce jobs for execution. Its main advantage is a low learning curve: SQL-like statements enable quick MapReduce-based statistics without writing dedicated MapReduce programs. This makes Hive well suited to statistical analysis over a data warehouse.
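As a sketch of the idea (the table and column names below are hypothetical, purely for illustration), a query like the following is compiled by Hive into one or more MapReduce jobs rather than executed by a conventional database engine:

```sql
-- Hypothetical table of access logs already loaded into Hive
SELECT host, COUNT(*) AS hits
FROM access_log
GROUP BY host
ORDER BY hits DESC
LIMIT 10;
```

The GROUP BY maps naturally onto the MapReduce shuffle: mappers emit (host, 1) pairs, and reducers sum the counts per host.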

Hadoop High-Availability (HA) Cluster Deployment


Deployment Environment

  1. OS: CentOS Linux release 7.5.1804 (Core)
  2. Hadoop: hadoop-2.7.3
  3. Zookeeper: zookeeper-3.4.10
  4. JDK: jdk1.8.0_171
  5. Hive: apache-hive-2.3.9

Software Preparation

wget https://mirrors.cnnic.cn/apache/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.44.tar.gz

Cluster Deployment Plan

Host      Address        System user  Software
hadoop-3  192.168.10.53  hadoop       hive
hadoop-4  192.168.10.54  hadoop       hive
hadoop-5  192.168.10.55  hadoop       hive

Hive Cluster Setup

  1. Extract the Hive package to the target directory
[hadoop@hadoop-3 software]$ cd /data/software/
[hadoop@hadoop-3 software]$ tar -zxf apache-hive-2.3.9-bin.tar.gz -C /data/
[hadoop@hadoop-3 software]$ cd /data/
[hadoop@hadoop-3 data]$ mv apache-hive-2.3.9-bin hive
  2. Configure environment variables
[hadoop@hadoop-3 software]$ cat /etc/profile.d/hadoop.sh 
export JAVA_HOME=/data/java/jdk1.8.0_171
export JRE_HOME=/data/java/jdk1.8.0_171/jre
export CLASSPATH=./:/data/java/jdk1.8.0_171/lib:/data/java/jdk1.8.0_171/jre/lib
export HADOOP_HOME=/data/hadoop
export ZOOKEEPER_HOME=/data/zookeeper
export HBASE_HOME=/data/hbase
export HIVE_HOME=/data/hive
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin
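After sourcing the profile fragment, the new bin directories should appear on PATH exactly once. A minimal self-contained check (paths taken from the listing above; only the Hive-relevant variables are reproduced here):

```shell
# Re-create the relevant exports and confirm /data/hive/bin landed on PATH
export JAVA_HOME=/data/java/jdk1.8.0_171
export HIVE_HOME=/data/hive
export PATH="$PATH:$JAVA_HOME/bin:$HIVE_HOME/bin"
printf '%s\n' "$PATH" | tr ':' '\n' | grep -c '^/data/hive/bin$'   # prints 1
```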
  3. Edit the hive-env.sh file
[hadoop@hadoop-3 software]$ cd /data/hive/conf/
[hadoop@hadoop-3 conf]$ cp -a hive-env.sh.template hive-env.sh
[hadoop@hadoop-3 conf]$ grep -n 'HADOOP_HOME' hive-env.sh
47:# Set HADOOP_HOME to point to a specific hadoop install directory
49:HADOOP_HOME=/data/hadoop
  4. Edit the hive-site.xml file
[hadoop@hadoop-3 conf]$ cp -a hive-default.xml.template hive-site.xml 
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql-1:3306/hive?useSSL=false</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
  <!-- Scratch directory configuration -->
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/data/hive/iotmp</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/data/hive/iotmp</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <!-- Authorization configuration -->
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
    <description>Enable authorization checks</description>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
    <description>Impersonation: run queries as the connecting user; defaults to true</description>
  </property>
  <property>
    <name>hive.users.in.admin.role</name>
    <value>admin</value>
    <description>Users granted the admin role; multiple users may be listed</description>
  </property>
  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
    <description>Authorization manager class</description>
  </property>
  <property>
    <name>hive.security.authenticator.manager</name>
    <!-- <value>org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator</value>-->
    <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
    <description>Authenticator manager class</description>
  </property>
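With SQL Standard authorization enabled as above, a user listed in hive.users.in.admin.role still has to activate the role explicitly in each session before performing administrative operations. Roughly (the role and table names below are hypothetical examples):

```sql
-- In a beeline session as an admin user
SET ROLE ADMIN;
CREATE ROLE analyst;                                 -- hypothetical role
GRANT SELECT ON default.some_table TO ROLE analyst;  -- hypothetical table
```

This is why the connection test at the end of this article runs `set role admin;` before `show roles;`.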
  5. Create the scratch directory
[hadoop@hadoop-3 ~]$ mkdir /data/hive/iotmp
  6. Edit the hive-config.sh file
[hadoop@hadoop-3 ~]$ grep -n 'export' /data/hive/bin/hive-config.sh 
20:export JAVA_HOME=/data/java/jdk1.8.0_171
21:export HIVE_HOME=/data/hive
22:export HADOOP_HOME=/data/hadoop
  7. Change the log file location
[hadoop@hadoop-3 software]$ cd /data/hive/conf/
[hadoop@hadoop-3 conf]$ cp -a hive-log4j2.properties.template hive-log4j2.properties
[hadoop@hadoop-3 conf]$ grep -n 'hive.log.dir' hive-log4j2.properties
25:property.hive.log.dir = /data/hive/logs
  8. Place the mysql-connector-java-5.1.44-bin.jar file in /data/hive/lib
[hadoop@hadoop-3 software]$ cd /data/software/
[hadoop@hadoop-3 software]$ tar -zxf mysql-connector-java-5.1.44.tar.gz                         
[hadoop@hadoop-3 software]$ mv mysql-connector-java-5.1.44/mysql-connector-java-5.1.44-bin.jar /data/hive/lib/
  9. Copy the configured Hive directory to the other nodes
[hadoop@hadoop-3 conf]$ scp -r /data/hive/ hadoop@hadoop-4:/data/
[hadoop@hadoop-3 conf]$ scp -r /data/hive/ hadoop@hadoop-5:/data/
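If more worker nodes are added later, the copy can be scripted. A sketch, with the hostnames taken from the deployment plan above and the scp wrapped in an echo so the loop can be dry-run first:

```shell
# Dry-run distribution loop; remove the leading echo to perform the copy
for host in hadoop-4 hadoop-5; do
  echo scp -r /data/hive/ "hadoop@${host}:/data/"
done
```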
  10. On the slave nodes, edit the hive-site.xml file
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop-3:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  11. Create the database in MySQL and grant the hive user privileges
[root@mysql-1 ~]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.30-log MySQL Community Server (GPL)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database hive;
Query OK, 1 row affected (0.01 sec)

mysql> grant all privileges on hive.* to hive@'%' identified by 'hive';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
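Before initializing the schema, it can be worth confirming the grant took effect, for example:

```sql
-- Run in the same mysql session; lists the privileges just granted
SHOW GRANTS FOR 'hive'@'%';
```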

  12. Initialize the Hive metastore schema
[hadoop@hadoop-3 conf]$ schematool -initSchema -dbType mysql -verbose
  13. Start the metastore service
[hadoop@hadoop-3 ~]$ cd /data/hive/bin
[hadoop@hadoop-3 bin]$ nohup hive --service metastore &
[1] 3483
[hadoop@hadoop-3 bin]$ nohup: ignoring input and appending output to 'nohup.out'
[hadoop@hadoop-3 bin]$ netstat -lntp | grep 9083
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 0.0.0.0:9083            0.0.0.0:*               LISTEN      3483/java         
  14. Start the hiveserver2 service
[hadoop@hadoop-4 ~]$ nohup hive --service hiveserver2 &
[1] 2974
[hadoop@hadoop-4 ~]$ nohup: ignoring input and appending output to 'nohup.out'
[hadoop@hadoop-4 ~]$ netstat -lntp | grep 1000
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 0.0.0.0:10000           0.0.0.0:*               LISTEN      2974/java           
tcp        0      0 0.0.0.0:10002           0.0.0.0:*               LISTEN      2974/java           
  15. Connection test
[hadoop@hadoop-3 bin]$ beeline 
beeline> !connect jdbc:hive2://192.168.10.54:10000
Connecting to jdbc:hive2://192.168.10.54:10000
Enter username for jdbc:hive2://192.168.10.54:10000: root
Enter password for jdbc:hive2://192.168.10.54:10000: 
Connected to: Apache Hive (version 2.3.9)
Driver: Hive JDBC (version 2.3.9)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.10.54:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| xiaoming       |
+----------------+
2 rows selected (1.863 seconds)
0: jdbc:hive2://192.168.10.54:10000> set role admin;
No rows affected (0.094 seconds)
0: jdbc:hive2://192.168.10.54:10000> show roles;
+-----------+
|   role    |
+-----------+
| admin     |
| public    |
| xiaoming  |
+-----------+
3 rows selected (0.053 seconds)
0: jdbc:hive2://192.168.10.54:10000> 