Troubleshooting notes: running SparkSQL in yarn-cluster mode to read Hive data
Beforehand, in IDEA I had used Maven to package the program together with its dependent jars into a single jar. On the three VMs (master, slave1, slave2) I started Hadoop, MySQL, and YARN. hive-site.xml was already present under $SPARK_HOME/conf on both the master and slave nodes, and the MySQL JDBC driver jar had been placed under SPARK_CLASSPATH on every node. Then I ran:
[root@master bin]# ./spark-submit --class com.badou.JiebaKry --master yarn-cluster /home/boya/boya-1.0-SNAPSHOT.jar
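The application code itself is not shown in this post. For context, a minimal sketch of what such a job presumably looks like is below, in Scala; the package and object name are taken from the --class argument and the table name from the later error message, while the actual JiebaKry logic is an assumption on my part:

package com.badou

import org.apache.spark.sql.SparkSession

object JiebaKry {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() is what makes Spark read hive-site.xml and talk to the
    // Hive metastore; without a reachable metastore, Hive databases and tables
    // are simply invisible to the job.
    val spark = SparkSession.builder()
      .appName("JiebaKry")
      .enableHiveSupport()
      .getOrCreate()

    // Read the Hive table that later shows up in the "Table or view not found" error.
    val news = spark.sql("SELECT * FROM mydb.news_noseg")
    news.show(10)

    spark.stop()
  }
}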
The job failed. Checking the logs in the YARN UI, I saw the following error:
"Table or view not found: 'mydb'.'news_noseg'; line 1 pos 27;"
In yarn-cluster mode the driver runs inside a YARN container, and it apparently could not see the Hive metastore configuration, so the database mydb was simply unknown to it.
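Besides the YARN web UI, the same container logs can also be pulled on the command line after the application finishes; <applicationId> below stands for whatever ID YARN assigned to this run:

[root@master bin]# yarn logs -applicationId <applicationId> | less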
So I passed hive-site.xml and the MySQL driver jar along with the submit command via --files and --jars, and ran:
[root@master conf]# ./spark-submit --class com.badou.JiebaKry --master yarn-cluster --files $HIVE_HOME/conf/hive-site.xml --jars /home/boya/mysql-connector-java-5.1.44-bin.jar /home/boya/boya-1.0-SNAPSHOT.jar
It failed again. Checking the logs, I found the slave nodes reporting the following errors:
"WARN HiveMetaStore: Retrying creating default database after error: Error creating transactional connection factory javax.jdo.JDOFatalInternalException: Error creating transactional connection factory"
and
"Hive: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://localhost:3306"
At this point I guessed that the slave nodes could not reach the MySQL database on the master node. I noticed that the hostname in the error message, "mysql://localhost", was localhost, so I looked at hive-site.xml on master and found the hostname was indeed written as localhost:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
</property>
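This explains the failure on the slaves: each executor reads this URL and tries to connect to MySQL on its own localhost, where nothing is listening, since MySQL only runs on master. A quick sanity check (my own suggestion, not part of the original run) would be to force a local TCP connection from a slave and watch it fail:

[root@slave1 ~]# mysql -h 127.0.0.1 -P 3306 -u root -p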
So I changed it to master:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>
The intention was that when a slave node reads hive-site.xml it connects to the MySQL instance on master, since that is where Hive's metadata is stored. But when I restarted Hive, it unexpectedly threw the following error:
"Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true, username = root. Terminating connection pool (set lazyInit to true if you expect to start your database after your app)."
After some more searching, it looked like a MySQL privileges problem, so I went through the following steps:
[root@master ~]# mysql -u root -p
mysql> use mysql;
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'yourpassword' WITH GRANT OPTION;
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'master' IDENTIFIED BY 'yourpassword' WITH GRANT OPTION;
mysql> flush privileges;
mysql> quit;
[root@master ~]# mysql -u root -p
The purpose of the GRANT statements is to allow the root user on the specified hosts to connect to MySQL remotely; when another machine tries to access MySQL on master, MySQL checks the user table for a matching host value such as %, localhost, or master. After making these changes I restarted MySQL and Hive, and Hive started up normally.
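Before re-submitting, the new grants can be verified directly (these checks are my own addition, not something from the original run): first confirm the host values in the user table on master, then confirm that a slave can reach MySQL on master over the network:

mysql> SELECT host, user FROM mysql.user WHERE user = 'root';
[root@slave1 ~]# mysql -h master -u root -p -e "SELECT 1"

If the second command prints a result instead of an access-denied error, the Spark executors on the slaves should also be able to reach the metastore database.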
Then I re-ran the submission:
[root@master conf]# ./spark-submit --class com.badou.JiebaKry --master yarn-cluster --files $HIVE_HOME/conf/hive-site.xml --jars /home/boya/mysql-connector-java-5.1.44-bin.jar /home/boya/boya-1.0-SNAPSHOT.jar
This time nothing went wrong at all: final status: SUCCEEDED!