Datax是一个开源的数据交换工具,支持关系型数据库(MySQL、Oracle等)、HDFS、Hive、ODPS、HBase、FTP等各种异构数据源之间稳定高效的数据同步,很多商业化的大数据平台也会将Datax作为ETL组件。
本文介绍将datax集成到springboot中进行开发。

Demo代码:https://gitee.com/zengzhenhui/springboot-datax


一、下载官方安装包

1.下载地址:http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
2.解压后保留conf、job、lib、plugin目录,其余目录和文件可以删掉。其中plugin目录只保留需要的插件。
3.将datax目录移动到springboot项目目录下
file

二、导入必要的依赖

pom.xml中加入以下依赖,其中版本号为0.0.1-SNAPSHOT的依赖包可从datax压缩包中获取,需要安装到本地仓库或上传到私有仓库。

        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-core</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-transformer</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>mysqlreader</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>mysqlwriter</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>oraclereader</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>plugin-rdbms-util</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>ojdbc6</artifactId>
            <version>11.2.0.3</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
        </dependency>
        <dependency>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
            <version>1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.4</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.3.2</version>
        </dependency>
        <dependency>
            <groupId>commons-lang</groupId>
            <artifactId>commons-lang</artifactId>
            <version>2.6</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid</artifactId>
            <version>1.0.15</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.73</version>
        </dependency>

三、执行函数

    @Value("${datax.home}")
    private String dataxHome;
	
    public void execute(String jobConfig) {
        System.setProperty("datax.home", dataxHome);
        System.setProperty("now", LocalTime.now().toString());
        String[] dataxArgs = {"-job", dataxHome + "/job/" + jobConfig, "-mode", "standalone", "-jobid", "-1"};
        try {
            Engine.entry(dataxArgs);
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
  • dataxHome参数为前面下载的datax目录路径;
  • jobConfig 参数是配置数据同步任务的json文件,放在${dataxHome}/job目录下,格式为xxx.json;
Logo

为开发者提供学习成长、分享交流、生态实践、资源工具等服务,帮助开发者快速成长。

更多推荐