Notes on Analyzing the Cause of a Druid Connection Error

坚持学习的Lele

Published 2021-01-23 15:20:02

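Problems are hard to avoid when using Druid, and I had noticed this one some time ago. Because it did not affect the business, I planned to put it off, even though the error was logged frequently. Today a colleague on my team hit an error in their service and could not locate the cause. In the end they reported the Druid connection pool error, which is the same error I had judged to have little impact. The real problem my colleague ran into has not been reproduced yet, so for now let's deal with the Druid issue first. The error is as follows.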

([com.alibaba.druid.pool.DruidAbstractDataSource:]) discard long time none received connection. ,
 jdbcUrl : jdbc:mysql://*****/&&&&?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT%2B8&useSSL=false, 
 jdbcUrl : jdbc:mysql://*****/&&&&?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT%2B8&useSSL=false, 
 lastPacketReceivedIdleMillis : 280572

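The message says that Druid discarded a connection because it had not received any packet from MySQL for a long time (about 280 seconds here). What else Druid does at that point was not yet clear to me. The impact on the business is indeed small and the system keeps running fine. The next step is to locate the cause, and the process is described below.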

1. Find the relevant code in Druid and check the logic around the variables that appear in the error.

            // Validate the connection. validationQuery is the SQL statement to execute, validationQueryTimeout is
            // the timeout for that statement, and valid records whether the check succeeded.
            valid = this.validConnectionChecker.isValidConnection(conn, this.validationQuery, this.validationQueryTimeout);
            // Current system time
            long currentTimeMillis = System.currentTimeMillis();
            if (holder != null) {
                holder.lastValidTimeMillis = currentTimeMillis;
                holder.lastExecTimeMillis = currentTimeMillis;
            }
            // If validation succeeded and the database is MySQL
            if (valid && this.isMySql) {
                // Time at which the last packet was received from the database
                long lastPacketReceivedTimeMs = MySqlUtils.getLastPacketReceivedTimeMs(conn);
                if (lastPacketReceivedTimeMs > 0L) {
                    // Difference between now and the time the last packet was received
                    long mysqlIdleMillis = currentTimeMillis - lastPacketReceivedTimeMs;
                    // If the difference reaches Druid's timeBetweenEvictionRunsMillis setting, the error is logged.
                    // The default value of timeBetweenEvictionRunsMillis is 60 seconds.
                    if (lastPacketReceivedTimeMs > 0L && mysqlIdleMillis >= this.timeBetweenEvictionRunsMillis) {
                        this.discardConnection(holder);
                        String errorMsg = "discard long time none received connection. , jdbcUrl : " + this.jdbcUrl + ", jdbcUrl : " + this.jdbcUrl + ", lastPacketReceivedIdleMillis : " + mysqlIdleMillis;
                        LOG.error(errorMsg);
                        boolean var13 = false;
                        // Return false
                        return var13;
                    }
                }
            }
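The check in this excerpt can be reduced to a small standalone sketch. The class name IdleCheckSketch, the method shouldDiscard, and the clock value are hypothetical and are not part of Druid. Only the comparison itself and the 280572 ms idle time come from the code and log above.

```java
class IdleCheckSketch {
    /**
     * Mirrors the comparison in the excerpt: a connection is discarded when the
     * time since the last received packet reaches the eviction interval.
     */
    static boolean shouldDiscard(long nowMillis, long lastPacketReceivedTimeMs,
                                 long timeBetweenEvictionRunsMillis) {
        if (lastPacketReceivedTimeMs <= 0L) {
            return false; // timestamp unknown, so the check is skipped
        }
        long mysqlIdleMillis = nowMillis - lastPacketReceivedTimeMs;
        return mysqlIdleMillis >= timeBetweenEvictionRunsMillis;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;            // hypothetical clock value
        long lastPacket = now - 280_572L; // idle time taken from the log above

        // With the default interval of 60 seconds the connection is discarded.
        System.out.println(shouldDiscard(now, lastPacket, 60_000L));            // true
        // A larger interval passes for the moment ...
        System.out.println(shouldDiscard(now, lastPacket, 300_000L));           // false
        // ... but fails again once the idle time has grown past it.
        System.out.println(shouldDiscard(now + 100_000L, lastPacket, 300_000L)); // true
    }
}
```

If the last-packet timestamp is never refreshed, the idle time only grows, so a larger interval postpones the error rather than removing it.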

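From the code above, the obvious first idea is that increasing timeBetweenEvictionRunsMillis should make the error go away. I asked my colleague to raise the value and try again, but the error kept appearing, so our understanding was still incomplete. The key point is that lastPacketReceivedTimeMs is the time at which the last packet was received, and that time can be yesterday or even earlier. To keep the program from reaching the error log and returning false, the only place left to look is how valid is obtained. Let's look at the code that checks whether a connection is usable.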

public boolean isValidConnection(Connection conn, String validateQuery, int validationQueryTimeout) throws Exception {
        if (conn.isClosed()) {
            return false;
        } else {
            // Check whether usePingMethod is true
            if (this.usePingMethod) {
                if (conn instanceof DruidPooledConnection) {
                    conn = ((DruidPooledConnection)conn).getConnection();
                }
                if (conn instanceof ConnectionProxy) {
                    conn = ((ConnectionProxy)conn).getRawObject();
                }
                if (this.clazz.isAssignableFrom(conn.getClass())) {
                    if (validationQueryTimeout <= 0) {
                        validationQueryTimeout = 1;
                    }
                    try {
                        // Invoke the driver's pingInternal method via reflection. This sends a ping command instead of executing SQL.
                        this.ping.invoke(conn, true, validationQueryTimeout * 1000);
                        return true;
                    } catch (InvocationTargetException var11) {
                        Throwable cause = var11.getCause();
                        if (cause instanceof SQLException) {
                            throw (SQLException)cause;
                        }
                        throw var11;
                    }
                }
            }
            // If usePingMethod is false (or the ping branch above did not apply), execute the validation SQL
            String query = validateQuery;
            // validateQuery comes from the spring.datasource.druid.validation-query setting; if it is empty, SELECT 1 is used
            if (validateQuery == null || validateQuery.isEmpty()) {
                query = "SELECT 1";
            }
            Statement stmt = null;
            ResultSet rs = null;
            boolean var7;
            try {
                stmt = conn.createStatement();
                if (validationQueryTimeout > 0) {
                    stmt.setQueryTimeout(validationQueryTimeout);
                }
                // Execute the SQL; this is where a packet is received from MySQL
                rs = stmt.executeQuery(query);
                var7 = true;
            } finally {
                JdbcUtils.close(rs);
                JdbcUtils.close(stmt);
            }
            return var7;
        }
    }

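So how is usePingMethod determined? Presumably its value is true, which means the validation SQL further down is never executed, and therefore the time of the last packet received from MySQL is never updated. The core of the problem thus becomes how usePingMethod is assigned. Scrolling through the class in IDEA, we find that usePingMethod is initialized in the constructor.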

public MySqlValidConnectionChecker() {
        try {
            // First set the default according to which MySQL driver class can be loaded
            this.clazz = Utils.loadClass("com.mysql.jdbc.MySQLConnection");
            if (this.clazz == null) {
                this.clazz = Utils.loadClass("com.mysql.cj.jdbc.ConnectionImpl");
            }
            if (this.clazz != null) {
                this.ping = this.clazz.getMethod("pingInternal", Boolean.TYPE, Integer.TYPE);
            }
            if (this.ping != null) {
                this.usePingMethod = true;
            }
        } catch (Exception var2) {
            LOG.warn("Cannot resolve com.mysql.jdbc.Connection.ping method.  Will use 'SELECT 1' instead.", var2);
        }
        // Then let the system properties have the final say, so they take the highest precedence
        this.configFromProperties(System.getProperties());
    }
    // Read the druid.mysql.usePingMethod setting from the system properties
    public void configFromProperties(Properties properties) {
        String property = properties.getProperty("druid.mysql.usePingMethod");
        // Decides whether the ping method or a SQL query is used to check that the connection is usable
        if ("true".equals(property)) {
            this.setUsePingMethod(true);
        } else if ("false".equals(property)) {
            this.setUsePingMethod(false);
        }
    }

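At this point the picture is clear: to avoid the error, we should pass the following option in the startup script. Apart from that, whether spring.datasource.druid.validation-query is configured makes little difference, because SELECT 1 is used when it is empty.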

-Ddruid.mysql.usePingMethod=false
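The precedence rule read from configFromProperties can be illustrated without Druid on the classpath. The class UsePingMethodSketch and the method resolveUsePingMethod below are hypothetical names. Only the property key and the "true"/"false" comparison come from the code quoted above. The sketch also shows System.setProperty as a programmatic counterpart of the -D flag, under the assumption that it runs before Druid constructs its MySqlValidConnectionChecker.

```java
import java.util.Properties;

class UsePingMethodSketch {
    /**
     * Mirrors the decision in configFromProperties: the property overrides the
     * driver-derived default only when it is exactly "true" or "false".
     */
    static boolean resolveUsePingMethod(boolean driverDefault, Properties properties) {
        String property = properties.getProperty("druid.mysql.usePingMethod");
        if ("true".equals(property)) {
            return true;
        } else if ("false".equals(property)) {
            return false;
        }
        return driverDefault; // property absent or unrecognized: keep the default
    }

    public static void main(String[] args) {
        // Programmatic counterpart of -Ddruid.mysql.usePingMethod=false.
        // Assumption: this must run before the Druid data source is initialized.
        System.setProperty("druid.mysql.usePingMethod", "false");

        // The driver-derived default (true) is overridden by the system property.
        System.out.println(resolveUsePingMethod(true, System.getProperties())); // false
    }
}
```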

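With the cause of the error roughly understood, we still need to know whether returning false ends up affecting the business. For that we need to find out who calls this code.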

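When testConnectionInternal returns false, the discardConnection method is executed, so it looks like the connection is about to be released. From our earlier study we know that the connections are all kept in an array, so let's see how this method handles that. The code is as follows: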

public void discardConnection(DruidConnectionHolder holder) {
        if (holder != null) {
            Connection conn = holder.getConnection();
            if (conn != null) {
                // This appears to be the main release action.
                JdbcUtils.close(conn);
            }
            // Take the lock for thread safety
            this.lock.lock();
            try {
                if (holder.discard) {
                    return;
                }
                if (holder.active) {
                    // Decrease the number of active connections by one
                    --this.activeCount;
                    holder.active = false;
                }
                // Increase the discard count by one
                ++this.discardCount;
                holder.discard = true;
                // If the number of active connections is at or below minIdle
                if (this.activeCount <= this.minIdle) {
                    // Signal the thread responsible for creation so that it creates a new connection
                    this.emptySignal();
                }
            } finally {
                this.lock.unlock();
            }
        }
    }

Let's look at JdbcUtils.close(conn). It checks whether the connection is already closed and, if not, closes it.

    public static void close(Connection x) {
        if (x != null) {
            try {
                if (x.isClosed()) {
                    return;
                }
                x.close();
            } catch (Exception var2) {
                LOG.debug("close connection error", var2);
            }
        }
    }

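The code above contains no operation that removes the connection from the array, so let's check whether the logic for creating new connections does something related. The method is as follows: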

    private void emptySignal() {
        if (this.createScheduler == null) {
            this.empty.signal();
        } else if (this.createTaskCount < this.maxCreateTaskCount) {
            // Only if the total number of connections is still below the maximum
            if (this.activeCount + this.poolingCount + this.createTaskCount < this.maxActive) {
                // Active connections + connections left in the pool + creation tasks in flight < maxActive,
                // so create one more
                this.submitCreateTask(false);
            }
        }
    }
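The two nested conditions in the createScheduler branch of emptySignal can be checked with plain arithmetic. The class CreateTaskSketch, the method shouldSubmitCreateTask, and all the numbers below are hypothetical. Only the two comparisons come from the code above.

```java
class CreateTaskSketch {
    /**
     * Mirrors the scheduler branch of emptySignal: a creation task is submitted only
     * while creation tasks are under their own limit and the total of active, pooled,
     * and in-flight connections stays below maxActive.
     */
    static boolean shouldSubmitCreateTask(int activeCount, int poolingCount, int createTaskCount,
                                          int maxActive, int maxCreateTaskCount) {
        return createTaskCount < maxCreateTaskCount
                && activeCount + poolingCount + createTaskCount < maxActive;
    }

    public static void main(String[] args) {
        // Hypothetical pool: 3 active, 2 pooled, 0 in flight, maxActive 8, at most 3 creation tasks.
        System.out.println(shouldSubmitCreateTask(3, 2, 0, 8, 3)); // true
        // Pool already at maxActive: 5 + 2 + 1 = 8, so nothing is created.
        System.out.println(shouldSubmitCreateTask(5, 2, 1, 8, 3)); // false
        // Too many creation tasks already in flight.
        System.out.println(shouldSubmitCreateTask(1, 1, 3, 8, 3)); // false
    }
}
```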

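Note that emptySignal performs some checks before it requests a new connection. So has my connection actually been released from the pool? To answer that, we need to look at how connections are obtained. It seems that connections are taken from a blocking queue, so once a connection has been taken it no longer has anything to do with the queue? I could not be sure yet. Earlier we clearly said it was the connections[] array, so how did it become a queue?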

Let's look again at the code in Druid that obtains a connection.

Let's continue tracing into the takeLast method:

DruidConnectionHolder takeLast() throws InterruptedException, SQLException {
        try {
            while(this.poolingCount == 0) {
                // If the pool holds no connections, signal the creation thread to create one
                this.emptySignal();
                if (this.failFast && this.isFailContinuous()) {
                    throw new DataSourceNotAvailableException(this.createError);
                }
                ++this.notEmptyWaitThreadCount;
                if (this.notEmptyWaitThreadCount > this.notEmptyWaitThreadPeak) {
                    this.notEmptyWaitThreadPeak = this.notEmptyWaitThreadCount;
                }
                try {
                    // The pool is empty, so block until the creation thread has created a connection and wakes us up
                    this.notEmpty.await();
                } finally {
                    --this.notEmptyWaitThreadCount;
                }
                ++this.notEmptyWaitCount;
                if (!this.enable) {
                    connectErrorCountUpdater.incrementAndGet(this);
                    if (this.disableException != null) {
                        throw this.disableException;
                    }
                    throw new DataSourceDisableException();
                }
            }
        } catch (InterruptedException var5) {
            this.notEmpty.signal();
            ++this.notEmptySignalCount;
            throw var5;
        }
        // Decrease the pooled count by one, because this connection is about to be put to work
        this.decrementPoolingCount();
        // Take the connection
        DruidConnectionHolder last = this.connections[this.poolingCount];
        // Clear the slot so that the pool no longer references the holder and it can be garbage collected later
        this.connections[this.poolingCount] = null;
        // Return it
        return last;
    }

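From the analysis above, we know that a connection is obtained from the connections array and then handed to the business code, and at that point its slot in the array has already been cleared. The discard described earlier therefore leaves nothing stale in the pool, and it has no impact on the business. This analysis gives us a deeper understanding of how the Druid connection pool works. Setting the slot to null after taking the element is the same approach used in thread pools.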
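The array-as-stack behavior can be reproduced in a minimal sketch. ArrayPoolSketch and its methods are hypothetical and use strings as stand-ins for connection holders. Druid's real takeLast uses await() and propagates InterruptedException, while the sketch uses awaitUninterruptibly() to stay short.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class ArrayPoolSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final String[] connections; // stand-ins for connection holders
    private int poolingCount;

    ArrayPoolSketch(int capacity) {
        this.connections = new String[capacity];
    }

    /** Appends a connection at the end of the array and wakes one waiting taker. */
    boolean putLast(String conn) {
        lock.lock();
        try {
            if (poolingCount == connections.length) {
                return false; // pool is full
            }
            connections[poolingCount++] = conn;
            notEmpty.signal();
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Removes the last pooled connection, blocking while the pool is empty. */
    String takeLast() {
        lock.lock();
        try {
            while (poolingCount == 0) {
                notEmpty.awaitUninterruptibly();
            }
            String last = connections[--poolingCount];
            connections[poolingCount] = null; // the pool no longer references it
            return last;
        } finally {
            lock.unlock();
        }
    }

    /** Reports whether the pool still holds a reference to the given connection. */
    boolean contains(String conn) {
        lock.lock();
        try {
            for (int i = 0; i < poolingCount; i++) {
                if (connections[i].equals(conn)) {
                    return true;
                }
            }
            return false;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        ArrayPoolSketch pool = new ArrayPoolSketch(2);
        pool.putLast("c1");
        pool.putLast("c2");
        String taken = pool.takeLast();
        System.out.println(taken);                // c2
        System.out.println(pool.contains(taken)); // false
    }
}
```

Once a connection has been taken, closing or discarding it touches nothing inside the array, which matches the conclusion drawn above.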

To recap, for the Druid connection pool, the error message

([com.alibaba.druid.pool.DruidAbstractDataSource:]) discard long time none received connection. ,
 jdbcUrl : jdbc:mysql://*****/&&&&?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT%2B8&useSSL=false, 
 jdbcUrl : jdbc:mysql://*****/&&&&?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT%2B8&useSSL=false, 
 lastPacketReceivedIdleMillis : 280572

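does not affect the business. To avoid the error, specify in the startup script whether the ping command is used to check connection availability. Druid reads this setting directly from the system properties, so adding it to the project's configuration files has no effect. After the error occurs, Druid destroys the connection and tries to get another one from the pool, creating a new one if none is available. The limits involved are the minimum idle count, the maximum active count, and so on. In addition, Druid has two threads, a connection-creation thread and a heartbeat-check thread. They work together to keep connections usable and to create connections asynchronously, so from the business point of view a connection is always available.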