序言

触发ANR的场景有几种,比如输入超时,Boradcast长时间无响应等等,具体场景和超时时间如下:

本文以最常见的Input类型事件导致的ANR为例,带读者们进行详细的解读以下内容:

1.介绍输入事件的ANR的判定逻辑;

2.介绍发生ANR之后,ANR相关的信息是如何采集的;

3.介绍ANR的logcat和trace文件在系统中是何时记录生成,以及如何解读;

4.一些围绕ANR的扩展性问题讲解;

注:本文以android13的源码为例进行解读的

一.ANR判定逻辑

1.1事件分发流程概览

 了解ANR之前,一定了解事件分发的整个流程才行。这里为了方便读者的阅读,我这里简单介绍一下,如果想知道完整的流程,推荐看我的另外一篇文章:

https://blog.csdn.net/rzleilei/article/details/126795965

主要流程如下图所示:

首先通过EventHub中读取到输入事件,然后交由InputDispatcher进行处理。

InputDispatcher中有一个线程,死循环执行dispatchOnce方法,该方法负责分发消息给APP侧,以及接受APP侧的返回并执行相关以后,最后执行ANR的判断。所以dispatchOnce属于整个的流程的核心,也是ANR处理流程的核心,所以我们重点了解下dispatchOnce这个方法。

如下图所示:

dispatchOnce中要完成四件事:

1.dispatchOnceInnerLocked()方法负责把收到的输入信号分发给APP处理,发送成功会加入到InputDispatcher里的waitQueue队列和AnrTracker.cpp里的mAnrTimeouts。

2.haveCommandsLocked()中查看队列中是否有任务,如果有就执行任务。这些任务就是执行doDispatchCycleFinishedCommand方法,该方法中,会根据收到的完成信号,完成对应的事件从waitQueue和mAnrTimeouts中移除的处理。

 // Notify other system components and prepare to start the next dispatch cycle.
    auto command = [this, currentTime, connection, seq, handled, consumeTime]() REQUIRES(mLock) {
        doDispatchCycleFinishedCommand(currentTime, connection, seq, handled, consumeTime);
    };

3.processAnrsLocked中会进行一些逻辑判断,如果符合条件,则会触发ANR流程。

4.pollOnce进入休眠,等待下一次的循环。

接下来,我们就具体的讲解下上面的这些流程。

1.2.dispatchOnceInnerLocked分发事件到APP侧

具体的分发的流程我们就多讲解了,直接看下面的流程图就好,这里仍是以点击事件为例。

前面的步骤我们现略过,我们先专注于startDispatchCycleLocked方法,如果对上面的流程感兴趣的读者可以参看我开头提到的那篇文章。

startDispatchCycleLocked的核心代码如下:

void InputDispatcher::startDispatchCycleLocked(nsecs_t currentTime,
                                               const sp<Connection>& connection) {
   ...

    while (connection->status == Connection::Status::NORMAL && !connection->outboundQueue.empty()) {
        DispatchEntry* dispatchEntry = connection->outboundQueue.front();
        dispatchEntry->deliveryTime = currentTime;
        const std::chrono::nanoseconds timeout = getDispatchingTimeoutLocked(connection);
        //1.计算超时时间
        dispatchEntry->timeoutTime = currentTime + timeout.count();
                ...
                //2.发送给APP一侧
                status = connection->inputPublisher
                                 .publishMotionEvent(...)
            ...
            ...
        //3.出栈
        connection->outboundQueue.erase(...);
        //4.入栈
        connection->waitQueue.push_back(dispatchEntry);
        if (connection->responsive) {
            //5.入栈
            mAnrTracker.insert(dispatchEntry->timeoutTime,
                               connection->inputChannel->getConnectionToken());
        }
        ...
    }
}

核心流程如下:

1.针对每个事件计算超时时间,超时时间就是当前时间+5S。

2.把事件发送给APP一侧。

3.从outboundQueue队列中出栈,因为成功发送给APP一侧了,该队列存储的是应该发送给APP侧但是还未发送成功的。

4.加入到waitQueue队列中,该队列中存储的是已经成功发送给APP一侧,但是还未收到回应的。

5.加入到AnrTracker.cpp的mAnrTimeouts集合中,该集合的作用和waitQueue差不多,但是该集合专注于超时计算,是否弹出ANR,就是由该队列中的元素决定的。

1.3 runCommandsLockedInterruptable执行事件完成操作

APP侧消费完成事件后,会发送一个通知给系统侧,系统侧收到这个通知后,会生成一个任务,加入到mCommandQueue队列中,然后切换到InputDispatcher线程中去执行。执行这些任务的方法就是runCommandsLockedInterruptable。

而具体的任务注册,是doDispatchCycleFinishedCommand方法,具体代码如下:

 auto command = [this, currentTime, connection, seq, handled, consumeTime]() REQUIRES(mLock) {
        doDispatchCycleFinishedCommand(currentTime, connection, seq, handled, consumeTime);
    };
    postCommandLocked(std::move(command));

该方法中,回判断事件是否结束,如果结束,最终会走到doDispatchCycleFinishedCommand方法。最终这个方法中,回完成以下操作:

1.把对应事件从waitQueue队列中移除。

2.把对应事件从mAnrTracker集合中移除

3.释放对应事件。

1.3 processAnrsLocked执行ANR判断相关逻辑

processAnrsLocked中,主要有两块逻辑回进行ANR提示

1.如果没有获取到焦点的应用,则超过5S会走ANR流程。

2.如果有应用获取到焦点,但是已经过了超时事件,则也会走ANR流程。

nsecs_t InputDispatcher::processAnrsLocked() {
   
    if (mNoFocusedWindowTimeoutTime.has_value() && mAwaitedFocusedApplication != nullptr) {
         //第一种情况
        if (currentTime >= *mNoFocusedWindowTimeoutTime) {
            processNoFocusedWindowAnrLocked();
            mAwaitedFocusedApplication.reset();
            mNoFocusedWindowTimeoutTime = std::nullopt;
            return LONG_LONG_MIN;
        } else {
            // Keep waiting. We will drop the event when mNoFocusedWindowTimeoutTime comes.
            nextAnrCheck = *mNoFocusedWindowTimeoutTime;
        }
    }

    // Check if any connection ANRs are due
    nextAnrCheck = std::min(nextAnrCheck, mAnrTracker.firstTimeout());
    if (currentTime < nextAnrCheck) { // most likely scenario
        return nextAnrCheck;          // everything is normal. Let's check again at nextAnrCheck
    }

    // If we reached here, we have an unresponsive connection.
    sp<Connection> connection = getConnectionLocked(mAnrTracker.firstToken());
    if (connection == nullptr) {
        ALOGE("Could not find connection for entry %" PRId64, mAnrTracker.firstTimeout());
        return nextAnrCheck;
    }
    connection->responsive = false;
    // Stop waking up for this unresponsive connection
    mAnrTracker.eraseToken(connection->inputChannel->getConnectionToken());
     //第二种情况
    onAnrLocked(connection);
    return LONG_LONG_MIN;
}

两种场景还是有区别的,这里我举两个例子:

ActivityA中点击按钮B跳转ActivityC,如果ActivityC中onCreate方法中sleep10秒,那就是场景一的情况。

ActivityA中点击按钮B跳转ActivityC,如果ActivityA中按钮B的监听方法中sleep10秒,那就是场景二的情况。

ActivityA中点击按钮B跳转ActivityC,如果ActivityC中onCreate执行一个延时1秒的主线程任务,该任务就是休眠10S,这种情况就不会触发ANR。

我们这里只介绍第二种情况,因为大多数的场景下,都是第二种场景。

首先会通过firstTimeout方法获取到所有的事件中,最早加入的那一条,返回其超时时间。

nsecs_t AnrTracker::firstTimeout() const {
    if (mAnrTimeouts.empty()) {
        return std::numeric_limits<nsecs_t>::max();
    }
    return mAnrTimeouts.begin()->first;
}

然后进行判断,如果当前时间>=超时时间,则说明已经超时了,走ANR逻辑,则会进入到onAnrLocked方法中,此时就进入到了ANR流程的逻辑。

1.4 小节

我们来回顾一下第一章所讲的,做一个总结:

InputReader读到输入事件后,就会传递到InputDispatcher,整个超时流程也都是由其负责的。InputDispatcher中主要是dispatchOnce方法来负责,它跑在单独的线程上。

首先它把收到的输入信号分发给APP一侧并记录到mAnrTimeouts集合上;

然后查看是否有APP侧传递过来的任务,如果有就执行,该任务就是把对应的输入事件从mAnrTimeouts集合中移除;

再然后判断mAnrTimeouts集合中是否有超时的事件,如果有就走ANR逻辑;

最后,一轮逻辑走完了,进入休眠,等待下一轮的唤醒。

二.ANR流程

如果第一章中,判断已经超时,则会进入到到ANR的流程。

2.1 ANR流程分发

我们从onAnrLocked开始,整个流程如下图所示:

具体流程就不详细赘述了,如上面的流程图所示。

2.2 ANR日志采集流程

日志采集流程图如下:

主要流程如下:

1.上面讲到,appNotResponding方法负责ANR信息的采集和无响应弹框的显示,所以入口也是这个方法。

2.首先,该方法中会统计出现ANR的进程以及最近使用到的进程,因为最近使用到的进程的CPU占用会有可能会导致其它进程出现问题。其中包括java进程集合firstPids和native进程集合nativePids。然后通知到AMS中的dumpStackTraces方法来完成具体的ANR信息采集工作。

3.dumpStackTraces方法中,首先会休眠200ms,完成CPU使用状态的采集。其中占用率较高的几个进程会被加入到extraPids集合,也会被进行相应的采集操作。

4.最后通过重名方法dumpStackTraces方法,完成具体的采集操作。该方法中,首先会遍历java进程,然后遍历native进程,最后遍历extraPids集合,分别完成具体的进程采集操作。具体采集的信息在流程中,这里就不再向西赘述了。

2.3 ANR窗口弹出流程

弹出无响应窗口的流程则简单的多,通过handler发一个延时5S的消息,转交到主线程完成显示。

            if (mService.mUiHandler != null) {
                // Bring up the infamous App Not Responding dialog
                Message msg = Message.obtain();
                msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
                msg.obj = new AppNotRespondingDialog.Data(mApp, aInfo, aboveSystem);
                mService.mUiHandler.sendMessageDelayed(msg, anrDialogDelayMs);
            }

整个流程如下图所示,一样,这里就不再展开了:

三.ANR的日志和文件解读

发生ANR的时候,会有两种形式的记录:

1.logcat中记录的日志。

2.data/anr/目录下生成对应的ANR日志文件。

3.1 logcat中记录的信息

首先,我们先梳理一下整个logcat中ANR日志的完整流程:

2022-10-08 15:03:32.023 24194-24330/? W/InputDispatcher: Window ... is not responding. Waited 5007ms for MotionEvent    //InputDispatcher::onAnrLocked中,识别到输入事件响应超时
2022-10-08 15:03:32.023 24194-24330/? W/InputDispatcher: Canceling events ... it is unresponsive                        //InputDispatcher::cancelEventsForAnrLocked中,InputDispatcher进行ANR流程分法
2022-10-08 15:03:32.023 24194-24330/? I/WindowManager: ANR in Window{}                                                  //AnrController.java的notifyWindowUnresponsive中,window中超时,准备通知AMS
2022-10-08 15:03:32.146 24194-31707/? I/ActivityManager: dumpStackTraces pids={7705=true, 7740=true, ...}               //ActivityManagerService.java的dumpStackTraces中,准备收集的进程集合
2022-10-08 15:03:32.471 24194-31707/? I/ActivityManager: Dumping to /data/anr/anr_2022-10-08-15-03-32-471               //ActivityManagerService.java的dumpStackTraces中,准备通知APP收集ANR日志
2022-10-08 15:03:32.471 24194-31707/? I/ActivityManager: Collecting stacks for pid 31655                                //ActivityManagerService.java的dumpStackTraces中,待收集的进程PID
2022-10-08 15:03:32.472 24194-31707/? I/system_server: libdebuggerd_client: started dumping process 31655               //debuggerd_client.cpp的debuggerd_trigger_dump中,native中准备通知待收集的进程
2022-10-08 15:03:32.472 31655-31660/com.xt.client I/com.xt.client: Thread[6,tid=31660,...]: reacting to signal 3        //APP进程收到信号,准备开始写入
2022-10-08 15:03:32.568 31655-31660/com.xt.client I/com.xt.client: Wrote stack traces to tombstoned                     //APP进程写入完成,通知系统侧
2022-10-08 15:03:32.568 24194-31707/? I/system_server: libdebuggerd_client: done dumping process 31655                  //系统进程收到通知,进行保存
2022-10-08 15:03:32.589 24194-31707/? I/ActivityManager: Collecting stacks for pid 24194                                //ActivityManagerService.java dumpStackTraces中,准备开始收集下一个进程的PID,这个进程为系统进程system_server
2022-10-08 15:03:32.601 24194-31707/? I/system_server: libdebuggerd_client: started dumping process 24194               //debuggerd_client.cpp的debuggerd_trigger_dump中,native中准备通知待收集的进程
2022-10-08 15:03:32.603 24194-24202/? I/system_server: Thread[2,...]: reacting to signal 3                              //system_server进程收到信号,准备开始写入
2022-10-08 15:03:32.858 24194-24202/? I/system_server: Wrote stack traces to tombstoned                                 //system_server进程写入完成,通知系统侧
...继续收集其它java进程的信息
2022-10-08 15:03:34.972 24194-31707/? I/ActivityManager: Collecting stacks for native pid 660                           //准备收集native进程660的信息
2022-10-08 15:03:34.973 24194-31707/? I/system_server: libdebuggerd_client: started dumping process 660                 //debuggerd_client.cpp的debuggerd_trigger_dump中,native中准备通知待收集的进程
2022-10-08 15:03:34.974 660-660/? I/libc: Requested dump for pid 660 (binder:660_2)                                     //debuggerd_handler.cpp的log_signal_summary中,native中准备通知待收集的进程
...继续收集其它native进程的信息
2022-10-08 15:03:36.273 24194-31707/? I/ActivityManager: Done dumping                                                   //全部收集完成
2022-10-08 15:03:36.274 24194-31707/? E/ActivityManager: ANR in com.xt.client (com.xt.client/.MainActivity)             //打印发生ANR时收集到的日志
...

上面其实对整个流程已经做了一个讲解了,这里稍微赘述一下。

1.InputDispatcher中识别到发生ANR了,通过回调java方法,通知到InputManagerService。如InputDispatcher开头的两行日志所记录。

2.InputManagerService中,根据WindowState找到对应的归属,先交给ActivityRecord处理,最终转交到AMS中。如WindowManager开头的一行日志所记录。

3.AMS中委托给AnrHelper.java进行处理,最终会通知到ProcessErrorStateRecord.java的appNotResponding方法,该方法中,会进行三个操作:

首先,收集需要dump的进程列表,包含java进程,native进程,其它进程,采集规则是发生ANR的进程+最近活动的进程。

其次,进行采集操作。这个会交给ActivityManagerService.java的dumpStackTraces方法执行。

最后,显示无响应的弹框。

4.采集日志会进入到ActivityManagerService.java的dumpStackTraces方法中,此时先打印ActivityManager开头的两行日志,分别代表着开始采集和生成日志文件。下面这两个日志分别记录将要采集的进程ID以及日志文件路径:

2022-10-08 15:03:32.146 24194-31707/? I/ActivityManager: dumpStackTraces pids={7705=true, 7740=true, ...}               //ActivityManagerService.java的dumpStackTraces中,准备收集的进程集合
2022-10-08 15:03:32.471 24194-31707/? I/ActivityManager: Dumping to /data/anr/anr_2022-10-08-15-03-32-471               //ActivityManagerService.java的dumpStackTraces中,准备通知APP收集ANR日志

5.因为需要dump多个进程的,所以开始采集以及采集完成的日志会有很多,如下面所示,分别代表开始采集,发送信号通知,收到信号通知,采集完成以及系统收到采集完成的通知。这种日志会经历很多轮。

2022-10-08 15:03:32.471 24194-31707/? I/ActivityManager: Collecting stacks for pid 31655                                //ActivityManagerService.java的dumpStackTraces中,待收集的进程PID
2022-10-08 15:03:32.472 24194-31707/? I/system_server: libdebuggerd_client: started dumping process 31655               //debuggerd_client.cpp的debuggerd_trigger_dump中,native中准备通知待收集的进程
2022-10-08 15:03:32.472 31655-31660/com.xt.client I/com.xt.client: Thread[6,tid=31660,...]: reacting to signal 3        //APP进程收到信号,准备开始写入
2022-10-08 15:03:32.568 31655-31660/com.xt.client I/com.xt.client: Wrote stack traces to tombstoned                     //APP进程写入完成,通知系统侧
2022-10-08 15:03:32.568 24194-31707/? I/system_server: libdebuggerd_client: done dumping process 31655                  //系统进程收到通知,进行保存

6.采集native的信息

2022-10-08 15:03:34.972 24194-31707/? I/ActivityManager: Collecting stacks for native pid 660                           //准备收集native进程660的信息
2022-10-08 15:03:34.973 24194-31707/? I/system_server: libdebuggerd_client: started dumping process 660                 //debuggerd_client.cpp的debuggerd_trigger_dump中,native中准备通知待收集的进程
2022-10-08 15:03:34.974 660-660/? I/libc: Requested dump for pid 660 (binder:660_2)                                     //debuggerd_handler.cpp的log_signal_summary中,native中准备通知待收集的进程

7.所有进程dump完成。

2022-10-08 15:03:36.273 24194-31707/? I/ActivityManager: Done dumping                                                   //全部收集完成

8.打印之前采集的CPU的运行信息:

2022-10-08 15:03:36.274 24194-31707/? E/ActivityManager: ANR in com.xt.client (com.xt.client/.MainActivity)
    PID: 31655    //进程ID
    Reason: Input dispatching timed out (dff7ad com.xt.client/com.xt.client.MainActivity (server) is not responding. Waited 5007ms for MotionEvent)    //原因
    Parent: com.xt.client/.MainActivity    //触发点
    ErrorId: bcef99f2-c232-4bee-bdd8-233b4ff598f0
    Frozen: false
    Load: 0.54 / 0.15 / 0.12        //ANR发生之前的1分钟,5分钟,15分钟CPU占用率。0.54代表0.54%
    ----- Output from /proc/pressure/memory -----
    some avg10=0.00 avg60=0.00 avg300=0.00 total=18377466
    full avg10=0.00 avg60=0.00 avg300=0.00 total=10458714
    ----- End output from /proc/pressure/memory -----
    //ANR发生之前的129秒内CPU占用率
    CPU usage from 129271ms to 0ms ago (2022-10-08 15:01:22.846 to 2022-10-08 15:03:32.116):
      //user代表用户态占用,kernel代表内核态占用
      1.7% 24194/system_server: 1.2% user + 0.5% kernel / faults: 3611 minor
      1.5% 689/surfaceflinger: 1.1% user + 0.4% kernel / faults: 441 minor
      1.4% 24426/com.android.systemui: 1% user + 0.3% kernel / faults: 2682 minor
      1% 28207/kworker/u16:6: 0% user + 1% kernel / faults: 18 minor
      0.8% 31505/kworker/u16:3: 0% user + 0.8% kernel
      0.7% 27789/kworker/u16:1: 0% user + 0.7% kernel / faults: 2 minor
      0.6% 29978/com.tencent.mm: 0.1% user + 0.4% kernel / faults: 66 minor
      0.6% 1122/vendor.google.wifi_ext@1.0-service-vendor: 0.3% user + 0.2% kernel / faults: 1190 minor
      0.3% 24523/com.breel.wallpapers20a: 0.2% user + 0% kernel / faults: 1291 minor
      0.4% 24828/com.google.android.apps.nexuslauncher: 0.3% user + 0% kernel / faults: 1984 minor
      0.1% 691/android.hardware.graphics.composer@2.4-service-sm8150: 0% user + 0% kernel / faults: 133 minor
      0.3% 18902/com.google.android.gms.persistent: 0.2% user + 0.1% kernel / faults: 2221 minor
      0.3% 30388/com.tencent.mm:tools: 0.2% user + 0% kernel / faults: 93 minor
      ...

具体解释如下:

行数内容解释
1
ANR in com.test.sceneengineservice
发生ANR应用的包名
2
PID: 2468
发生ANR的进程ID,请注意, 此日志是system_server进程打印的
3
Reason: executing service com.test.sceneengineservice/.SceneEngineService 
发生ANR的的原因
4
Load: 0.0 / 0.0 / 0.0 
发生ANR之前CPU的负载,分别代表发生前的1分钟/5分钟/15分钟
5
CPU usage from 14781ms to 0ms ago (2022-09-21 07:32:34.092 to 2022-09-21 07:32:48.873):
这期间的CPU占用。从发生之前的14秒到发生时
6
E ActivityManager:   5.6% 1000/system_server: 1.8% user + 3.7% kernel / faults: 3555 minor
这段时间内具体的CPU占用率。user代表应用侧的占用,kernel代表内核侧的占用。

3.2  ANR文件中记录的信息

文件内容信息如下:

Subject: Input dispatching timed out (4dff4f9 com.xt.client/com.xt.client.MainActivity (server) is not responding. Waited 5001ms for MotionEvent)

--- CriticalEventLog ---
capacity: 20
events <
  timestamp_ms: 1665198570560
  anr <
    pid: 20001
    process: "com.xt.client"
    process_class: 1
    subject: "Input dispatching timed out (b9701e com.xt.client/com.xt.client.MainActivity (server) is not responding. Waited 5007ms for MotionEvent)"
    uid: 10244
  >
>
events <
  timestamp_ms: 1665198580245
  java_crash <
    exception_class: "java.lang.SecurityException"
    pid: 20001
    process: "com.xt.client"
    process_class: 1
    uid: 10244
  >
>
timestamp_ms: 1665198615793
window_ms: 300000


----- pid 20223 at 2022-10-08 11:10:16.144897760+0800 -----
Cmd line: com.xt.client
Build fingerprint: 'google/sunfish/sunfish:13/TP1A.220905.004/8927612:user/release-keys'
ABI: 'arm64'
Build type: optimized
/** 类加载信息-开始 class_linker.cc中写入,记录类加载信息 */
Zygote loaded classes=21523 post zygote classes=573
Dumping registered class loaders
#0 dalvik.system.PathClassLoader: [], parent #1
#1 java.lang.BootClassLoader: [], no parent
#2 dalvik.system.PathClassLoader: [/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes12.dex:/data/data/com.xt.client/code_cache/.overlay/base.apk/classes14.dex:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes11.dex:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes3.dex:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes8.dex:/data/data/com.xt.client/code_cache/.overlay/base.apk/classes9.dex:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes5.dex:/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/base.apk!classes6.dex], parent #1
Done dumping class loaders
Classes initialized: 0 in 0
/** 类加载信息-结束 */

/** 引用表-开始 intern_table.cc中写入,记录强弱引用数量 */
Intern table: 30098 strong; 1210 weak
/** 引用表-结束 */

/** JNI加载信息-开始 java_vm_ext.cc中写入,记录加载的so */
JNI: CheckJNI is on; globals=400 (plus 68 weak)
Libraries: libandroid.so libaudioeffect_jni.so libcompiler_rt.so libframework-connectivity-jni.so libframework-connectivity-tiramisu-jni.so libicu_jni.so libjavacore.so libjavacrypto.so libjnigraphics.so libmedia_jni.so libopenjdk.so librs_jni.so librtp_jni.so libsoundpool.so libstats_jni.so libwebviewchromium_loader.so (16)
/** JNI加载信息-结束  */

/** java虚拟机堆状态-开始 heap.cc中写入,记录当前的内存状态 */
Heap: 74% free, 4180KB/16MB; 183984 objects
Dumping cumulative Gc timings
Average major GC reclaim bytes ratio inf over 0 GC cycles
Average major GC copied live bytes ratio 0.732525 over 4 major GCs
Cumulative bytes moved 29923016
Cumulative objects moved 547043
Peak regions allocated 60 (15MB) / 1024 (256MB)
Total madvise time 7.031ms
Average minor GC reclaim bytes ratio nan over 0 GC cycles
Average minor GC copied live bytes ratio nan over 0 minor GCs
Cumulative bytes moved 0
Cumulative objects moved 0
Peak regions allocated 60 (15MB) / 1024 (256MB)
Total number of allocations 183984
Total bytes allocated 4180KB
Total bytes freed 0B
Free memory 11MB
Free memory until GC 11MB
Free memory until OOME 251MB
Total memory 16MB
Max memory 256MB
Zygote space size 7760KB
Total mutator paused time: 0
Total time waiting for GC to complete: 2.448us
Total GC count: 0
Total GC time: 0
Total blocking GC count: 0
Total blocking GC time: 0
Total pre-OOME GC count: 0
Native bytes total: 14386550 registered: 2142838
Total native bytes at last GC: 4357592
/** java虚拟机堆状态-结束 */

/** OTA文件记录-开始 oat_file_manager.cc中写入,记录当前使用到的OTA文件 */
/data/app/~~9kDnfQhhAsXX5wlTVjeRlA==/com.xt.client-DT6oy9kys_nok1ibzgGmFQ==/oat/arm64/base.vdex: verify
/system/framework/oat/arm64/android.hidl.manager-V1.0-java.odex: verify
/system/framework/oat/arm64/android.test.base.odex: verify
/system/framework/oat/arm64/android.hidl.base-V1.0-java.odex: verify
/** OTA文件记录-结束 */

/** JIT信息记录-开始 jit.cc中写入,记录JIT的信息 */
Current JIT code cache size (used / resident): 40KB / 60KB
Current JIT data cache size (used / resident): 28KB / 60KB
Zygote JIT code cache size (at point of fork): 22KB / 32KB
Zygote JIT data cache size (at point of fork): 16KB / 32KB
Current JIT mini-debug-info size: 19KB
Current JIT capacity: 128KB
Current number of JIT JNI stub entries: 0
Current number of JIT code cache entries: 46
Total number of JIT baseline compilations: 34
Total number of JIT optimized compilations: 4
Total number of JIT compilations for on stack replacement: 1
Total number of JIT code cache collections: 1
Memory used for stack maps: Avg: 377B Max: 11KB Min: 32B
Memory used for compiled code: Avg: 1156B Max: 26KB Min: 164B
Memory used for profiling info: Avg: 187B Max: 4416B Min: 24B
Start Dumping Averages for 56 iterations for JIT timings
Compiling optimized:	Sum: 49.228ms Avg: 879.071us
Compiling baseline:	Sum: 27.108ms Avg: 484.071us
TrimMaps:	Sum: 2.699ms Avg: 48.196us
Code cache collection:	Sum: 380us Avg: 6.785us
Compiling OSR:	Sum: 240us Avg: 4.285us
Done Dumping Averages
Memory used for compilation: Avg: 154KB Max: 4050KB Min: 16KB
ProfileSaver total_bytes_written=0
ProfileSaver total_number_of_writes=0
ProfileSaver total_number_of_code_cache_queries=0
ProfileSaver total_number_of_skipped_writes=0
ProfileSaver total_number_of_failed_writes=0
ProfileSaver total_ms_of_sleep=5000
ProfileSaver total_ms_of_work=0
ProfileSaver total_number_of_hot_spikes=0
ProfileSaver total_number_of_wake_ups=0
/** JIT信息记录-结束 */

/** ART信息指标-开始 metrics_common.cc中写入,记录ART虚拟机的一些关键指标 */
*** ART internal metrics ***
  Metadata:
    timestamp_since_start_ms: 19952
  Metrics:
    ClassLoadingTotalTime: count = 17358
    ClassVerificationTotalTime: count = 15303
    ClassVerificationCount: count = 28
    WorldStopTimeDuringGCAvg: count = 0
    YoungGcCount: count = 0
    FullGcCount: count = 0
    TotalBytesAllocated: count = 2349168
    TotalGcCollectionTime: count = 0
    YoungGcThroughputAvg: count = 0
    FullGcThroughputAvg: count = 0
    YoungGcTracingThroughputAvg: count = 0
    FullGcTracingThroughputAvg: count = 0
    JitMethodCompileTotalTime: count = 28492
    JitMethodCompileCount: count = 39
    YoungGcCollectionTime: range = 0...60000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    FullGcCollectionTime: range = 0...60000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    YoungGcThroughput: range = 0...10000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    FullGcThroughput: range = 0...10000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    YoungGcTracingThroughput: range = 0...10000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    FullGcTracingThroughput: range = 0...10000, buckets: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
*** Done dumping ART internal metrics ***
/** ART信息指标-结束 */

/** 线程信息-开始 thread_list.cc中写入,记录当前进程的线程信息 */
suspend all histogram:	Sum: 112us 99% C.I. 0.083us-54us Avg: 4.666us Max: 54us
DALVIK THREADS (19):
"Signal Catcher" daemon prio=10 tid=6 Runnable
  | group="system" sCount=0 ucsCount=0 flags=0 obj=0x12c801a0 self=0xb400007193e16a60
  | sysTid=20229 nice=-20 cgrp=system sched=0/0 handle=0x7004dfbcb0
  | state=R schedstat=( 17930731 0 5 ) utm=0 stm=0 core=7 HZ=100
  | stack=0x7004d04000-0x7004d06000 stackSize=991KB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 00000000005374cc  /apex/com.android.art/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, int, BacktraceMap*, char const*, art::ArtMethod*, void*, bool)+128) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #01 pc 00000000006f0d94  /apex/com.android.art/lib64/libart.so (art::Thread::DumpStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool, BacktraceMap*, bool) const+236) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #02 pc 00000000006fe620  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+208) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #03 pc 00000000003641d4  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*)+440) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #04 pc 00000000006fcdc0  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+280) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #05 pc 00000000006fc7b4  /apex/com.android.art/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+292) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #06 pc 00000000006d57e4  /apex/com.android.art/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+184) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #07 pc 00000000006e1928  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+468) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #08 pc 0000000000573c40  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::Run(void*)+264) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
  native: #09 pc 00000000000b62b8  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
  native: #10 pc 0000000000052fb8  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
  (no managed stack frames)

"main" prio=5 tid=1 Sleeping
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x72424e88 self=0xb400007193e0a7b0
  | sysTid=20223 nice=-10 cgrp=system sched=0/0 handle=0x72dd5244f8
  | state=S schedstat=( 331627619 74690560 345 ) utm=30 stm=2 core=7 HZ=100
  | stack=0x7fcfb32000-0x7fcfb34000 stackSize=8188KB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x0e92c80b> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:450)
  - locked <0x0e92c80b> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:355)
  at com.xt.client.MainActivity.doActionWithoutClass(MainActivity.java:201)
  at com.xt.client.MainActivity.access$000(MainActivity.java:85)
  at com.xt.client.MainActivity$2.onItemClick(MainActivity.java:177)
  at com.xt.client.inter.RecyclerItemClickListener.onInterceptTouchEvent(RecyclerItemClickListener.java:34)
  at androidx.recyclerview.widget.RecyclerView.findInterceptingOnItemTouchListener(RecyclerView.java:3286)
  at androidx.recyclerview.widget.RecyclerView.dispatchToOnItemTouchListeners(RecyclerView.java:3257)
  at androidx.recyclerview.widget.RecyclerView.onTouchEvent(RecyclerView.java:3421)
  at android.view.View.dispatchTouchEvent(View.java:15004)
  at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:3115)
  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:2788)
  at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:3121)
  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:2802)
  at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:3121)
  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:2802)
  at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:3121)
  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:2802)
  at android.view.ViewGroup.dispatchTransformedTouchEvent(ViewGroup.java:3121)
  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:2802)
  at com.android.internal.policy.DecorView.superDispatchTouchEvent(DecorView.java:498)
  at com.android.internal.policy.PhoneWindow.superDispatchTouchEvent(PhoneWindow.java:1899)
  at android.app.Activity.dispatchTouchEvent(Activity.java:4262)
  at com.android.internal.policy.DecorView.dispatchTouchEvent(DecorView.java:456)
  at android.view.View.dispatchPointerEvent(View.java:15263)
  at android.view.ViewRootImpl$ViewPostImeInputStage.processPointerEvent(ViewRootImpl.java:6548)
  at android.view.ViewRootImpl$ViewPostImeInputStage.onProcess(ViewRootImpl.java:6348)
  at android.view.ViewRootImpl$InputStage.deliver(ViewRootImpl.java:5804)
  at android.view.ViewRootImpl$InputStage.onDeliverToNext(ViewRootImpl.java:5861)
  at android.view.ViewRootImpl$InputStage.forward(ViewRootImpl.java:5827)
  at android.view.ViewRootImpl$AsyncInputStage.forward(ViewRootImpl.java:5992)
  at android.view.ViewRootImpl$InputStage.apply(ViewRootImpl.java:5835)
  at android.view.ViewRootImpl$AsyncInputStage.apply(ViewRootImpl.java:6049)
  at android.view.ViewRootImpl$InputStage.deliver(ViewRootImpl.java:5808)
  at android.view.ViewRootImpl$InputStage.onDeliverToNext(ViewRootImpl.java:5861)
  at android.view.ViewRootImpl$InputStage.forward(ViewRootImpl.java:5827)
  at android.view.ViewRootImpl$InputStage.apply(ViewRootImpl.java:5835)
  at android.view.ViewRootImpl$InputStage.deliver(ViewRootImpl.java:5808)
  at android.view.ViewRootImpl.deliverInputEvent(ViewRootImpl.java:8857)
  at android.view.ViewRootImpl.doProcessInputEvents(ViewRootImpl.java:8808)
  at android.view.ViewRootImpl.enqueueInputEvent(ViewRootImpl.java:8777)
  at android.view.ViewRootImpl$WindowInputEventReceiver.onInputEvent(ViewRootImpl.java:8980)
  at android.view.InputEventReceiver.dispatchInputEvent(InputEventReceiver.java:267)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:335)
  at android.os.Looper.loopOnce(Looper.java:161)
  at android.os.Looper.loop(Looper.java:288)
  at android.app.ActivityThread.main(ActivityThread.java:7898)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:936)

  ...
/** 线程信息-结束/

----- end 20223 -----

...

具体解释如下:

所在行内容解释
1”main“线程名
1prio=5线程优先级
1tid=1线程标识符
1Native线程状态,native是native中的线程状态,对应java中runnable
2group="main"线程所属的线程组
2sCount=1程被正常挂起的次数
2dsCount=0线程因调试而挂起次数
2flags=1
2obj=0x75c4db98线程的Java对象的地址
2self=0x7b08e14c00线程本身的地址
3sysTid=2468Linux下的内核线程id
3nice=0线程的调度优先级
3cgrp=default调度属组
3sched=0/0线程的调度策略和优先级
3handle=0x7b8e933548线程的处理函数地址
4state=S调度状态
4schedstat=( 562066147 3061697256 542 )

执行时间记录:

三个值分别表示线程在cpu上执行的时间、线程的等待时间和线程执行的时间片长度

4utm=31用户态CPU占用次数,由于频率是100,则占用31次就等于31*10ms=310ms
4stm=24内核态CPU占用次数
4core=2最后执行这个线程的CPU核序号
4HZ=100运行频率,一秒100次
5stack=0x7fe841d000-0x7fe841f000虚拟机栈地址
5stackSize=8MB虚拟机栈大小
6at android.os.BinderProxy.transactNative(Native method) at android.os.BinderProxy.transact(Binder.java:1127)此时采集到的APP对应线程的对应状态

四.扩展问题

问题1:ANR发生后会立马弹出ANR框吗?

答:不会,2.3中讲到,触发了ANR并且采集完相关线程日志后,会延时5S在弹出ANR的框。

问题2:主线程卡顿一定会导致ANR吗?

答:不会。首先,卡顿一定要达到某个阈值才会触发。其次,一定要满足某个场景才可以。比如ActivityA跳转ActivityB之后,延时1S后主线程休眠20S,就不会ANR,因为输入事件已经得到响应,并不满足任何触发ANR的场景。

问题3:为什么ANR的日志traces.txt中,有时候会打印nativePollOnce的堆栈?

答:理论上,系统侧在发现达到超市事件后,会立马出发ANR检查。如果检查发生真的超时,会走ANR流程,捕获APP的堆栈状态并保存。但是实际执行中,创建文件,dump指定进程堆栈信息都需要时间,并且ANR流程的开始还会sleep200毫秒。

我们看3.1中日志,从15:03:32.023开始识别到ANR,到15:03:32.568结束dump完成APP侧信息。

2022-10-08 15:03:32.023 24194-24330/? W/InputDispatcher: Window ... is not responding. Waited 5007ms for MotionEvent    //InputDispatcher::onAnrLocked中,识别到输入事件响应超时
2022-10-08 15:03:32.568 31655-31660/com.xt.client I/com.xt.client: Wrote stack traces to tombstoned                     //APP进程写入完成,通知系统侧

整个流程大约要500多毫秒,其中包括:

1.InputDispatcher中识别到ANR,传递到AMS准备开始采集要100多毫秒

2.识别到ANR的时候,系统进程会休眠200毫秒采集CPU信息。

3.系统进程采集一些必要信息,会耗费一些时间,大约100毫秒左右。

4.客户端收到信号去采集ANR的时候,也需要时间,大约要100毫秒左右。

所以,漂移的是有上限时间的,正常漂移上限一般为500ms左右,即使极端的场景,应该也不会很长。

问题4:ANR日志显示nativePollOnce时如何排查?

答:根据上面那道题的答案可知,漂移时间有上限。所以我们记录每一个主线程任务的名称和其执行时间,形成一张时序图,如下图所示(借用字节的文章中的图)。

51时发生ANR,从这个点开始,往前找,一定会有一个耗时很长或者一系列耗时不长但很密集的任务。当然,前提是要有这样的记录工具。

问题5:上面提到的记录工具如何实现?

答:具体实现肯定要考虑方方面面,而具体的实现方式肯定也有不同,比如matrix利用的就是字节码插桩统计每一个方法的执行时间。这里讲一个最基本也是最简单的实现方式:

1.注册主线程Looper的回调,Looper.setMessageLogging()

2.回调方法中,两次回调一定是一次开始,一次结束,成对出现。两次回调之间的时间差,就是主线程任务的执行时间。

3.开启一个线程,定时观察主线程的任务是否正常结束,如果超过指定阈值时间,则可以主动每隔一段时间去捕获主线程堆栈,避免埋点漂移。(当然这种方式对于CPU过载导致的没有作用,但是CPU过载可以通过系统日志辅助排查,大多数场景其实都不属于CPU过载)。

4.最后任务结束或者超出设置好上限时,对捕获的主线程堆栈进行分析。

如果多次堆栈方法完全一致,则说明有可能就是那个堆栈中的最后一级方法耗时导致的,比如加锁导致的ANR。

即使有可能是不同的堆栈,但有了这些对栈,对我们排查整个卡顿流程也是很有帮助的,比如复杂布局渲染导致的ANR。

问题6:setOnClickListener中休眠20秒会不会引发ANR?

具体测试代码如下:

        findViewById(R.id.text2).setOnClickListener(v -> {
            try {
                Thread.sleep(20_000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        });

如果你实际验证的话,你会发现只点击一次的话,不会发现ANR,如果连续点击两次,则会发生ANR。是不是与正常的认知不符?其实原理也很简单,View的onTouchEvent方法中,对click时间的调用方法是如下代码所示的:

if (!post(mPerformClick)) {
    performClickInternal();
}

mPerformClick = new PerformClick();

 private final class PerformClick implements Runnable {
        @Override
        public void run() {
            recordGestureClassification(TOUCH_GESTURE_CLASSIFIED__CLASSIFICATION__SINGLE_TAP);
            performClickInternal();
        }
    }

也就是说,最后的点击处理,是进程了一次线程切换的,虽然都是执行在主线程,但是由于进行了线程切换,所以原来的分发流程会正常执行完成并发出流程结束的信号给系统侧。而sleep事件是后续在主线程处理点击事件时阻塞的,所以并不会阻塞分发流程,因为错不会产生ANR。

当然,再点击一次,由于主线程阻塞无法进行事件分发,则会造成ANR。

五.声明和参考

写在最后

本文重注的是整个ANR流程的讲解,所以对ANR排查这块讲的很粗略,因为这一块,字节和VIVO已经有很好的文章进行了描述,如下:

今日头条 ANR 优化实践系列分享 - 实例剖析集锦

干货:ANR日志分析全面解析_vivo互联网技术的博客-CSDN博客_anr日志分析

本文参考文章链接:

android源码-事件分发处理机制(下)-从信号源输入到处理完成的完整源码解读

Android卡顿掉帧问题分析之原理篇(努比亚技术团队)

今日头条 ANR 优化实践系列 - 设计原理及影响因素

Logo

为开发者提供学习成长、分享交流、生态实践、资源工具等服务,帮助开发者快速成长。

更多推荐