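In earlier articles I have already written about how to create a Watcher; see the “通知及警报” (notifications and alerts) section of the article “Elastic:菜鸟上手指南”. One of those articles, “Elastic:创建你的第一个Elastic watcher”, may involve configuring two machines, which is inconvenient for experimenting. In today's article, I will use a single machine for the demonstration.

I must point out here that both Watchers and Alerts can raise alerts, but Watchers run in Elasticsearch, whereas Alerts run in Kibana.

Before doing the exercises below, you must enable a trial license.

For how to subscribe to a license, see Elastic's official website https://www.elastic.co/cn/subscriptions and License management.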

Watcher API

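In today's exercise, we will use the Watcher API for monitoring. Please refer to Elastic's official documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/watcher-api.html. Put simply, to configure a Watcher we must configure the following parts: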

PUT _watcher/watch/log_error_watch
{
  "trigger": {},
  "input": {},
  "condition": {},
  "transform": {},
  "actions": {}
}

As the structure above shows, a watcher consists of 5 parts, although transform can sometimes be omitted.
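To make the "five parts, transform optional" idea concrete, here is a minimal Python sketch that assembles a watch body as a plain dictionary and leaves transform out when it is not supplied. The helper name build_watch and the sample simple input are my own illustrations, not part of the Watcher API; the resulting JSON is what you would send as the body of the PUT request.

```python
import json

def build_watch(trigger, input_, condition, actions, transform=None):
    """Assemble a watch body; the optional transform is omitted when not given."""
    watch = {"trigger": trigger, "input": input_, "condition": condition}
    if transform is not None:
        watch["transform"] = transform
    watch["actions"] = actions
    return watch

# Hypothetical minimal watch: runs every minute, static input, always-true condition.
watch = build_watch(
    trigger={"schedule": {"interval": "1m"}},
    input_={"simple": {"status": "ok"}},
    condition={"always": {}},
    actions={},
)
print(json.dumps(watch, indent=2))
```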

In the sections below, we describe each of them in turn:

trigger

This defines how often the watcher runs. For example, we can define it as follows:

  "trigger": {
    "schedule" : {
      "interval": "1m"
    }
  },

Above, we define the watcher to run once every minute.
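As a rough illustration of how an interval string such as "1m" maps to a run frequency, the Python sketch below converts it to seconds. It assumes the unit suffixes s, m, h, d and w, with a bare number meaning seconds; please check the schedule trigger documentation for your version. The function name interval_to_seconds is hypothetical.

```python
import re

# Assumed unit suffixes for interval schedules: seconds, minutes, hours, days, weeks.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def interval_to_seconds(interval: str) -> int:
    """Convert an interval such as '1m' or '2h' to seconds (no suffix means seconds)."""
    match = re.fullmatch(r"(\d+)([smhdw]?)", interval.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit or "s"]

print(interval_to_seconds("1m"))
```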

input

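The watch input fetches the data you want to evaluate. To search log data periodically and load the results into the watch, you can use an interval schedule and a search input. For example, the following watch runs every minute and searches the logs index for errors from the last 5 minutes: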

PUT _watcher/watch/log_error_watch
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": [
          "logs"
        ],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "match": {
                    "message": "error"
                  }
                }
              ],
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "from": "{{ctx.trigger.scheduled_time}}||-5m",
                      "to": "{{ctx.trigger.triggered_time}}"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}

Above, we query the logs index for documents from the last 5 minutes whose message field matches error.

The query above is similar to the following DSL query:

GET logs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "message": "error"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "from": "now-5m",
              "to": "now"
            }
          }
        }
      ]
    }
  }
}
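To see which time window the range filter actually covers, the following Python sketch resolves {{ctx.trigger.scheduled_time}}||-5m and {{ctx.trigger.triggered_time}} by hand. It is only an illustration of the date math, not how Elasticsearch implements it; the function name search_window is hypothetical, and the timestamps are taken from a watch record shown later in this article.

```python
from datetime import datetime, timedelta

_FMT = "%Y-%m-%dT%H:%M:%S.%f%z"  # e.g. 2021-07-05T09:40:43.237Z

def parse(ts: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp with fractional seconds and a trailing Z."""
    return datetime.strptime(ts, _FMT)

def search_window(scheduled_time: str, triggered_time: str, minutes: int = 5):
    """Return (start, end): scheduled time minus N minutes, up to the triggered time."""
    start = parse(scheduled_time) - timedelta(minutes=minutes)
    end = parse(triggered_time)
    return start, end

start, end = search_window("2021-07-05T09:40:43.237Z", "2021-07-05T09:40:43.485Z")
event_time = parse("2021-07-05T09:38:40.860745Z")
print(start.isoformat(), end.isoformat(), start <= event_time <= end)
```

An event indexed a couple of minutes before the trigger time falls inside the window, so the search input picks it up.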

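If you check the watch history, you will see that the watch is triggered once every minute. However, the search does not return any results, so nothing is loaded into the watch payload.

For example, the following request retrieves the last ten watch executions (watch records) from the watch history: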

GET .watcher-history*/_search?pretty
{
  "sort" : [
    { "result.execution_time" : "desc" }
  ]
}

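In my environment, it returned the result below. This particular record is a failed execution: the search body stored in it has no top-level query key around bool, which is what the parsing_exception reports.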

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : ".watcher-history-13-2021.07.05",
        "_type" : "_doc",
        "_id" : "log_error_watch_bf5abb48-a012-43b9-9eae-b88796dbd334-2021-07-05T09:04:44.837331Z",
        "_score" : null,
        "_source" : {
          "watch_id" : "log_error_watch",
          "node" : "m9DhvMAuQy2F3c0XaoO2_A",
          "state" : "failed",
          "status" : {
            "state" : {
              "active" : true,
              "timestamp" : "2021-07-05T09:03:06.180Z"
            },
            "actions" : { },
            "execution_state" : "failed",
            "version" : -1
          },
          "trigger_event" : {
            "type" : "schedule",
            "triggered_time" : "2021-07-05T09:04:44.832Z",
            "schedule" : {
              "scheduled_time" : "2021-07-05T09:04:44.813Z"
            }
          },
          "input" : {
            "search" : {
              "request" : {
                "search_type" : "query_then_fetch",
                "indices" : [
                  "logs"
                ],
                "rest_total_hits_as_int" : true,
                "body" : {
                  "bool" : {
                    "must" : {
                      "match" : {
                        "message" : "error"
                      }
                    },
                    "filter" : {
                      "range" : {
                        "@timestamp" : {
                          "from" : "{{ctx.trigger.scheduled_time}}||-5m",
                          "to" : "{{ctx.trigger.triggered_time}}"
                        }
                      }
                    }
                  }
                }
              }
            }
          },
          "condition" : {
            "always" : { }
          },
          "result" : {
            "execution_time" : "2021-07-05T09:04:44.837Z",
            "execution_duration" : 30,
            "input" : {
              "type" : "search",
              "status" : "failure",
              "error" : {
                "root_cause" : [
                  {
                    "type" : "parsing_exception",
                    "reason" : "Unknown key for a START_OBJECT in [bool].",
                    "line" : 1,
                    "col" : 9
                  }
                ],
                "type" : "parsing_exception",
                "reason" : "Unknown key for a START_OBJECT in [bool].",
                "line" : 1,
                "col" : 9
              },
              "search" : {
                "request" : {
                  "search_type" : "query_then_fetch",
                  "indices" : [
                    "logs"
                  ],
                  "rest_total_hits_as_int" : true,
                  "body" : {
                    "bool" : {
                      "must" : {
                        "match" : {
                          "message" : "error"
                        }
                      },
                      "filter" : {
                        "range" : {
                          "@timestamp" : {
                            "from" : "2021-07-05T09:04:44.813Z||-5m",
                            "to" : "2021-07-05T09:04:44.832Z"
                          }
                        }
                      }
                    }
                  }
                }
              }
            },
            "actions" : [ ]
          },
          "messages" : [
            "failed to execute watch input"
          ]
        },
        "sort" : [
          1625475884837
        ]
      }
    ]
  }
}

condition

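The condition evaluates the data you loaded into the watch and determines whether any action is required. Now that you have loaded the log errors into the watch, you can define a condition that checks whether any errors were found.

For example, the following compare condition checks whether the search input returned more than 3 hits.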

  "condition" : {
    "compare" : { "ctx.payload.hits.total" : { "gt" : 3 }} 
  }

Above, it says that an alert is triggered if more than 3 error messages are found in the last 5 minutes. Our complete request looks like this:

PUT _watcher/watch/log_error_watch
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": [
          "logs"
        ],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "match": {
                    "message": "error"
                  }
                }
              ],
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "from": "{{ctx.trigger.scheduled_time}}||-5m",
                      "to": "{{ctx.trigger.triggered_time}}"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 3
      }
    }
  }
}
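To illustrate what the compare condition does with the payload, here is a small Python sketch that evaluates a condition of the same shape against a dictionary. It only mimics the behavior locally and is not Watcher's implementation; the helper names resolve and compare_met are hypothetical, and the operator set (eq, not_eq, gt, gte, lt, lte) is what I understand the compare condition to support.

```python
import operator

# Operators assumed to be supported by the compare condition.
_OPERATORS = {
    "eq": operator.eq, "not_eq": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}

def resolve(ctx: dict, path: str):
    """Follow a dotted path such as 'ctx.payload.hits.total' through nested dicts."""
    keys = path.split(".")
    if keys[0] == "ctx":
        keys = keys[1:]
    value = ctx
    for key in keys:
        value = value[key]
    return value

def compare_met(ctx: dict, condition: dict) -> bool:
    """Return True when every comparison in the 'compare' block holds."""
    return all(
        _OPERATORS[op](resolve(ctx, path), expected)
        for path, checks in condition["compare"].items()
        for op, expected in checks.items()
    )

condition = {"compare": {"ctx.payload.hits.total": {"gt": 3}}}
print(compare_met({"payload": {"hits": {"total": 8}}}, condition))
```

With exactly 3 hits the condition is not met, which is why more than 3 error events are needed before the watch fires.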

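You need to add more than 3 events containing errors to the logs index. For example, the following requests add a 404 error to the logs index. We first create the following pipeline: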

PUT _ingest/pipeline/add-timestamp
{
  "processors": [
    {
      "set": {
        "field": "@timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}

Then run the following command 4 times:

POST logs/event?pipeline=add-timestamp
{
  "request": "GET index.html",
  "status_code": 404,
  "message": "Error: File not found"
}

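After these events are added, the condition will evaluate to true the next time the watch executes. Each time the watch executes, the condition result is recorded in the watch_record, so you can verify whether the condition was met by querying the watch history: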

GET .watcher-history*/_search?pretty
{
  "query" : {
    "bool" : {
      "must" : [
        { "match" : { "result.condition.met" : true }},
        { "range" : { "result.execution_time" : { "from" : "now-5m" }}}
      ]
    }
  }
}

After 1 minute, we check the result with the command above:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.287682,
    "hits" : [
      {
        "_index" : ".watcher-history-13-2021.07.05",
        "_type" : "_doc",
        "_id" : "log_error_watch_66abece2-8311-419b-a702-3d1b90d00d8e-2021-07-05T09:40:43.485371Z",
        "_score" : 1.287682,
        "_source" : {
          "watch_id" : "log_error_watch",
          "node" : "m9DhvMAuQy2F3c0XaoO2_A",
          "state" : "executed",
          "status" : {
            "state" : {
              "active" : true,
              "timestamp" : "2021-07-05T09:39:43.235Z"
            },
            "last_checked" : "2021-07-05T09:40:43.485Z",
            "last_met_condition" : "2021-07-05T09:40:43.485Z",
            "actions" : { },
            "execution_state" : "executed",
            "version" : -1
          },
          "trigger_event" : {
            "type" : "schedule",
            "triggered_time" : "2021-07-05T09:40:43.485Z",
            "schedule" : {
              "scheduled_time" : "2021-07-05T09:40:43.237Z"
            }
          },
          "input" : {
            "search" : {
              "request" : {
                "search_type" : "query_then_fetch",
                "indices" : [
                  "logs"
                ],
                "rest_total_hits_as_int" : true,
                "body" : {
                  "query" : {
                    "bool" : {
                      "must" : [
                        {
                          "match" : {
                            "message" : "error"
                          }
                        }
                      ],
                      "filter" : [
                        {
                          "range" : {
                            "@timestamp" : {
                              "from" : "{{ctx.trigger.scheduled_time}}||-5m",
                              "to" : "{{ctx.trigger.triggered_time}}"
                            }
                          }
                        }
                      ]
                    }
                  }
                }
              }
            }
          },
          "condition" : {
            "always" : { }
          },
          "result" : {
            "execution_time" : "2021-07-05T09:40:43.485Z",
            "execution_duration" : 419,
            "input" : {
              "type" : "search",
              "status" : "success",
              "payload" : {
                "_shards" : {
                  "total" : 1,
                  "failed" : 0,
                  "successful" : 1,
                  "skipped" : 0
                },
                "hits" : {
                  "hits" : [
                    {
                      "_index" : "logs",
                      "_type" : "event",
                      "_source" : {
                        "request" : "GET index.html",
                        "status_code" : 404,
                        "@timestamp" : "2021-07-05T09:38:40.860745Z",
                        "message" : "Error: File not found"
                      },
                      "_id" : "xVAIdnoBY7N_y24HuOdc",
                      "_score" : 0.018018503
                    },
                    {
                      "_index" : "logs",
                      "_type" : "event",
                      "_source" : {
                        "request" : "GET index.html",
                        "status_code" : 404,
                        "@timestamp" : "2021-07-05T09:38:41.052902Z",
                        "message" : "Error: File not found"
                      },
                      "_id" : "xlAIdnoBY7N_y24Huecd",
                      "_score" : 0.018018503
                    },
                    {
                      "_index" : "logs",
                      "_type" : "event",
                      "_source" : {
                        "request" : "GET index.html",
                        "status_code" : 404,
                        "@timestamp" : "2021-07-05T09:38:41.302771Z",
                        "message" : "Error: File not found"
                      },
                      "_id" : "x1AIdnoBY7N_y24HuucW",
                      "_score" : 0.018018503
                    },
                    {
                      "_index" : "logs",
                      "_type" : "event",
                      "_source" : {
                        "request" : "GET index.html",
                        "status_code" : 404,
                        "@timestamp" : "2021-07-05T09:38:41.636798Z",
                        "message" : "Error: File not found"
                      },
                      "_id" : "yFAIdnoBY7N_y24Hu-dk",
                      "_score" : 0.018018503
                    },
                    {
                      "_index" : "logs",
                      "_type" : "event",
                      "_source" : {
                        "request" : "GET index.html",
                        "status_code" : 404,
                        "@timestamp" : "2021-07-05T09:39:50.793076Z",
                        "message" : "Error: File not found"
                      },
                      "_id" : "yVAJdnoBY7N_y24HyeeJ",
                      "_score" : 0.018018503
                    },
                    {
                      "_index" : "logs",
                      "_type" : "event",
                      "_source" : {
                        "request" : "GET index.html",
                        "status_code" : 404,
                        "@timestamp" : "2021-07-05T09:39:51.718527Z",
                        "message" : "Error: File not found"
                      },
                      "_id" : "ylAJdnoBY7N_y24Hzecm",
                      "_score" : 0.018018503
                    },
                    {
                      "_index" : "logs",
                      "_type" : "event",
                      "_source" : {
                        "request" : "GET index.html",
                        "status_code" : 404,
                        "@timestamp" : "2021-07-05T09:39:52.277911Z",
                        "message" : "Error: File not found"
                      },
                      "_id" : "y1AJdnoBY7N_y24Hz-dW",
                      "_score" : 0.018018503
                    },
                    {
                      "_index" : "logs",
                      "_type" : "event",
                      "_source" : {
                        "request" : "GET index.html",
                        "status_code" : 404,
                        "@timestamp" : "2021-07-05T09:39:52.863406Z",
                        "message" : "Error: File not found"
                      },
                      "_id" : "zFAJdnoBY7N_y24H0eef",
                      "_score" : 0.018018503
                    }
                  ],
                  "total" : 8,
                  "max_score" : 0.018018503
                },
                "took" : 416,
                "timed_out" : false
              },
              "search" : {
                "request" : {
                  "search_type" : "query_then_fetch",
                  "indices" : [
                    "logs"
                  ],
                  "rest_total_hits_as_int" : true,
                  "body" : {
                    "query" : {
                      "bool" : {
                        "must" : [
                          {
                            "match" : {
                              "message" : "error"
                            }
                          }
                        ],
                        "filter" : [
                          {
                            "range" : {
                              "@timestamp" : {
                                "from" : "2021-07-05T09:40:43.237Z||-5m",
                                "to" : "2021-07-05T09:40:43.485Z"
                              }
                            }
                          }
                        ]
                      }
                    }
                  }
                }
              }
            },
            "condition" : {
              "type" : "always",
              "status" : "success",
              "met" : true
            },
            "actions" : [ ]
          },
          "messages" : [ ]
        }
      }
    ]
  }
}

action

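Recording watch records in the watch history is nice, but the real power of Watcher is being able to do something when the watch condition is met. A watch's actions define what to do when the watch condition evaluates to true. You can send an email, call a third-party webhook, write a document to an Elasticsearch index, or log a message to the standard Elasticsearch log file.

For example, the following action writes a message to the Elasticsearch log when errors are detected.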

PUT _watcher/watch/log_error_watch
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": [
          "logs"
        ],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "match": {
                    "message": "error"
                  }
                }
              ],
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "from": "{{ctx.trigger.scheduled_time}}||-5m",
                      "to": "{{ctx.trigger.triggered_time}}"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 3
      }
    }
  },
  "actions": {
    "log_error": {
      "logging": {
        "text": "Found {{ctx.payload.hits.total}} errors in the logs"
      }
    }
  }
}

The action defined above prints the message rendered from the text field into the Elasticsearch server log.
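To show how the {{ctx.payload.hits.total}} placeholder in text turns into a concrete message, here is a toy Python stand-in for the template rendering. Watcher uses Mustache templates; this sketch handles only simple dotted placeholders, and the function name render is hypothetical.

```python
import re

def render(template: str, ctx: dict) -> str:
    """Replace {{ctx.a.b}} placeholders with values looked up in nested dicts."""
    def lookup(match):
        keys = match.group(1).strip().split(".")
        if keys[0] == "ctx":
            keys = keys[1:]
        value = ctx
        for key in keys:
            value = value[key]
        return str(value)
    return re.sub(r"\{\{(.*?)\}\}", lookup, template)

message = render(
    "Found {{ctx.payload.hits.total}} errors in the logs",
    {"payload": {"hits": {"total": 8}}},
)
print(message)  # Found 8 errors in the logs
```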

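Testing the Watcher

Above, we check once every minute, but in real-world practice the interval might be 1 hour or even 1 day. We cannot wait that long to test the watcher. To make testing easier, Elastic provides the _execute endpoint: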

PUT _watcher/watch/log_error_watch/_execute

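With the endpoint above, we do not need to wait for the time set in the schedule before the watcher runs. The command above runs the watcher directly.

We run the following command 4 times in a row: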

POST logs/event?pipeline=add-timestamp
{
  "request": "GET index.html",
  "status_code": 404,
  "message": "Error: File not found"
}

Then run:

PUT _watcher/watch/log_error_watch/_execute

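This way, we do not have to wait for the 1-minute interval to see the result. The message from the logging action appears in the Elasticsearch server log right away.

At this point, we have created and tested our Watcher end to end.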

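Viewing the Watcher in Kibana

The watcher we just created with commands can also be viewed in the Kibana UI. In fact, we can even use the Kibana UI to create the watcher we want.

There, we can see the complete watcher definition, and we can even modify it in that UI.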

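Deleting the Watcher

Because log_error_watch is configured to run every minute, make sure you delete it after finishing the experiment. Otherwise, the noise generated by this sample watcher will make it hard to see what is happening in your watch history and log files.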

DELETE _watcher/watch/log_error_watch

Of course, you can also delete it in the Kibana UI.
