Monitoring Linux Virtual Machines with Prometheus

Jeremy_Lee123 · Published 2021-02-05 09:39:05

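1. Overview

Official site: https://prometheus.io/

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open-source project, maintained independently of any company. To emphasize this and to clarify the project's governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as its second hosted project, after Kubernetes.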

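1.1 Advantages of Prometheus

Very few external dependencies, so installation and use are simple
Integrations already exist for many systems, for example Docker, HAProxy, Nginx, and JMX
Automatic service discovery
Can be instrumented directly in application code
Designed around distributed and microservice architectures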

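1.2 Features of Prometheus

A multi-dimensional data model with time series data identified by metric name and key/value pairs
PromQL, a flexible query language that leverages this dimensionality
No reliance on distributed storage; single server nodes are autonomous
Time series collection happens via a pull model over HTTP
Pushing time series is supported via an intermediary gateway
Targets are discovered via service discovery or static configuration
Multiple modes of graphing and dashboarding support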

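1.3 Prometheus Architecture

The following diagram illustrates the architecture of Prometheus and some of its ecosystem components:

[Figure: Prometheus architecture diagram]

The Prometheus ecosystem consists of multiple components, many of which are optional: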

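The Prometheus server, which scrapes and stores time series data
Client libraries for instrumenting application code
A push gateway for supporting short-lived jobs
Special-purpose exporters for services such as HAProxy, StatsD, and Graphite
An alertmanager to handle alerts
Various support tools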

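Prometheus scrapes metrics from instrumented jobs, either directly or, for short-lived jobs, via an intermediary push gateway. It stores all scraped samples locally and runs rules over this data, either to aggregate and record new time series from existing data or to generate alerts. Grafana or other API consumers can be used to visualize the collected data.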

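1.4 Prometheus Data Model

Fundamentally, Prometheus stores all data as time series. A metric name together with a set of one or more labels identifies one time series; the same metric name with different labels is a different time series. To support some queries, temporary derived time series are sometimes generated as well.

Every time series is uniquely identified by its metric name and a set of labels (key=value pairs).

Metric name: the name given to the thing being measured, for example http_requests_total. Names follow some rules: they may contain letters, digits, and underscores, and they are usually structured as application prefix, measured object, value type, and unit, joined by underscores. Examples: push_total, userlogin_mysql_duration_seconds, app_memory_usage_bytes.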
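The naming rule can be checked mechanically. The sketch below is a minimal illustration that assumes the classic metric-name pattern from the Prometheus data model documentation, `[a-zA-Z_:][a-zA-Z0-9_:]*`; the helper name `is_valid_metric_name` is hypothetical, and newer Prometheus versions may accept a wider character set.

```python
import re

# Classic metric-name pattern (assumption: the pre-UTF-8 naming rule).
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if the whole name matches the classic pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


# The first three names come from the text; the last two are made-up bad names.
for candidate in [
    "http_requests_total",
    "userlogin_mysql_duration_seconds",
    "app_memory_usage_bytes",
    "2xx_responses",  # starts with a digit
    "cpu-usage",      # contains a hyphen
]:
    print(candidate, is_valid_metric_name(candidate))
```

Colons are allowed by the pattern, but by convention they are left for recording rules rather than used in instrumented metric names.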

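Labels: labels identify the different dimensions of a time series, for example whether an HTTP request used POST or GET and which endpoint it hit. The resulting identifier looks like this: http_requests_total{method="POST",endpoint="/api/tracks"}.

Remember that for the metric name http_requests_total, adding or removing a label always creates a new time series. Queries can then select and aggregate results by combinations of these labels. In terms of a traditional database, you can think of http_requests_total as the table name, the labels as columns, the timestamp as the primary key, and one additional float64 column holding the value (Prometheus stores every sample value as a float64).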
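To make the identity rule concrete, here is a toy in-memory model in Python. It is only a sketch of the idea, not how Prometheus stores data internally; `TinyStore` and `series_key` are hypothetical names, and the timestamps and values are made up.

```python
from typing import Dict, FrozenSet, List, Tuple

# A series is identified by its metric name plus its full label set.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Build an order-independent identity for a time series."""
    return (name, frozenset(labels.items()))


class TinyStore:
    """Toy store: one list of (timestamp, float value) samples per series."""

    def __init__(self) -> None:
        self.series: Dict[SeriesKey, List[Tuple[int, float]]] = {}

    def append(self, name: str, labels: Dict[str, str], timestamp: int, value: float) -> None:
        key = series_key(name, labels)
        self.series.setdefault(key, []).append((timestamp, float(value)))


store = TinyStore()
# Same name and labels (label order does not matter): one series, two samples.
store.append("http_requests_total", {"method": "POST", "endpoint": "/api/tracks"}, 1000, 3)
store.append("http_requests_total", {"endpoint": "/api/tracks", "method": "POST"}, 1015, 5)
# A different label value: a second series.
store.append("http_requests_total", {"method": "GET", "endpoint": "/api/tracks"}, 1015, 7)
# Removing a label: a third series.
store.append("http_requests_total", {"method": "POST"}, 1015, 1)

print(len(store.series), "distinct series")
```

The table analogy maps onto this directly: the dictionary key plays the role of table name plus label columns, and each sample carries the timestamp and the float value.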

Reference (official site): Data model | Prometheus

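1.5 The Four Prometheus Metric Types

Counter

A Counter is a cumulative value, used for example to record the number of requests, completed tasks, or errors. It only ever increases and never decreases; it resets to zero when the process restarts.

Example: a scrape returns http_response_total{method="GET",endpoint="/api/tracks"} 100, and a scrape 10 seconds later returns http_response_total{method="GET",endpoint="/api/tracks"} 100. An unchanged value means nothing new was counted in between; a later scrape can return a higher value but, short of a restart, never a lower one.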
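Because a counter only grows and restarts from zero, the useful quantity is usually its increase between scrapes. The Python sketch below shows one simple way to compute that while treating any drop as a reset. It is an illustration only: PromQL's rate() and increase() also handle resets but additionally extrapolate over the query window, which this sketch does not do. The function name and sample values are hypothetical.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Sum the increase over consecutive scraped counter values.

    A value lower than its predecessor is treated as a counter reset,
    meaning the process restarted and began counting from zero again.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # Reset: everything counted since the restart is new.
            total += current
    return total


# Two scrapes with the same value, as in the example: no increase.
print(counter_increase([100, 100]))
# Made-up series with a restart between 130 and 10.
print(counter_increase([100, 130, 10, 25]))
```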

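Gauge

A Gauge is an ordinary numeric value, such as a temperature or memory usage. It can go up and down, and it is reset when the process restarts.

Example: memory_usage_bytes{host="master-01"} takes the values 100 (scraped), 30, 50, and 80 (scraped) in turn. Only the values present at scrape time are recorded; the intermediate values 30 and 50 are never seen by Prometheus.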
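The point of the gauge example is that a scrape only sees whatever value the gauge holds at that instant. The following Python sketch replays the same four values with assumed update times (0, 3, 6, and 9 seconds) and assumed scrape times (0 and 10 seconds); the times and the helper name `gauge_at` are illustrative and do not come from the original example.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (time in seconds, gauge value); times are assumptions for illustration.
updates: List[Tuple[int, float]] = [(0, 100), (3, 30), (6, 50), (9, 80)]


def gauge_at(history: List[Tuple[int, float]], t: int) -> Optional[float]:
    """Return the gauge value visible at time t, or None before the first update."""
    times = [time for time, _ in history]
    index = bisect_right(times, t)
    if index == 0:
        return None
    return history[index - 1][1]


scrape_times = [0, 10]
scraped = [gauge_at(updates, t) for t in scrape_times]
print(scraped)  # the values 30 and 50 never show up
```

A shorter scrape interval narrows this blind spot but never removes it, which is why short spikes in a gauge can go unrecorded.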

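Histogram

A Histogram can be thought of as a bar chart. It is commonly used to track the size of events, such as request durations or response sizes. What makes it special is that it groups observations into buckets and also provides the count and the sum of all observed values.

Example: {less than 10: 5 observations, 10 to 20: 1 observation, 20 to 30: 2 observations}, count = 8, sum = the sum of those 8 observed values. Prometheus itself exposes buckets cumulatively, so the same data would appear as le="10": 5, le="20": 6, le="30": 8.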
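As a rough illustration of how buckets, count, and sum relate, the Python sketch below builds a histogram from raw observations. The observations are invented so that they fall 5, 1, and 2 times into the three ranges of the example; `build_histogram` is a hypothetical helper, and the cumulative `le` (less than or equal) buckets with a final +Inf bucket follow the way Prometheus exposes histograms.

```python
import math
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


def build_histogram(observations: Sequence[float], upper_bounds: Sequence[float]) -> Dict[str, object]:
    """Return cumulative buckets plus count and sum for the observations."""
    bounds: List[float] = sorted(upper_bounds) + [math.inf]
    per_bucket = [0] * len(bounds)
    for value in observations:
        # Index of the first upper bound that is >= value ("le" semantics).
        per_bucket[bisect_left(bounds, value)] += 1

    cumulative: List[Tuple[float, int]] = []
    running = 0
    for bound, hits in zip(bounds, per_bucket):
        running += hits
        cumulative.append((bound, running))

    return {
        "buckets": cumulative,
        "count": len(observations),
        "sum": float(sum(observations)),
    }


# Made-up observations: five below 10, one between 10 and 20, two between 20 and 30.
histogram = build_histogram([2, 4, 6, 8, 9, 15, 22, 28], upper_bounds=[10, 20, 30])
print(histogram)
```

Cumulative buckets make it cheap to answer "how many observations were at most X", and the per-range counts can always be recovered by subtracting neighbouring buckets.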

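Summary

A Summary is very similar to a Histogram and is likewise used to track the size of events such as request durations or response sizes. It also provides the count and the sum of all observed values.

Example: count = 7, sum = the sum of those 7 observed values.

In addition, it provides quantiles, which divide the tracked observations by percentage. For example, the 0.95 quantile is the value below which 95% of the sampled observations fall.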
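To show what a quantile means in practice, here is a small Python sketch using the nearest-rank method on a fully stored sample. This is a deliberately simple stand-in: client libraries typically compute summary quantiles with streaming estimators over a sliding time window rather than by sorting all samples. The function name and the sample data are hypothetical.

```python
import math
from typing import Sequence


def nearest_rank_quantile(samples: Sequence[float], q: float) -> float:
    """Return the smallest sample such that at least a fraction q of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < q <= 1:
        raise ValueError("q must be in the interval (0, 1]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Made-up request durations in milliseconds: 1, 2, ..., 100.
durations = list(range(1, 101))
p95 = nearest_rank_quantile(durations, 0.95)
print(p95)
```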

Reference (official site): Metric types | Prometheus

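2. Installing Prometheus and Its Components

2.1 Preparation

The monitoring server needs four services:

Prometheus Server (the main monitoring server) 10.21.70.101
Node Exporter (collects host hardware and operating system information) 10.1.205.154
cAdvisor (collects information about containers running on the host) 10.21.70.101
Grafana (displays the Prometheus monitoring dashboards) 10.21.70.101

Each monitored host needs only two:

Node Exporter (collects host hardware and operating system information) 10.21.70.101
cAdvisor (collects information about containers running on the host) 10.21.70.101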

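Linux setup:

Synchronize the system clocks
Disable the firewall and SELinux (or allow the ports used below: 9100, 9090, 3000)
For a container-based installation, install Docker; installing from an extracted release archive is also possible (both approaches are demonstrated)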

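2.2 Installing Node Exporter (VM monitoring)

Run the following commands on every monitored node to install the Node Exporter container.

Pull the image: docker pull prom/node-exporter:latest

Create the start script: vi node-export-start.sh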

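#!/bin/bash
# Start Node Exporter with the host's /proc, /sys and / mounted into the container.
# Newer node_exporter releases rename the last flag to
# --collector.filesystem.mount-points-exclude; check --help for the image you pulled.
docker run -d -p 9100:9100 \
  -v "/proc:/host/proc" \
  -v "/sys:/host/sys" \
  -v "/:/rootfs" \
  -v "/etc/localtime:/etc/localtime" \
  prom/node-exporter \
  --path.procfs /host/proc \
  --path.sysfs /host/sys \
  --collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"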

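Make the script executable and start Node Exporter: chmod +x node-export-start.sh && ./node-export-start.sh

Verify: open http://10.1.205.154:9100/metrics. If a list of metrics is returned, the VM is reporting data successfully.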
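The /metrics page is plain text in the Prometheus exposition format, so it can also be checked from a script. The Python sketch below parses a small sample of that format. It is a simplified illustration: the sample values are made up, the helper names are hypothetical, escape sequences in label values are not decoded, and the `fetch_metrics` helper is shown but not called because it needs network access to the exporter.

```python
import re
from typing import Dict, Iterator, Tuple
from urllib.request import urlopen

SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simple parser understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the raw metrics text, e.g. from http://10.1.205.154:9100/metrics."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Illustrative excerpt; real output is much longer and the values will differ.
SAMPLE_TEXT = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.21
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
"""

samples = list(parse_metrics(SAMPLE_TEXT))
for name, labels, value in samples:
    print(name, labels, value)
```

Against a live exporter, `parse_metrics(fetch_metrics("http://10.1.205.154:9100/metrics"))` would yield the same kind of tuples.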

A query interface for individual metrics is also provided.

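2.3 Installing the Prometheus Server (metrics aggregation)

Run the following commands on the main server node to install the Prometheus container.

Pull the image: docker pull prom/prometheus:latest

Create the start script: vi prometheus-start.sh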

docker run -d -p 9090:9090 \
-v /home/docker/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
-v "/etc/localtime:/etc/localtime" \
--name prometheus \
prom/prometheus

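Create the configuration file at the host path used in the -v mount (/home/docker/prometheus/prometheus.yml): vi prometheus.yml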

[root@slave1 prometheus]# pwd
/home/docker/prometheus

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090','10.1.205.154:9100']
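
The comments in this configuration mention two defaults, scheme http and metrics_path /metrics. As a rough illustration of how each static target turns into a scrape URL under those defaults, here is a minimal Python sketch; the helper name `scrape_urls` is hypothetical and not part of Prometheus.

```python
from typing import Iterable, List


def scrape_urls(
    targets: Iterable[str],
    scheme: str = "http",
    metrics_path: str = "/metrics",
) -> List[str]:
    """Combine scheme, host:port target, and metrics path into scrape URLs."""
    return [f"{scheme}://{target}{metrics_path}" for target in targets]


# The two targets listed under static_configs in the configuration above.
urls = scrape_urls(["localhost:9090", "10.1.205.154:9100"])
for url in urls:
    print(url)
```

Note that localhost here is resolved inside the Prometheus container, so the first target is Prometheus scraping itself.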

Start Prometheus: ./prometheus-start.sh

Verify: open http://10.21.70.84:9090/targets. If the targets are listed as UP, metrics are being collected successfully.
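The same check can be scripted against the Prometheus HTTP API, whose /api/v1/targets endpoint returns the active targets and their health as JSON. The Python sketch below is a minimal illustration: the helper names are hypothetical, the sample payload is a hand-written, trimmed example rather than real output, and `fetch_targets` is shown but not called because it needs a reachable server.

```python
import json
from typing import Dict
from urllib.request import urlopen


def fetch_targets(base_url: str, timeout: float = 5.0) -> dict:
    """Query /api/v1/targets, e.g. with base_url http://10.21.70.84:9090."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as response:
        return json.load(response)


def target_health(payload: dict) -> Dict[str, str]:
    """Map each active target's scrape URL to its health ("up", "down", "unknown")."""
    return {
        target["scrapeUrl"]: target["health"]
        for target in payload["data"]["activeTargets"]
    }


# Hand-written, trimmed payload for illustration only.
SAMPLE_PAYLOAD = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://localhost:9090/metrics", "health": "up"},
            {"scrapeUrl": "http://10.1.205.154:9100/metrics", "health": "down"},
        ],
        "droppedTargets": [],
    },
}

health = target_health(SAMPLE_PAYLOAD)
unhealthy = [url for url, state in health.items() if state != "up"]
print(health)
print("not up:", unhealthy)
```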

FAQ:

[root@slave1 prometheus]# ./prometheus-start.sh
ee9f8bc4d82f3dc01f74300a0844fbf00409c9d939504b4e0fec7aa513d348f3
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused " process_linux.go:430: container init caused \"rootfs_linux.go:58: mounting \\\"/root/prometheus.yml\\\" to rootfs \\\"/var/lib/docker/overlay2/5ed253f82ba590d6834f5a17fb1d9c71e54d3966bfacd3b9519a6074a4350fcb/merged\\\" at \\\"/var/lib/docker/overlay2/5ed253f82ba590d6834f5a17fb1d9c71e54d3966bfacd3b9519a6074a4350fcb/merged/etc/prometheus/prometheus.yml\\\" caused \\\"not a directory\\\"\"": unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.

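ANS: The prometheus.yml file was not at the host path given to -v. The host side of the mount must be an existing file. If nothing exists at that path, Docker creates a directory there, and mounting a directory onto the file /etc/prometheus/prometheus.yml fails with "not a directory". Put prometheus.yml at the path used in the start script (removing any directory Docker created in its place) and start the container again.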
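
A quick pre-flight check can catch this before the container is started. The Python sketch below classifies a bind-mount source path as a file, a directory, or missing; the helper name `classify_mount_source` is hypothetical, and the demo uses a temporary directory instead of the real /home/docker/prometheus path.

```python
import tempfile
from pathlib import Path


def classify_mount_source(host_path) -> str:
    """Return "file", "directory", or "missing" for a bind-mount source path."""
    path = Path(host_path)
    if path.is_file():
        return "file"
    if path.is_dir():
        return "directory"
    return "missing"


# Demo in a throwaway directory so nothing on the real host is touched.
with tempfile.TemporaryDirectory() as workdir:
    config = Path(workdir) / "prometheus.yml"
    before = classify_mount_source(config)    # nothing there yet
    config.write_text("global:\n  scrape_interval: 15s\n")
    after = classify_mount_source(config)     # now a regular file
    parent = classify_mount_source(workdir)   # the directory itself

print(before, after, parent)
```

Only the "file" result is safe for a mount onto /etc/prometheus/prometheus.yml; "missing" and "directory" both lead to the error shown above.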

3. Installing Grafana (visualization)

Run the following commands on the monitoring server to install the Grafana container.

Pull the image: docker pull grafana/grafana:latest

Create the start script: vi grafana-start.sh

docker run -d -i -p 3000:3000 \
-v "/etc/localtime:/etc/localtime" \
-e "GF_SERVER_ROOT_URL=http://grafana.server.name" \
-e "GF_SECURITY_ADMIN_PASSWORD=admin123" \
grafana/grafana

Start Grafana: ./grafana-start.sh

Verify: open http://10.21.70.84:3000/metrics, or the login page at http://10.21.70.84:3000, and sign in with username admin and password admin123.