Table of Contents

 Troubleshooting

Problem 1: metrics-server fails to start with a certificate error: x509: cannot validate certificate for x.x.x.x because it doesn't contain any IP SANs" node="k8s-testing-02-191"

Problem 2: metrics-server never becomes ready; the log shows: Failed to scrape node" err="Get \"https://x.x.x.x:10250/metrics/resource\": context deadline exceeded"

Problem 3: metrics-server starts, but kubectl top node fails with: Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

Problem 4: metrics-server fails to start when external --tls-cert-file and --tls-private-key-file are specified

metrics-server startup parameters

 Appendix: the kube-metric-server.yaml manifest


Earlier, when deploying K8S with the kubeadm tool, I set up the Metrics add-on and the process was simple. Later, after deploying K8S in production from binaries, installing the Metrics add-on ran into pitfall after pitfall, so this post records the troubleshooting. For the deployment steps themselves, see 《【K8S 三】部署 metrics-server 插件》 (Deploying the metrics-server add-on).

To make the problems easier to follow, here is a topology diagram up front (the flanneld network plugin could just as well be calico).

 Troubleshooting

Problem 1: metrics-server fails to start with a certificate error: x509: cannot validate certificate for x.x.x.x because it doesn't contain any IP SANs" node="k8s-testing-02-191"

 E0725 05:27:26.638019       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.11.191:10250/metrics/resource\": x509: cannot validate certificate for 192.168.11.191 because it doesn't contain any IP SANs" node="k8s-testing-02-191"
 I0725 05:27:33.495998       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"

Solution (normally this problem doesn't occur; the certificate metrics-server generates for itself is fine):

 Add the parameter
         - --kubelet-insecure-tls
 or
         - --tls-cert-file=/etc/ssl/pki/ca.pem
         - --tls-private-key-file=/etc/ssl/pki/ca-key.pem
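The error means the certificate the kubelet serves carries no IP Subject Alternative Name for the address metrics-server scrapes. A quick local sketch with openssl shows what a certificate that would pass validation looks like; the IP below is this article's example node, and `-addext` needs OpenSSL 1.1.1+.

```shell
# Generate a throwaway self-signed cert that DOES carry an IP SAN,
# then print the extension. A certificate hitting the x509 error
# above would show no "IP Address:" entry in this output.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/kubelet-key.pem -out /tmp/kubelet-cert.pem \
  -days 1 -subj "/CN=k8s-testing-02-191" \
  -addext "subjectAltName=IP:192.168.11.191"
openssl x509 -in /tmp/kubelet-cert.pem -noout -ext subjectAltName
```

Against a live node, the same `-ext subjectAltName` query can be pointed at the certificate the kubelet actually serves, e.g. `echo | openssl s_client -connect 192.168.11.191:10250 2>/dev/null | openssl x509 -noout -ext subjectAltName`.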

Problem 2: metrics-server never becomes ready; the log shows: Failed to scrape node" err="Get \"https://x.x.x.x:10250/metrics/resource\": context deadline exceeded"

 scraper.go:140] "Failed to scrape node" err="Get \"https://linshi-k8s-54:10250/metrics/resource\": context deadline exceeded" node="linshi-k8s-54"
 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"

Solution:

Keep --kubelet-preferred-address-types consistent with the kube-apiserver.
Comment out:        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
With the flag removed, the defaults apply, and kube-apiserver and metrics-server share the same default for this option.

Problem 3: metrics-server starts, but kubectl top node fails with: Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

 kubectl top node
 Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

Diagnosis:

#-- Check the events on the metrics APIService
Message:               failing or missing response from 
https://10.254.156.1:443/apis/metrics.k8s.io/v1beta1: Get 
"https://10.254.156.1:443/apis/metrics.k8s.io/v1beta1": dial tcp 10.254.156.1:443: i/o timeout
Reason:                FailedDiscoveryCheck
#-- So kubectl's request to the metrics ClusterIP timed out. After setting --enable-aggregator-routing=true on the kube-apiserver, the error changed to
Message:               failing or missing response from 
https://172.254.247.87:4443/apis/metrics.k8s.io/v1beta1: Get 
"https://172.254.247.87:4443/apis/metrics.k8s.io/v1beta1": dial tcp 172.254.247.87:4443: i/o timeout
Reason:                FailedDiscoveryCheck
#-- kubectl's direct request to the endpoint timed out as well
#-- Also note: the metrics Service can only listen on port 443; setting it to 4443 by hand yields
Message:               service/metrics-server in "kube-system" is not listening on port 443
Reason:                ServicePortError
This happens because the metrics-server is unreachable from that master. The master in this deployment runs neither kubelet nor kube-proxy. With --enable-aggregator-routing=true set on the apiserver, kubectl requests are routed straight to the metrics endpoint (the pod IP), but the master cannot reach the pod network (it has no kubelet). Without --enable-aggregator-routing=true, requests instead go through the metrics Service's ClusterIP, which is equally unreachable because there is no kube-proxy to program it (see the topology diagram above).

Solution:

# Modify the metrics-server Deployment YAML:
 deployment.spec.template.spec.hostNetwork: true
# Or:
# pin the metrics Service address, then add a static route manually.
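For the second option, pinning the Service ClusterIP keeps a hand-maintained route valid across redeploys. A sketch, assuming 10.254.156.1 (the address from the events above) is free inside this cluster's service CIDR; substitute any unused address from yours:

```yaml
# Sketch: fix the ClusterIP so a static route to it stays stable.
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  clusterIP: 10.254.156.1   # pinned; must lie in the service CIDR and be unused
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
```

The matching route on each master would look like `ip route add 10.254.156.1/32 via <node-ip>`, where <node-ip> is a worker that runs flanneld. hostNetwork: true avoids all of this, and is what the attached manifest uses.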

Problem 4: metrics-server fails to start when external --tls-cert-file and --tls-private-key-file are specified

The pod log, viewed with kubectl logs, shows:
panic: open /etc/ssl/pki/ca-key.pem: permission denied

Solution:
Remove --tls-cert-file and --tls-private-key-file and use the certificate metrics-server generates itself. Evidently the certificate metrics-server serves does not need to be the same one the kubelet uses.
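The likely cause, judging from the attached manifest, is that the container does not run as root, so a host key file readable only by root cannot be opened:

```yaml
# From the attached Deployment: metrics-server runs as UID 1000, so any
# hostPath-mounted key file must be readable by that UID on the host.
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000
```

Relaxing the key's permissions (e.g. chmod 644) would also make it readable, but that exposes the private key on the host; dropping the two flags is the safer fix.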

metrics-server startup parameters

#--- metrics-server's startup parameters can be queried with the command below
docker run --rm 192.168.11.101/library/metrics-server:v0.6.1 --help

--cert-dir=/tmp
#-- Directory where TLS certificates are stored. Ignored if --tls-cert-file and --tls-private-key-file are provided.
--secure-port=4443
#-- The port to serve HTTPS with authentication and authorization on. If 0, HTTPS is not served. (default 443)
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
#-- The ordered list of NodeAddressTypes preferred for connecting to kubelets. Keep this consistent with the kube-apiserver configuration. (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
--kubelet-use-node-status-port
#-- Use the port reported in the node status; takes precedence over --kubelet-port.
--metric-resolution=30s
#-- Interval at which metrics-server scrapes the kubelets; must be at least 10s. (default 1m0s)
--kubelet-insecure-tls
#-- Do not verify the CA or serving certificates presented by kubelets. For testing only. Without this flag, --tls-cert-file and --tls-private-key-file need to be passed to metrics-server.
--tls-cert-file
#-- File containing the default x509 certificate for HTTPS. If HTTPS serving is enabled and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to the directory given by --cert-dir.
--tls-private-key-file
#-- File containing the default x509 private key matching --tls-cert-file.
--kubelet-port
#-- The port to use to connect to kubelets. (default 10250)

 Appendix: the kube-metric-server.yaml manifest

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=30s
        - --kubelet-insecure-tls
#        - --tls-cert-file=/etc/ssl/pki/ca.pem
#        - --tls-private-key-file=/etc/ssl/pki/ca-key.pem
        image: HARBOR_HOST_NAME/library/metrics-server:v0.6.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
#        - mountPath: /etc/ssl/pki
#          name: cert-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      hostNetwork: true
      volumes:
      - emptyDir: {}
        name: tmp-dir
#      - name: cert-dir
#        hostPath:
#          path: /etc/ssl/certs/ca-certs/
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
