A Deep Dive into an Ineffective ServiceMonitor in a Kubernetes Cluster

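Introduction

Several new services recently went live in production, and their metrics need to be monitored. This is done here with a ServiceMonitor from Prometheus-Operator.

Let's get to it.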

Getting Started

Here is the YAML file:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: lobby-bank-consumer
  namespace: lobby
  labels:
    app.kubernetes.io/name: lobby-bank-consumer
    app.kubernetes.io/part-of: lobby
spec:
  selector:
    matchLabels:
      app: lobby-bank-consumer
  namespaceSelector:
    matchNames:
      - lobby
  endpoints:
    - port: tcp-63200
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: lobby-bank-producer
  namespace: lobby
  labels:
    app.kubernetes.io/name: lobby-bank-producer
    app.kubernetes.io/part-of: lobby
spec:
  selector:
    matchLabels:
      app: lobby-bank-producer
  namespaceSelector:
    matchNames:
      - lobby
  endpoints:
    - port: tcp-63100
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: lobby-bank-server
  namespace: lobby
  labels:
    app.kubernetes.io/name: lobby-bank-server
    app.kubernetes.io/part-of: lobby
spec:
  selector:
    matchLabels:
      app: lobby-bank-server
  namespaceSelector:
    matchNames:
      - lobby
  endpoints:
    - port: tcp-63001
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s

Deploy it:

$ kubectl apply -f lobby-bank-sm.yaml

After the deployment finished, no data showed up:

[Screenshot]

Time to troubleshoot.

Troubleshooting

I went through the ServiceMonitor YAML file carefully and found nothing wrong with it, which was odd.
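A review like this usually comes down to three questions: is the Service's namespace selected, do the Service's own labels satisfy selector.matchLabels, and is the endpoint port the name of a port on the Service. The following is a minimal sketch of those checks in Python. The ServiceMonitor data is taken from the lobby-bank-server manifest; the Service is hypothetical, because the article does not show it, and check_service_monitor is an illustrative helper, not part of any tool.

```python
# ServiceMonitor fields copied from the lobby-bank-server manifest.
SERVICE_MONITOR = {
    "metadata": {"name": "lobby-bank-server", "namespace": "lobby"},
    "spec": {
        "selector": {"matchLabels": {"app": "lobby-bank-server"}},
        "namespaceSelector": {"matchNames": ["lobby"]},
        "endpoints": [{"port": "tcp-63001", "path": "/metrics"}],
    },
}

# Hypothetical Service: not shown in the article, every field is an assumption.
SERVICE = {
    "metadata": {
        "name": "lobby-bank-server",
        "namespace": "lobby",
        "labels": {"app": "lobby-bank-server"},
    },
    "spec": {"ports": [{"name": "tcp-63001", "port": 63001}]},
}


def check_service_monitor(sm, svc):
    """Return a list of mismatches; an empty list means the pair lines up.

    Only matchLabels and named ports are handled (no matchExpressions,
    no targetPort), so this is a simplification.
    """
    problems = []
    spec = sm["spec"]

    # Without a namespaceSelector, only the ServiceMonitor's own namespace is used.
    ns_selector = spec.get("namespaceSelector") or {}
    if not ns_selector.get("any"):
        allowed = ns_selector.get("matchNames") or [sm["metadata"]["namespace"]]
        if svc["metadata"]["namespace"] not in allowed:
            problems.append("namespace %r is not selected" % svc["metadata"]["namespace"])

    # The selector is compared with the Service's labels, not the Pod's labels.
    wanted = (spec.get("selector") or {}).get("matchLabels") or {}
    labels = svc["metadata"].get("labels") or {}
    for key, value in wanted.items():
        if labels.get(key) != value:
            problems.append("label %s=%s is missing on the Service" % (key, value))

    # endpoints[].port refers to a Service port by name.
    port_names = {p["name"] for p in svc["spec"].get("ports", []) if p.get("name")}
    for endpoint in spec.get("endpoints", []):
        name = endpoint.get("port")
        if name is None:
            continue  # endpoint uses targetPort instead; not covered here
        if name not in port_names:
            problems.append("no Service port named %r" % name)

    return problems


if __name__ == "__main__":
    print(check_service_monitor(SERVICE_MONITOR, SERVICE) or "no mismatch found")
```

If this kind of check comes back clean, as it did here, the manifest itself is unlikely to be the cause and the next place to look is permissions and logs.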

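After thinking it over for a long time, I still did not believe it could be something like RBAC, but with no other leads left I had to go and look at the Prometheus logs.

Unexpectedly, that is exactly where the problem was:

[Screenshot of the Prometheus logs]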

To fix it, I added the corresponding resources and verbs to the prometheus-k8s ClusterRole:

- apiGroups:
  - "monitoring.coreos.com"
  resources:
  - servicemonitors
  - podmonitors
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch

Here is the complete YAML file:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 3.0.1
  name: prometheus-k8s
rules:
- apiGroups: 
  - "monitoring.coreos.com"
  resources:
  - servicemonitors
  - podmonitors
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  - /metrics/slis
  verbs:
  - get
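To reason about what this ClusterRole does and does not permit, the rules can be evaluated by hand. The sketch below is a simplified Python model of RBAC rule matching, assuming the rules above are copied in as plain dictionaries. The allows helper is illustrative only: it ignores resourceNames and aggregation and is not how the API server is implemented.

```python
# Rules copied from the prometheus-k8s ClusterRole above.
RULES = [
    {
        "apiGroups": ["monitoring.coreos.com"],
        "resources": ["servicemonitors", "podmonitors"],
        "verbs": ["list", "watch"],
    },
    {
        "apiGroups": [""],
        "resources": ["nodes", "nodes/metrics", "services", "endpoints", "pods"],
        "verbs": ["list", "watch"],
    },
    {
        "apiGroups": [""],
        "resources": ["configmaps"],
        "verbs": ["get", "list", "watch"],
    },
    {"nonResourceURLs": ["/metrics", "/metrics/slis"], "verbs": ["get"]},
]


def _contains(values, wanted):
    """RBAC treats "*" as a wildcard for groups, resources and verbs."""
    return "*" in values or wanted in values


def allows(rules, api_group, resource, verb):
    """Return True if any resource rule grants verb on resource in api_group."""
    for rule in rules:
        if "resources" not in rule:
            continue  # nonResourceURLs rules do not apply to resources
        if (
            _contains(rule.get("apiGroups", []), api_group)
            and _contains(rule["resources"], resource)
            and _contains(rule.get("verbs", []), verb)
        ):
            return True
    return False


if __name__ == "__main__":
    for group, resource, verb in [
        ("", "services", "list"),
        ("", "endpoints", "watch"),
        ("monitoring.coreos.com", "servicemonitors", "list"),
        ("", "services", "get"),
    ]:
        print(group or "core", resource, verb, allows(RULES, group, resource, verb))
```

Because these rules sit in a ClusterRole, they apply to every namespace once bound through a ClusterRoleBinding, which includes lobby. On a live cluster the same question can be asked with kubectl auth can-i and the --as flag, impersonating the ServiceAccount that Prometheus runs under.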

Redeploy Prometheus-Operator:

$ kubectl delete -f .
$ kubectl create -f .

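Wait for everything to finish starting up.

Check again:

[Screenshot]

Finally, verify with PromQL as well:

[Screenshot of the PromQL query result]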
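The query used in the screenshot is not preserved. One common check is the up metric filtered by namespace, sent to the instant-query endpoint of the Prometheus HTTP API (/api/v1/query). The sketch below only builds that request URL in Python. The base URL http://localhost:9090 is an assumption (for example, a local port-forward), and the namespace label is assumed to be attached to the targets; both helper names are illustrative.

```python
from urllib.parse import urlencode


def up_by_namespace(namespace):
    """PromQL selector for the up metric of all targets in one namespace."""
    return 'up{namespace="%s"}' % namespace


def instant_query_url(base_url, promql):
    """Build the URL of a Prometheus instant query for the given expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


if __name__ == "__main__":
    # Assumed address; replace with wherever Prometheus is reachable.
    print(instant_query_url("http://localhost:9090", up_by_namespace("lobby")))
```

A value of 1 for a series in the response means the last scrape of that target succeeded, and an empty result means no targets in that namespace were discovered.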

Editor: 武晓燕  Source: Eternal Heights