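This article is reposted from the WeChat official account 「运维开发故事」 (Ops Dev Stories). The author is 没有文案的夏老师. Please contact the 运维开发故事 account before reposting.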
Offline Event Alerting
kube-eventer is an offline Kubernetes event collector open-sourced by Alibaba. Project link:
https://github.com/AliyunContainerService/kube-eventer/blob/master/docs/en/webhook-sink.md
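Kubernetes has two kinds of events. A Warning event means the state transition that produced it happened between unexpected states. A Normal event means the state that was reached matches the expected state.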
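We will use events from NPD (Node Problem Detector) as the example. These events describe transient problems that affect a node, and they are still valuable for system diagnosis. NPD builds on Kubernetes' reporting mechanism: it watches system logs (for example the journal on CentOS) and reports errors to the corresponding Kubernetes Node. Logs such as the kernel log are very noisy, so NPD extracts only the useful information and can turn it into offline events. As a result, we can see what is happening on each node and handle it promptly.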
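A standard Kubernetes event has the following important attributes, which help with diagnosing problems and alerting on them:
- Namespace: the namespace of the object that produced the event.
- Kind: the type of the object the event is bound to, e.g. Node, Pod, Namespace, Component.
- Timestamp: when the event occurred.
- Reason: why the event was produced.
- Message: a detailed description of the event.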
The currently supported sinks are roughly as follows:

| Sink Name | Description |
|---|---|
| dingtalk | sink to dingtalk bot |
| sls | sink to alibaba cloud sls service |
| elasticsearch | sink to elasticsearch |
| honeycomb | sink to honeycomb |
| influxdb | sink to influxdb |
| kafka | sink to kafka |
| mysql | sink to mysql database |
| wechat | sink to wechat |
| webhook | sink to webhook |
This article focuses on a few handy tricks for the webhook sink. First, here are the parameters it supports:
- level - Level of event (optional. default: Warning. Options: Warning and Normal)
- namespaces - Namespaces to filter (optional. default: all namespaces; use commas to separate multiple namespaces; the namespace filter doesn't support regexp)
- kinds - Kinds to filter (optional. default: all kinds; use commas to separate multiple kinds. Options: Node, Pod and so on.)
- reason - Reason to filter (optional. default: empty; regexp patterns are supported). You can use multiple reason fields in the query.
- method - Method used to send the request (optional. default: GET)
- header - Header in the request (optional. default: empty). You can use multiple header fields in the query.
- custom_body_configmap - The name of the ConfigMap that holds the request body template. You can use a template to customize the request body. (optional)
- custom_body_configmap_namespace - The namespace of the ConfigMap that holds the request body template.
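If each project's namespace maps one-to-one to an owner, you can tie owners to sinks through ConfigMaps. Rollouts are when events are most likely to appear. Events let you quickly catch problems such as a wrong image tag or a misconfigured image.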
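Start with the ConfigMap. The value of custom_body_configmap selects which ConfigMap, and therefore which message template, a sink uses. A little decoration makes the message much clearer:
- Add Cluster:name (replace name with your cluster's name) so you can tell which cluster the event came from.
- Add "mentioned_list":["wangqing","@all"] to @-mention the responsible owner. Entries are WeCom user IDs, and "@all" mentions everyone in the group.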
```yaml
---
apiVersion: v1
data:
  # Replace "name" after "Cluster:" with your cluster's name.
  content: >-
    {"msgtype": "text","text": {"content": "Cluster:name\nEventType:{{ .Type }}\nEventNamespace:{{ .InvolvedObject.Namespace }}\nEventKind:{{ .InvolvedObject.Kind }}\nEventObject:{{ .InvolvedObject.Name }}\nEventReason:{{ .Reason }}\nEventTime:{{ .LastTimestamp }}\nEventMessage:{{ .Message }}","mentioned_list":["wangqing","@all"]}}
kind: ConfigMap
metadata:
  name: custom-webhook-body0
  namespace: namespace
```
Tricks for the command section
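Each sink is just one entry in the command array, so you can add as many --sink lines as you need.
This section mainly covers sending notifications to WeCom (WeChat Work) through the webhook sink. Note that reason supports regular expressions. Together with the ConfigMap, this completes event alerting for the cluster.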
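Each --sink value packs the robot key and the filter options into one query string. Go's `net/url` is enough to check what a flag actually contains. `splitSink` is a hypothetical helper for inspecting flags. It is not kube-eventer's parser, so it does not show how kube-eventer itself separates the robot key from its own options.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitSink separates a "--sink=webhook:<url>" flag into the target URL
// (query removed) and its parsed query parameters. Hypothetical helper.
func splitSink(flag string) (string, url.Values, error) {
	raw := strings.TrimPrefix(flag, "--sink=webhook:")
	u, err := url.Parse(raw)
	if err != nil {
		return "", nil, err
	}
	q, err := url.ParseQuery(u.RawQuery)
	if err != nil {
		return "", nil, err
	}
	u.RawQuery = ""
	return u.String(), q, nil
}

func main() {
	flag := "--sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=BackOff&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body1&custom_body_configmap_namespace=xxxx&method=POST"
	target, q, err := splitSink(flag)
	if err != nil {
		panic(err)
	}
	fmt.Println("target:", target)
	for _, k := range []string{"key", "level", "reason", "namespaces", "header", "custom_body_configmap", "custom_body_configmap_namespace", "method"} {
		fmt.Printf("%-32s %q\n", k, q.Get(k))
	}
}
```

`url.ParseQuery` splits each field at its first `=`, so the header value keeps its own `=` (Content-Type=application/json). Repeated reason fields come back as several values under the same key.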
```yaml
command:
  - "/kube-eventer"
  - "--source=kubernetes:https://kubernetes.default"
  # e.g. WeCom webhook sinks: one --sink per reason, each with its own body ConfigMap
  - --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=[^Unhealthy]&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body0&custom_body_configmap_namespace=xxxx&method=POST
  - --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=BackOff&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body1&custom_body_configmap_namespace=xxxx&method=POST
  - --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=Failed&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body2&custom_body_configmap_namespace=xxxxx&method=POST
```
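Example:
Create a group robot in a WeCom group. It gives you a webhook URL such as https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx. The complete set of manifests follows; namespace and xxxx are placeholders for your own values.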
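Before you wire the robot into kube-eventer, it can help to confirm that the webhook works on its own. The minimal Go sketch below posts one text message with the same payload shape as the ConfigMap body (msgtype, text.content, text.mentioned_list). The robot answers with JSON whose errcode is 0 on success. `buildPayload` and `sendTest` are hypothetical helpers, and the 10-second timeout is an arbitrary choice.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// textMessage is the WeCom group-robot "text" payload, the same shape as
// the ConfigMap body used in this article.
type textMessage struct {
	MsgType string `json:"msgtype"`
	Text    struct {
		Content       string   `json:"content"`
		MentionedList []string `json:"mentioned_list,omitempty"`
	} `json:"text"`
}

// reply holds the robot's JSON answer; errcode 0 means the message was accepted.
type reply struct {
	ErrCode int    `json:"errcode"`
	ErrMsg  string `json:"errmsg"`
}

// buildPayload encodes a text message that optionally @-mentions users.
func buildPayload(content string, mentions []string) ([]byte, error) {
	var msg textMessage
	msg.MsgType = "text"
	msg.Text.Content = content
	msg.Text.MentionedList = mentions
	return json.Marshal(msg)
}

// sendTest posts one text message to the robot's webhook URL and decodes the reply.
func sendTest(webhookURL, content string, mentions []string) (reply, error) {
	payload, err := buildPayload(content, mentions)
	if err != nil {
		return reply{}, err
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post(webhookURL, "application/json", bytes.NewReader(payload))
	if err != nil {
		return reply{}, err
	}
	defer resp.Body.Close()
	var r reply
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return reply{}, fmt.Errorf("HTTP %d: decoding reply: %w", resp.StatusCode, err)
	}
	return r, nil
}

func main() {
	// Replace xxxxxxx with the key of the robot you just created.
	r, err := sendTest("https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx",
		"Cluster:name\nkube-eventer webhook test", []string{"@all"})
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("errcode=%d errmsg=%s\n", r.ErrCode, r.ErrMsg)
}
```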
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: kube-eventer
  name: kube-eventer
  namespace: namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eventer
  template:
    metadata:
      labels:
        app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: kube-eventer
      containers:
        - image: registry.aliyuncs.com/acs/kube-eventer-amd64:v1.2.0-484d9cd-aliyun
          name: kube-eventer
          command:
            - "/kube-eventer"
            - "--source=kubernetes:https://kubernetes.default"
            # WeCom webhook sink; custom_body_configmap(_namespace) must match the ConfigMap below
            - --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=[^Unhealthy]&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body0&custom_body_configmap_namespace=namespace&method=POST
            #- --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=BackOff&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body1&custom_body_configmap_namespace=xxxx&method=POST
            #- --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=Failed&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body2&custom_body_configmap_namespace=xxxxx&method=POST
          env:
            # If TZ is assigned, set the TZ value as the time zone
            - name: TZ
              value: "Asia/Shanghai"
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
              readOnly: true
            - name: zoneinfo
              mountPath: /usr/share/zoneinfo
              readOnly: true
          resources:
            requests:
              cpu: 200m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 250Mi
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-eventer
rules:
  - apiGroups:
      - ""
    resources:
      - events
      - configmaps
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-eventer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-eventer
subjects:
  - kind: ServiceAccount
    name: kube-eventer
    namespace: namespace
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-eventer
  namespace: namespace
---
apiVersion: v1
data:
  # Replace "name" after "Cluster:" with your cluster's name.
  content: >-
    {"msgtype": "text","text": {"content": "Cluster:name\nEventType:{{ .Type }}\nEventNamespace:{{ .InvolvedObject.Namespace }}\nEventKind:{{ .InvolvedObject.Kind }}\nEventObject:{{ .InvolvedObject.Name }}\nEventReason:{{ .Reason }}\nEventTime:{{ .LastTimestamp }}\nEventMessage:{{ .Message }}","mentioned_list":["wangqing","@all"]}}
kind: ConfigMap
metadata:
  name: custom-webhook-body0
  namespace: namespace
```
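This gives a simple way to decide who gets alerted and who handles each problem. With event alerting in place, service and cluster problems can be spotted and fixed promptly.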