kube-promethues配置钉钉告警

kube-promethues配置钉钉告警

前置:k8s部署kube-promethues

一.配置钉钉机器人

  • 打开钉钉的智能群助手,点击添加机器人

  • 选择自定义机器人

  • 勾选加签,复制后保存

  • 复制webhook地址后点击保存

二.编写dingtalk的yaml部署文件

vi dingtalk.yaml
apiVersion: v1
kind: Service
metadata:
 name: dingtalk
 namespace: monitoring
spec:
 selector:
 app: dingtalk
 ports:
 - name: http
 protocol: TCP
 port: 8060
 targetPort: 8060
---
apiVersion: apps/v1
kind: Deployment
metadata:
 name: dingtalk
 namespace: monitoring
 labels:
 app: dingtalk
spec:
 replicas: 1
 strategy:
 rollingUpdate:
 maxSurge: 25%
 maxUnavailable: 25%
 type: RollingUpdate
 selector:
 matchLabels:
 app: dingtalk
 template:
 metadata:
 labels:
 app: dingtalk
 spec:
 restartPolicy: "Always"
 containers:
 - name: dingtalk
 image: timonwong/prometheus-webhook-dingtalk:v2.1.0
 imagePullPolicy: "IfNotPresent"
 volumeMounts:
 - name: dingtalk-conf
 mountPath: /etc/prometheus-webhook-dingtalk/
 resources:
 limits:
 cpu: "400m"
 memory: "500Mi"
 requests:
 cpu: "100m"
 memory: "100Mi"
 ports:
 - containerPort: 8060
 name: http
 protocol: TCP
 readinessProbe:
 failureThreshold: 3
 periodSeconds: 5
 initialDelaySeconds: 30
 successThreshold: 1
 tcpSocket:
 port: 8060
 livenessProbe:
 tcpSocket:
 port: 8060
 initialDelaySeconds: 30
 periodSeconds: 10
 volumes:
 - name: dingtalk-conf
 configMap:
 name: dingtalk-cm

prometheus-webhook-dingtalk是一个开源的钉钉告警的插件,目前最新版停留于v2.1.0

三.编写钉钉告警模板dingtalk-configmap.yaml

vi dingtalk-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
 name: dingtalk-cm
 namespace: monitoring
data:
 config.yml: |-
 templates:
 - /etc/prometheus-webhook-dingtalk/dingding.tmpl
 targets:
 webhook:
 url: https://oapi.dingtalk.com/robot/send?access_token=<复制的webhook地址>
 secret: "<加签的时候复制的secret>"
 message:
 text: '{{ template "dingtalk.to.message" . }}'
 dingding.tmpl: |-
 {{ define "dingtalk.to.message" }}
 {{- if gt (len .Alerts.Firing) 0 -}}
 {{- range $index, $alert := .Alerts -}}
 ========= **监控告警** =========
 **告警集群:** k8s
 **告警类型:** {{ $alert.Labels.alertname }}
 **告警级别:** {{ $alert.Labels.severity }}
 **告警状态:** {{ .Status }}
 **故障主机:** {{ $alert.Labels.instance }} {{ $alert.Labels.device }}
 **告警主题:** {{ .Annotations.summary }}
 **告警详情:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}
 **主机标签:** {{ range .Labels.SortedPairs }} </br> [{{ .Name }}: {{ .Value | markdown | html }} ]
 {{- end }} </br>
 **故障时间:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
 ========= = **end** = =========
 {{- end }}
 {{- end }}
 {{- if gt (len .Alerts.Resolved) 0 -}}
 {{- range $index, $alert := .Alerts -}}
 ========= **故障恢复** =========
 **告警集群:** k8s
 **告警主题:** {{ $alert.Annotations.summary }}
 **告警主机:** {{ .Labels.instance }}
 **告警类型:** {{ .Labels.alertname }}
 **告警级别:** {{ $alert.Labels.severity }}
 **告警状态:** {{ .Status }}
 **告警详情:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}
 **故障时间:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
 **恢复时间:** {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
 ========= = **end** = =========
 {{- end }}
 {{- end }}
 {{- end }}

四.编写文件alertmanager-secret.yaml

该文件是 用来顶替原本kube-promethues部署时的,alertmanager的配置文件

vi alertmanager-secret.yaml
apiVersion: v1
data: { }
kind: Secret
metadata:
 name: alertmanager-main
 namespace: monitoring
stringData:
 alertmanager.yaml: |-
 global:
 resolve_timeout: 5m
 route:
 group_by: ['alertname']
 group_wait: 30s
 group_interval: 5m
 repeat_interval: 30m
 receiver: 'webhook'
 routes:
 - match:
 severity: 'info'
 continue: true
 receiver: 'null'
 - match:
 severity: 'none'
 continue: true
 receiver: 'null'
 receivers:
 - name: 'null'
 - name: 'webhook'
 webhook_configs:
 - send_resolved: true
 url: 'http://dingtalk:8060/dingtalk/webhook/send'

五.添加Prometheus告警规则

因为kube-promethues默认的告警规则大部分都和K8s的pod相关,所以需要新增一些关于node节点的告警规则

 - name: 主机状态-监控告警
 rules:
 - alert: 节点内存
 expr: (1 - (node_memory_MemAvailable_bytes / (node_memory_MemTotal_bytes)))* 100 > 85
 for: 1m
 labels:
 severity: warning
 annotations:
 summary: "内存使用率过高!"
 description: "节点{{$labels.instance}} 内存使用大于85%(目前使用:{{$value}}%)"
 - alert: 节点TCP会话
 expr: node_netstat_Tcp_CurrEstab > 1000
 for: 1m
 labels:
 severity: warning
 annotations:
 summary: "TCP_ESTABLISHED过高!"
 description: "{{$labels.instance }} TCP_ESTABLISHED大于1000%(目前使用:{{$value}}%)"
 - alert: 节点磁盘容量
 expr: max((node_filesystem_size_bytes{fstype=~"ext.?|xfs"}-node_filesystem_free_bytes{fstype=~"ext.?|xfs"}) *100/(node_filesystem_avail_bytes {fstype=~"ext.?|xfs"}+(node_filesystem_size_bytes{fstype=~"ext.?|xfs"}-node_filesystem_free_bytes{fstype=~"ext.?|xfs"})))by(instance) > 80
 for: 1m
 labels:
 severity: warning
 annotations:
 summary: "节点磁盘分区使用率过高!"
 description: "{{$labels.instance }} 磁盘分区使用大于80%(目前使用:{{$value}}%)"
 - alert: 节点CPU
 expr: (100 - (avg by (instance) (irate(node_cpu_seconds_total{job=~".*",mode="idle"}[5m])) * 100)) > 85
 for: 1m
 labels:
 severity: warning
 annotations:
 summary: "节点CPU使用率过高!"
 description: "{{$labels.instance }} CPU使用率大于80%(目前使用:{{$value}}%)"

将此段配置添加到kube-promethues解压目录manifests/prometheus中的prometheus-rules.yaml底部即可。

#然后执行更新prometheus-rules.yaml
kubectl apply -f prometheus-rules.yaml
#重启Promethus看见下图有新增的告警规则即配置成功

六.部署其他并检查是否运行成功

kubectl apply -f alertmanager-secret.yaml
kubectl apply -f dingtalk-configmap.yaml
kubectl apply -f dingtalk.yaml
#查看是否部署成功
kubectl get pods -n monitoring | grep dingtalk

dingtalk部署成功后,重新部署alertmanager就行了。

作者:山有扶苏QWQ原文地址:https://www.cnblogs.com/blogof-fusu/p/17799518.html

%s 个评论

要回复文章请先登录注册