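The previous two articles covered deploying Prometheus in a K8s cluster and configuring its monitoring targets. The next step is to create the Prometheus rule and alerting configuration and load it declaratively from the Prometheus configuration file.
Recording Rules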
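Prometheus supports two types of rules, which are configured once and then evaluated at regular intervals: recording rules and alerting rules. Both types are evaluated at the same frequency, which is defined by evaluation_interval in the global configuration.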
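As a minimal sketch, the global section of prometheus.yml that controls this frequency might look like the following. The interval values are illustrative assumptions, not the ones used in this cluster.

```yaml
# Sketch of the global block in prometheus.yml (values are assumptions).
global:
  scrape_interval: 15s       # how often targets are scraped
  evaluation_interval: 15s   # how often recording and alerting rules are evaluated
```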
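A recording rule precomputes an expression that is expensive to evaluate or is queried frequently, and saves the result as a new set of time series. A later query simply returns the precomputed result, which is faster than evaluating the original expression each time and reduces the PromQL computation load. This is especially useful for dashboards, which re-run the same expression on every refresh.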
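To illustrate the idea, the sketch below precomputes a per-job request rate so that a dashboard can read the stored series instead of re-evaluating the rate() expression on every refresh. The metric http_requests_total, the group name, and the record name are hypothetical and only serve as an example.

```yaml
# Hypothetical example: precompute a frequently queried aggregation.
groups:
  - name: example-precompute          # assumed group name
    rules:
      - record: job:http_requests:rate5m                   # new series written on each evaluation
        expr: sum by (job) (rate(http_requests_total[5m])) # http_requests_total is an assumed metric
```

A dashboard panel would then query job:http_requests:rate5m directly.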
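In terms of configuration, apart from the record: <string> field, a recording rule is written in much the same way as an alerting rule. A groups entry can contain multiple rules, and both recording and alerting rules live inside a group. The rules in a group are evaluated sequentially at a regular interval, which is the global evaluation_interval unless the group sets its own interval.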
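The layout described above can be sketched as a single group that holds one recording rule and one alerting rule. The group name, the interval override, the alert name, the threshold, and the severity label are all assumptions made for illustration. The recording rule reuses the pod-count expression from this article.

```yaml
# Sketch of a rule group mixing a recording rule and an alerting rule.
groups:
  - name: example-pod-rules           # assumed group name
    interval: 30s                     # optional; overrides the global evaluation_interval
    rules:
      # Recording rule: identified by the "record" field.
      - record: pod_count_abnormal_export
        expr: sum(kube_pod_status_phase{phase!~"Running|Succeeded"}) by (namespace, cluster)
      # Alerting rule: identified by the "alert" field (name and threshold are assumptions).
      - alert: AbnormalPodsPresent
        expr: pod_count_abnormal_export > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Namespace {{ $labels.namespace }} has pods outside Running/Succeeded"
```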
As before, the rule configuration can be stored in a ConfigMap:
vim prometheus-server-rules.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-rules
  namespace: monitoring
  labels:
    app: prometheus
data:
  record-rule-pod-count.yaml: |
    groups:
    - name: pod-count-normal
      rules:
      - record: pod_count_normal_export
        expr: sum(kube_pod_status_phase{phase=~"Running|Succeeded"}) by (namespace, cluster)
    - name: pod-count-abnormal
      rules:
      - record: pod_count_abnormal_export
        expr: sum(kube_pod_status_phase{phase!~"Running|Succeeded"}) by (namespace, cluster)
kubectl create -f prometheus-server-rules.yml
kubectl -n monitoring get configmaps
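These rules classify pods by phase and store the counts, per namespace and cluster, in two new recorded series: pod_count_normal_export for pods in the Running or Succeeded phase, and pod_count_abnormal_export for pods in any other phase.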
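As an optional variation, the Prometheus documentation recommends naming recorded series in the form level:metric:operations. The sketch below applies that convention to the same two expressions and separates them with a static label instead of two record names. The record name and the state label are assumptions for illustration; the article itself keeps the names shown above.

```yaml
# Sketch only: same expressions, named after the level:metric:operations convention.
groups:
  - name: pod-count
    rules:
      - record: namespace_cluster:kube_pod_status_phase:sum
        expr: sum by (namespace, cluster) (kube_pod_status_phase{phase=~"Running|Succeeded"})
        labels:
          state: normal      # hypothetical label distinguishing the two series
      - record: namespace_cluster:kube_pod_status_phase:sum
        expr: sum by (namespace, cluster) (kube_pod_status_phase{phase!~"Running|Succeeded"})
        labels:
          state: abnormal    # hypothetical label distinguishing the two series
```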
Next, modify the YAML file that deploys Prometheus and apply it, so that the rules ConfigMap is mounted into the container:
vim prometheus.yml
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
    component: server
    release: v2.26.0
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
  template:
    metadata:
      labels:
        app: prometheus
        component: server
        release: v2.26.0
    spec:
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: config-volume
          configMap:
            name: prometheus
            items:
              - key: prometheus.yml
                path: prometheus.yml
            defaultMode: 420
        - name: prometheus-rules
          configMap:
            name: prometheus-server-rules
            defaultMode: 420
      containers:
        - name: prometheus-server
          image: 'prom/prometheus:v2.26.0'
          command:
            - /bin/sh
            - '-c'
            - 'prometheus --storage.tsdb.retention=30d --config.file=/etc/config/prometheus.yml --storage.tsdb.path=/data/${HOSTNAME} --web.enable-lifecycle'
          ports:
            - containerPort: 9090
              protocol: TCP
          resources:
            limits:
              cpu: '2'
              memory: 8Gi
            requests:
              cpu: 500m
              memory: 2Gi
          volumeMounts:
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: config-volume
              mountPath: /etc/config
            - name: pvc
              mountPath: /data
            - name: prometheus-rules
              mountPath: /etc/prometheus/rules
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 30
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 30
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          imagePullPolicy: IfNotPresent
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      restartPolicy: Always
      serviceAccountName: prometheus
      serviceAccount: prometheus
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: pvc
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: managed-nfs-storage
  serviceName: prometheus-headless
kubectl replace -f prometheus.yml
The following command confirms that the Pod has mounted the ConfigMap:
kubectl -n monitoring exec -it prometheus-0 -- ls /etc/prometheus/rules/
The Prometheus configuration file also needs to be modified so that it reads the rule files:
vim prometheus-config.yml
...
rule_files:
- /etc/prometheus/rules/*.yaml
...
kubectl replace -f prometheus-config.yml
curl -X POST http://172.16.220.143:30090/-/reload
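For orientation, here is a rough sketch of where rule_files sits inside the configuration ConfigMap. The ConfigMap name prometheus and the key prometheus.yml are taken from the config-volume definition in the StatefulSet above. The interval values are assumptions, and the existing scrape configuration is deliberately left out, so this fragment is not meant to be applied as-is.

```yaml
# Sketch of prometheus-config.yml; not a complete configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus               # referenced by the config-volume volume
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s       # assumed value
      evaluation_interval: 15s   # assumed value; drives rule evaluation
    rule_files:
      # Matches every *.yaml key mounted from the prometheus-server-rules ConfigMap.
      - /etc/prometheus/rules/*.yaml
    # The existing scrape_configs section stays unchanged and is omitted here.
```

The glob works because each key in the rules ConfigMap, such as record-rule-pod-count.yaml, appears as a file under the mount path /etc/prometheus/rules.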
After the reload, the newly added rules can be seen on the Rules page of the Prometheus web UI.



