{"id":1645,"date":"2023-09-20T15:22:41","date_gmt":"2023-09-20T15:22:41","guid":{"rendered":"https:\/\/www.aviator.co\/blog\/?p=1645"},"modified":"2025-09-25T13:49:29","modified_gmt":"2025-09-25T13:49:29","slug":"managing-prometheus-alerts-in-kubernetes-using-gitops","status":"publish","type":"post","link":"https:\/\/www.aviator.co\/blog\/managing-prometheus-alerts-in-kubernetes-using-gitops\/","title":{"rendered":"Managing Prometheus alerts in Kubernetes at scale using GitOps"},"content":{"rendered":"\n<figure class=\"wp-block-image aligncenter size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"577\" src=\"https:\/\/www.aviator.co\/blog\/wp-content\/uploads\/2023\/09\/prometheus-alerts-1024x577.jpg\" alt=\"\" class=\"wp-image-1649\" srcset=\"https:\/\/www.aviator.co\/blog\/wp-content\/uploads\/2023\/09\/prometheus-alerts-1024x577.jpg 1024w, https:\/\/www.aviator.co\/blog\/wp-content\/uploads\/2023\/09\/prometheus-alerts-300x169.jpg 300w, https:\/\/www.aviator.co\/blog\/wp-content\/uploads\/2023\/09\/prometheus-alerts-768x432.jpg 768w, https:\/\/www.aviator.co\/blog\/wp-content\/uploads\/2023\/09\/prometheus-alerts-1536x865.jpg 1536w, https:\/\/www.aviator.co\/blog\/wp-content\/uploads\/2023\/09\/prometheus-alerts.jpg 1648w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Prometheus is a popular open-source monitoring and alerting solution. It is widely used in the Kubernetes ecosystem and is a part of the <a href=\"https:\/\/www.cncf.io\/\">Cloud Native Computing Foundation<\/a>.<\/p>\n\n\n\n<p>Prometheus has a powerful alerting mechanism that allows users to define alerts based on the metrics collected by Prometheus. The alerts can be configured to send notifications to various channels like Slack, PagerDuty, Email, etc.<\/p>\n\n\n\n<p>Managing Prometheus alerts can be a challenge in a large-scale Kubernetes environment as the number of alerts can grow. 
In this post, we will look at how to manage Prometheus alerts in a GitOps way using the <a href=\"https:\/\/github.com\/prometheus-operator\/prometheus-operator\">Prometheus Operator<\/a>, <a href=\"https:\/\/helm.sh\/docs\/helm\/helm_template\/\">Helm template<\/a>, and <a href=\"https:\/\/argo-cd.readthedocs.io\/en\/stable\/\">ArgoCD<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisites<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Kubernetes cluster<\/li>\n\n\n\n<li>Helm 3<\/li>\n\n\n\n<li>ArgoCD<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prometheus Operator<\/h3>\n\n\n\n<p>The Prometheus Operator provides Kubernetes-native deployment and management of Prometheus and related monitoring components, including alert rules.<\/p>\n\n\n\n<p>Let&#8217;s install the Prometheus Operator using Helm. 
We will install Kube-Prometheus-Stack which includes Prometheus Operator, Prometheus, Grafana, Alertmanager, and other metrics exporters.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>helm repo add prometheus-community https:\/\/prometheus-community.github.io\/helm-charts\nhelm install my-kube-prometheus-stack prometheus-community\/kube-prometheus-stack<\/code><\/pre>\n\n\n\n<p>This installs the CRD PrometheusRule which is used to define Prometheus alerts. Let\u2019s look at an example PrometheusRule.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: monitoring.coreos.com\/v1\nkind: PrometheusRule\nmetadata:\n  annotations:\n    meta.helm.sh\/release-name: kube-prometheus-stack\n  labels:\n    app.kubernetes.io\/instance: kube-prometheus-stack\n    app.kubernetes.io\/managed-by: Helm\n  name: kubernetes-apps\nspec:\n  groups:\n  - name: kubernetes-apps\n    rules:\n    - alert: KubePodCrashLooping\n      annotations:\n        description: 'Pod {{ $labels.namespace }}\/{{ $labels.pod }} ({{ $labels.container\n          }}) is in waiting state (reason: \"CrashLoopBackOff\").'\n        runbook_url: https:\/\/runbooks.prometheus-operator.dev\/runbooks\/kubernetes\/kubepodcrashlooping\n        summary: Pod is crash looping.\n      expr: max_over_time(kube_pod_container_status_waiting_reason{reason=\"CrashLoopBackOff\",\n        job=\"kube-state-metrics\", namespace=~\".*\"}&#91;5m]) &gt;= 1\n      for: 15m\n      labels:\n        severity: warning<\/code><\/pre>\n\n\n\n<p>The above PrometheusRule defines an alert KubePodCrashLooping which is triggered when a pod is in CrashLoopBackOff state for more than 15 minutes.<\/p>\n\n\n\n<p>Now, if you have a large number of alerts the file grows and is not very readable once you have more than 5 alerts. If you have multiple teams that have their own set of alerts, managing can be a challenge. 
Let\u2019s look at how to manage Prometheus alerts in a GitOps way using Helm template and ArgoCD.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Example<\/h4>\n\n\n\n<p>Suppose we have two teams, each with its own set of alerts. One could keep all the alerts in a single file, but readability takes a hit and everything ends up under a single resource.<\/p>\n\n\n\n<p>Instead, let&#8217;s create a Helm template that parses the alerts defined in each team\u2019s folder and creates a PrometheusRule resource for each rule file.<\/p>\n\n\n\n<p>Below is the directory structure we&#8217;ll follow:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\u251c\u2500\u2500 alert-rules\n\u2502   \u251c\u2500\u2500 Chart.yaml\n\u2502   \u251c\u2500\u2500 alert-rules\n\u2502   \u2502   \u251c\u2500\u2500 Team-A\n\u2502   \u2502   \u2502   \u251c\u2500\u2500 .defaults.yaml\n\u2502   \u2502   \u2502   \u2514\u2500\u2500 health_alerts.yaml\n\u2502   \u2502   \u2514\u2500\u2500 Team-B\n\u2502   \u2502       \u2514\u2500\u2500 latency_alerts.yaml\n\u2502   \u251c\u2500\u2500 values.yaml\n\u2502   \u2514\u2500\u2500 templates\n\u2502       \u2514\u2500\u2500 prometheusRule.yaml\n<\/code><\/pre>\n\n\n\n<p>Now let&#8217;s create the Helm template <code>templates\/prometheusRule.yaml<\/code> and paste in the content below.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{{- \/*\nDefine a PrometheusRule object for each rule file.\n*\/ -}}\n{{- $ruleValues := .Values.ruleValues }}\n\n{{- \/*\nIterate over each rule file and create a PrometheusRule object with appropriate annotations and labels.\n*\/ -}}\n\n{{- range $ruleFolderPath := .Values.rulePaths }}\n  {{- range $path, $_ := trimSuffix \"\/\" $ruleFolderPath | printf \"%s\/**\/**.yaml\" | $.Files.Glob  }}\n    {{- $team := dir $path | base }}\n    {{- $ruleName := base $path | trimSuffix \".yaml\" | printf \"%s-%s\" $team | kebabcase}}\n    {{- if and (get $.Values.createRules $team) (base $path | ne \".defaults.yaml\") }}\n      {{- $template := $.Files.Get $path | fromYaml }}\n      {{- if not 
$template.rules }}\n        {{- cat \".rules is not defined in\" $path | fail }}\n      {{- end }}\n      {{- $defaults := dir $path | printf \"%s\/.defaults.yaml\" | $.Files.Get | fromYaml }}\n      {{- $defaultAnnotations := get $defaults \"annotations\" | default (dict) }}\n      {{- $defaultLabels := get $defaults \"labels\" | default (dict) }}\n---\napiVersion: monitoring.coreos.com\/v1\nkind: PrometheusRule\nmetadata:\n  name: {{ kebabcase $ruleName | quote }}\nspec:\n  groups:\n  - name: {{ camelcase $team | quote }}\n    rules:\n      {{- range $_, $rule := $template.rules }}\n      {{- $rule = mergeOverwrite (dict \"annotations\" (dict) \"labels\" (dict)) $rule }}\n      {{- $tplDict := dict \"Values\" $ruleValues \"Template\" $.Template \"Rule\" $rule }}\n      {{- $_ := tpl $rule.name $tplDict | set $rule \"name\" }}\n    - alert: {{ quote $rule.name }}\n      annotations:\n        {{- $annotations := merge $rule.annotations $defaultAnnotations }}\n        {{- range $key, $rawValue := $annotations }}\n          {{- $templatedValue := tpl $rawValue $tplDict }}\n          {{- with $templatedValue }}\n            {{- dict $key $templatedValue | toYaml | nindent 8 }}\n          {{- end }}\n        {{- end }}\n      expr: |-\n        {{- tpl $rule.expr $tplDict | nindent 8 }}\n      for: {{ $rule.for | default $ruleValues.defaults.for }}\n      labels:\n        {{- $labels := merge $rule.labels $defaultLabels }}\n        {{- range $key, $rawValue := $labels }}\n          {{- $templatedValue := tpl $rawValue $tplDict }}\n          {{- with $templatedValue }}\n            {{- dict $key $templatedValue | toYaml | nindent 8 }}\n          {{- end }}\n        {{- end }}\n      {{- end }}\n    {{- end }}\n  {{- end }}\n{{- end }}<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Understanding the template<\/h3>\n\n\n\n<p>Let&#8217;s go through the above template to understand how it works :<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li> <span style=\"font-size: 
revert;\">The template iterates over each directory listed in the <\/span><code>rulePaths<\/code> value and then over each <code>.yaml<\/code> file under that directory. <pre><code class=\"language-go\">{{- range $ruleFolderPath := .Values.rulePaths }}\n  {{- range $path, $_ := trimSuffix \"\/\" $ruleFolderPath | printf \"%s\/**\/**.yaml\" | $.Files.Glob  }}<\/code><\/pre> <\/li>\n\n\n\n<li>We define the <code>$team<\/code> variable, which is the name of the directory that contains the rule file, and the <code>$ruleName<\/code> variable, which joins the team name and the rule file name (without the <code>.yaml<\/code> suffix) in kebab case. <pre><code class=\"language-go\">{{- $team := dir $path | base }}\n{{- $ruleName := base $path | trimSuffix \".yaml\" | printf \"%s-%s\" $team | kebabcase}}<\/code><\/pre> <\/li>\n\n\n\n<li>Next, we skip the <code>.defaults.yaml<\/code> file and define the <code>$template<\/code> variable, which holds the content of the rule file. <code>$path<\/code> from the previous step is the path of the rule file; <code>$.Files.Get<\/code> reads it, and the <code>fromYaml<\/code> function converts the YAML into an object so that we can iterate over it later. In the full template, the same condition also checks that <code>createRules<\/code> is enabled for the team.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>{{- if (base $path | ne \".defaults.yaml\") }}\n  {{- $template := $.Files.Get $path | fromYaml }}<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Then we load the team&#8217;s <code>.defaults.yaml<\/code> file into the <code>$defaults<\/code> variable and create two variables, <code>$defaultAnnotations<\/code> and <code>$defaultLabels<\/code>, which hold the annotations and labels defined in that file (or an empty dictionary if none are defined).<\/li>\n\n\n\n<li> <span style=\"font-size: revert; background-color: transparent; color: var(--theme-text-color); font-family: var(--fontFamily); font-style: var(--fontStyle, inherit); font-weight: var(--fontWeight); letter-spacing: var(--letterSpacing); text-transform: var(--textTransform);\">Next, we iterate over each rule and merge it over empty annotations and labels dictionaries. This ensures that each alert has annotations and labels defined, even if the rule file omits them. 
Further, we create a template dictionary, which is used to pass values to the templated fields.<\/span> <\/li>\n\n\n\n<li>The template dictionary has three keys: <code>Values<\/code>, <code>Template<\/code>, and <code>Rule<\/code>. The <code>Values<\/code> key passes the <code>ruleValues<\/code> block defined in the <code>values.yaml<\/code> file. The <code>Template<\/code> key passes Helm&#8217;s built-in <code>Template<\/code> object (not the rule file), so that it is available in the context handed to <code>tpl<\/code>. The <code>Rule<\/code> key passes the current rule object, that is, one entry from the <code>rules<\/code> list in the rule file.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>{{- range $_, $rule := $template.rules }}\n{{- $rule = mergeOverwrite (dict \"annotations\" (dict) \"labels\" (dict)) $rule }}\n{{- $tplDict := dict \"Values\" $ruleValues \"Template\" $.Template \"Rule\" $rule }}\n{{- $_ := tpl $rule.name $tplDict | set $rule \"name\" }}<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Then we define the annotations section of the alert, where we merge the annotations defined in each rule with the annotations defined in the <code>.defaults.yaml<\/code> file. Annotations set on the rule take precedence over the defaults.<\/li>\n\n\n\n<li>Next, we iterate over each annotation and template the value using the <code>tpl<\/code> function, which takes the value and the template dictionary as input and returns the templated value. This is needed because we want to pass the values defined in the <code>values.yaml<\/code> file to the rule definitions. This will become clearer once you see the <code>values.yaml<\/code> file.<\/li>\n\n\n\n<li>Finally, we create a dictionary with the annotation name as the key and the templated value as the value, convert it to YAML, and indent it by 8 spaces.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>{{- $annotations := merge $rule.annotations $defaultAnnotations }}\n{{- range $key, $rawValue := $annotations }}\n  {{- $templatedValue := tpl $rawValue $tplDict }}\n  {{- with $templatedValue }}\n    {{- dict $key $templatedValue | toYaml | nindent 8 }}\n  {{- end }}\n{{- end }}<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The <code>labels<\/code> section is built the same way. The <code>expr<\/code> value is also passed through <code>tpl<\/code>, and <code>for<\/code> falls back to <code>ruleValues.defaults.for<\/code> when the rule does not set it.<\/li>\n<\/ul>\n\n\n\n<p>Let&#8217;s look at the <code>alert-rules\/values.yaml<\/code> file.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Path to directory with rules files of teams\nrulePaths:\n  - alert-rules\n\n# Enable \/ disable rules creation for teams\ncreateRules:\n  Team-A: true\n  Team-B: false\n\n# ruleValues are exposed as .Values in rule definitions (for example .Values.severity.critical)\nruleValues:\n  defaults:\n    for: 60s\n  severity:\n    critical: critical\n    warning: warning<\/code><\/pre>\n\n\n\n<p>Now let&#8217;s look at a sample alert definition file, <code>Team-A\/health_alerts.yaml<\/code>.<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>rules:\n  - name: KubeContainerWaiting\n    expr: sum by (namespace, pod, container, cluster) (kube_pod_container_status_waiting_reason{job=\"kube-state-metrics\",\n        namespace=~\"team-a\"}) &gt; 0\n    for: 1h\n    labels:\n      severity: '{{ .Values.severity.critical }}'\n    annotations:\n      description: 'Pod {{ \"{{$labels.namespace}}\" }}\/{{ \"{{$labels.pod}}\" }} has been in waiting state for more than 1 hour.'\n      summary: 'Pod container waiting longer than 1 hour.'<\/code><\/pre>\n\n\n\n<p>Since <code>rules<\/code> is a list, you can define multiple alerts in a single file. It is a good idea to separate alerts by category (health, latency, etc.) into different files.<\/p>\n\n\n\n<p>Prometheus&#8217;s own placeholders, such as <code>{{$labels.pod}}<\/code>, are wrapped as quoted strings (<code>{{ \"{{$labels.pod}}\" }}<\/code>) so that <code>tpl<\/code> emits them literally instead of trying to evaluate them.<\/p>\n\n\n\n<p>Let&#8217;s also take a look at the <code>Team-A\/.defaults.yaml<\/code> file.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>annotations:\n  summary: '{{ .Rule.name }}'\nlabels:\n  team: Team-A<\/code><\/pre>\n\n\n\n<p>Now that we have everything, let&#8217;s run <code>helm template<\/code> and see what it generates.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n# Source: alerts\/templates\/prometheusRule.yaml\napiVersion: monitoring.coreos.com\/v1\nkind: PrometheusRule\nmetadata:\n  name: \"team-a-health-alerts\"\n  labels:\n    app.kubernetes.io\/instance: alert-rules\n    app.kubernetes.io\/managed-by: Helm\nspec:\n  groups:\n  - name: \"TeamA\"\n    rules:\n    - alert: \"KubeContainerWaiting\"\n      annotations:\n        description: Pod {{$labels.namespace}}\/{{$labels.pod}} has been in waiting state for\n          more than 1 hour.\n        summary: Pod container waiting longer than 1 hour.\n      expr: |-\n        sum by (namespace, pod, container, cluster) (kube_pod_container_status_waiting_reason{job=\"kube-state-metrics\", namespace=~\"team-a\"}) &gt; 0\n      for: 1h\n      labels:\n        severity: critical\n        team: Team-A<\/code><\/pre>\n\n\n\n<p>Recall that we need the <code>tpl<\/code> function to pass the values 
defined in the <code>values.yaml<\/code> file to the rule definitions. In the above output, the <code>severity<\/code> placeholder has been replaced with the value defined in the <code>values.yaml<\/code> file.<\/p>\n\n\n\n<p>You can also see that a <code>team<\/code> label is added to each alert, with its value taken from the <code>.defaults.yaml<\/code> file. This is just an example; you can add any custom label that you want on every alert from that team, such as environment or tenant.<\/p>\n\n\n\n<p>Now, let&#8217;s deploy the alerts using ArgoCD by creating an application.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: argoproj.io\/v1alpha1\nkind: Application\nmetadata:\n  name: alert-rules-lab-cluster\n  namespace: argocd\nspec:\n  destination:\n    server: \"https:\/\/kubernetes.default.svc\"\n    namespace: monitoring\n  source:\n    path: alert-rules\n    repoURL: git@github.com:tanmay-bhat\/prometheus-alerts-demo.git\n    targetRevision: master\n    helm:\n      valueFiles:\n      - values.yaml<\/code><\/pre>\n\n\n\n<p>Once the application is deployed, you can see the alerts in the Prometheus UI.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/github.com\/tanmay-bhat\/prometheus-alerts-demo\/blob\/main\/argocd-alerts-sync.png?raw=true\" alt=\"Prometheus Alerts\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Benefits of managing alerts in a GitOps way<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Since each team has its own directory and one rules file per category, it&#8217;s easy to manage as the organization grows.<\/li>\n\n\n\n<li>Readability is better compared to a single file with all the alerts defined in it.<\/li>\n\n\n\n<li>Git is the source of truth: manual edits or accidental deletions in the cluster show up as drift in ArgoCD and can be synced back, so what you write is what you get.<\/li>\n\n\n\n<li>Bulk changes to alerts can be done easily by editing the template.<\/li>\n\n\n\n<li>Adding a new label, annotation, 
or even updating the severity value for all alerts is straightforward since it&#8217;s template-based.<\/li>\n\n\n\n<li>You can easily enable or disable alerts for a team per environment.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">References<\/h3>\n\n\n\n<p><a href=\"https:\/\/helm.sh\/docs\/chart_template_guide\/function_list\/\">https:\/\/helm.sh\/docs\/chart_template_guide\/function_list\/<\/a><br><a href=\"https:\/\/prometheus-operator.dev\/docs\/api-reference\/api\">https:\/\/prometheus-operator.dev\/docs\/api-reference\/api\/<\/a><br><a href=\"https:\/\/github.com\/tanmay-bhat\/prometheus-alerts-demo\">https:\/\/github.com\/tanmay-bhat\/prometheus-alerts-demo<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, we will look at how to manage Prometheus alerts in a GitOps way using the Prometheus Operator, Helm template, and ArgoCD.<\/p>\n","protected":false},"author":11,"featured_media":1649,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[58],"tags":[59],"class_list":["post-1645","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ci-cd-deployment"],"blocksy_meta":{"styles_descriptor":{"styles":{"desktop":"","tablet":"","mobile":""},"google_fonts":[],"version":6}},"acf":[],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/www.aviator.co\/blog\/wp-content\/uploads\/2023\/09\/prometheus-alerts.jpg","post_mailing_queue_ids":[],"_links":{"self":[{"href":"https:\/\/www.aviator.co\/blog\/wp-json\/wp\/v2\/posts\/1645","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.aviator.co\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aviator.co\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aviator.co\/blog\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aviator.co\/blog\/wp-json\/wp\/v2\/comments?post=1645"}],"version-history":[{"count":10,"href":"https:\/\/www.aviator.co\/blog\/wp-json\/wp\/v2\/posts\/1645\/revisions"}],"predecessor-version":[{"id":4959,"href":"https:\/\/w
ww.aviator.co\/blog\/wp-json\/wp\/v2\/posts\/1645\/revisions\/4959"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aviator.co\/blog\/wp-json\/wp\/v2\/media\/1649"}],"wp:attachment":[{"href":"https:\/\/www.aviator.co\/blog\/wp-json\/wp\/v2\/media?parent=1645"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aviator.co\/blog\/wp-json\/wp\/v2\/categories?post=1645"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aviator.co\/blog\/wp-json\/wp\/v2\/tags?post=1645"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}