alerting/dashbaording: added minimal monitoring setup for the plugin#269
alerting/dashbaording: added minimal monitoring setup for the plugin#269christophrichtersap merged 4 commits intomainfrom
Conversation
📝 WalkthroughWalkthroughThis PR adds comprehensive monitoring and alerting infrastructure to the openstack-hypervisor-operator Helm chart, including Prometheus alert rule definitions, Grafana dashboard configuration, kube-state-metrics custom resource metrics, Helm templates for dynamic resource generation, and a GitHub Actions validation workflow. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (8)
charts/openstack-hypervisor-operator/dashboards/hypervisor-overview.json (1)
27-29: Consider making datasource name configurable.The datasource is hardcoded to
"name": "prometheus"throughout all panel queries. If environments use different datasource names, this dashboard won't work without modification. Consider using a variable for the datasource or documenting this requirement.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/dashboards/hypervisor-overview.json` around lines 27 - 29, The dashboard hardcodes the Prometheus datasource ("datasource" objects with "kind": "PrometheusDatasource" and "name": "prometheus"), which breaks in environments with a different datasource name; update the dashboard to use a configurable datasource variable (e.g., reference a dashboard variable like ${DS_PROMETHEUS} or $__datasource) for all panels and any datasource fields, add a template variable (e.g., DS_PROMETHEUS) defaulting to "prometheus", and replace every occurrence of the literal "prometheus" in the datasource.name fields so the dashboard will work across environments without manual edits.charts/openstack-hypervisor-operator/alerts/operator.yaml (2)
18-27: Clarify alert logic for HypervisorConditionDegraded.The expression
kube_customresource_hypervisor_condition{condition!="Ready"} == 0fires when non-Ready conditions (e.g., "Initialized", "Configured") have value 0 (False). This means the alert triggers when supplementary conditions are unhealthy.However, the alert name "Degraded" and value
== 0can be confusing. Consider:
- Adding a comment explaining the expected condition values (1=True, 0=False)
- Renaming to something more explicit like
HypervisorConditionUnhealthyThe current logic appears correct for detecting when conditions like "Initialized" or "Configured" become False, but the semantics should be documented.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/alerts/operator.yaml` around lines 18 - 27, The alert HypervisorConditionDegraded uses the metric kube_customresource_hypervisor_condition{condition!="Ready"} == 0 which checks for non-Ready conditions being False (0) versus True (1); update the rule by renaming the alert to HypervisorConditionUnhealthy (replace HypervisorConditionDegraded) and add a short YAML comment above the expr explaining the semantics "metric value: 1=True, 0=False; this fires when non-Ready conditions become False for >10m", and ensure the metric name kube_customresource_hypervisor_condition and label filtering {condition!="Ready"} remain unchanged so the logic still detects unhealthy supplementary conditions.
40-49: Consider adding a threshold for reconcile errors.The expression
rate(controller_runtime_reconcile_errors_total[5m]) > 0will trigger on any non-zero error rate. Even a single transient error that persists in the rate calculation for 15 minutes could fire this alert.Consider using a small threshold to filter out transient errors:
Proposed fix
- alert: HypervisorOperatorReconcileErrors expr: | - rate(controller_runtime_reconcile_errors_total[5m]) > 0 + rate(controller_runtime_reconcile_errors_total[5m]) > 0.01 for: 15m🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/alerts/operator.yaml` around lines 40 - 49, Update the HypervisorOperatorReconcileErrors alert to ignore single/transient errors by adding a small non-zero threshold to the expression: replace the current expr using rate(controller_runtime_reconcile_errors_total[5m]) > 0 with a thresholded check such as rate(controller_runtime_reconcile_errors_total[5m]) > 0.01 (or another small value you choose) so the alert only fires for sustained error rates; keep the alert name HypervisorOperatorReconcileErrors, the metric controller_runtime_reconcile_errors_total, and the existing for: 15m, labels and annotations unchanged.charts/openstack-hypervisor-operator/values.yaml (1)
75-77: Consider making the namespace configurable or documenting its purpose.The
namespace: kvm-monitoringis hardcoded, but it's unclear how this namespace relates to the deployment namespace. If this ConfigMap should be created in a different namespace than the release, ensure the template applies this namespace correctly. Otherwise, consider documenting why this specific namespace is required or making it default to.Release.Namespace.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/values.yaml` around lines 75 - 77, The values.yaml hardcodes customResourceMetrics.namespace to "kvm-monitoring"; update the chart so this value is configurable and default it to the release namespace (use .Release.Namespace) or document why "kvm-monitoring" is required; specifically adjust the values entry customResourceMetrics.namespace and ensure the templates that create the ConfigMap/Role/ServiceAccount reference .Values.customResourceMetrics.namespace (or fall back to .Release.Namespace) so the resource is created in the intended namespace, or add a clear comment in values.yaml explaining the special-purpose "kvm-monitoring" namespace if you intend to keep it fixed.charts/openstack-hypervisor-operator/templates/dashboards.yaml (2)
11-11: AddtrimSuffix "-"after truncation for consistent naming.The name generation truncates at 63 characters but doesn't trim trailing hyphens, which could result in names ending with
-if truncation occurs mid-word. This is inconsistent with other name generations in the chart (seecustom-resource-metrics.yamlline 8).Proposed fix
- name: {{ printf "%s-%s" $root.Release.Name ($path | replace ".json" "" | replace "/" "-" | trunc 63) }} + name: {{ printf "%s-%s" $root.Release.Name ($path | replace ".json" "" | replace "/" "-") | trunc 63 | trimSuffix "-" }}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/templates/dashboards.yaml` at line 11, The name template builds a release-based name using ($path | replace ".json" "" | replace "/" "-" | trunc 63) but may leave a trailing hyphen when truncation cuts mid-segment; update the expression to trim trailing hyphens by piping the truncated result through trimSuffix "-" (i.e., apply trimSuffix "-" after trunc 63) so the generated name using $root.Release.Name and $path never ends with a trailing "-" and matches other templates like custom-resource-metrics.yaml.
28-28: Same trimSuffix issue for global dashboard names.Apply the same fix for consistency.
Proposed fix
- name: {{ printf "%s-%s-global" $root.Release.Name ($path | replace ".json" "" | replace "/" "-" | trunc 63) }} + name: {{ printf "%s-%s-global" $root.Release.Name ($path | replace ".json" "" | replace "/" "-") | trunc 63 | trimSuffix "-" }}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/templates/dashboards.yaml` at line 28, The global dashboard name construction uses replace to strip ".json" from $path which can mis-handle substrings; change the expression in the name template so it uses trimSuffix to remove the ".json" suffix before replacing slashes and truncating: replace the subexpression "($path | replace \".json\" \"\" | replace \"/\" \"-\" | trunc 63)" with "($path | trimSuffix \".json\" | replace \"/\" \"-\" | trunc 63)" so the template building the name (printf "%s-%s-global" $root.Release.Name ...) uses trimSuffix correctly.charts/openstack-hypervisor-operator/templates/alerts.yaml (1)
7-15: Regex-based label injection may be fragile.The regex
(?m)^(\\s+labels:\\s*\\n)assumes a specific YAML structure wherelabels:appears on its own line with only whitespace before and a newline after. This could fail if:
- An alert rule has no
labels:section- The
labels:key has inline content (e.g.,labels: {})- YAML uses different indentation styles
Consider documenting this requirement for alert file authors or adding a fallback mechanism.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/templates/alerts.yaml` around lines 7 - 15, The current regex in $labelsLineRegex and the regexReplaceAll usage can fail for alerts with no labels:, inline labels (labels: {}), or different indentation; update the logic that builds $alertContent to (1) detect inline label declarations (match labels:\s*\{.*\} and replace/expand them into a block), (2) if no labels: block exists, insert a properly-indented labels: block (using $additionalRuleLabels) at a safe insertion point such as after the top-level rule metadata or after annotations, and (3) preserve indentation by deriving the surrounding block indent rather than assuming a fixed whitespace capture; also add a short comment documenting the expected alert file shape for authors. Ensure the changes reference $labelsLineRegex, regexReplaceAll, $alertContent and $additionalRuleLabels so reviewers can find and update the right spot..github/workflows/validate-prometheus-alerts.yaml (1)
6-11: Consider adding workflow self-trigger path.The workflow doesn't trigger when the workflow file itself is modified. This could lead to CI issues going undetected when the workflow is updated.
Proposed fix
on: workflow_dispatch: pull_request: paths: - 'charts/openstack-hypervisor-operator/alerts/*.yaml' - 'charts/openstack-hypervisor-operator/alerts/*.yml' + - '.github/workflows/validate-prometheus-alerts.yaml'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/validate-prometheus-alerts.yaml around lines 6 - 11, Update the workflow's on: pull_request paths so changes to the workflow file itself trigger runs: in the existing on: block that defines workflow_dispatch and pull_request paths, add the workflow YAML filename to the paths list (i.e., include the current workflow file name) so edits to the workflow file will trigger the validation job; modify the paths array in the same file where the pull_request paths are declared.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@charts/openstack-hypervisor-operator/templates/_helpers.tpl`:
- Around line 100-107: The template helper
"openstack-hypervisor-operator.globalDashboardSelectorLabels" references
.Values.dashboards.global.dashboardSelectors which is not defined in
values.yaml, so either add a default array for
dashboards.global.dashboardSelectors to values.yaml or change the helper to
handle missing selectors; update values.yaml to include
dashboards.global.dashboardSelectors: [] (or a documented default with
name/value pairs) or modify the helper (globalDashboardSelectorLabels) to guard
with hasKey/required/default and skip rendering when selectors are absent to
ensure global ConfigMaps get correct selector labels only when provided.
In `@charts/openstack-hypervisor-operator/templates/custom-resource-metrics.yaml`:
- Around line 4-11: The ConfigMap in templates/custom-resource-metrics.yaml
ignores the values.customResourceMetrics.namespace setting; either remove that
unused key from values.yaml or update the ConfigMap metadata to set namespace:
{{ .Values.customResourceMetrics.namespace }} so the ConfigMap is created in the
intended namespace—modify the metadata block in this template (the ConfigMap
with name generated by printf "%s-hypervisor-resources") to reference
.Values.customResourceMetrics.namespace if you choose to apply the namespace.
---
Nitpick comments:
In @.github/workflows/validate-prometheus-alerts.yaml:
- Around line 6-11: Update the workflow's on: pull_request paths so changes to
the workflow file itself trigger runs: in the existing on: block that defines
workflow_dispatch and pull_request paths, add the workflow YAML filename to the
paths list (i.e., include the current workflow file name) so edits to the
workflow file will trigger the validation job; modify the paths array in the
same file where the pull_request paths are declared.
In `@charts/openstack-hypervisor-operator/alerts/operator.yaml`:
- Around line 18-27: The alert HypervisorConditionDegraded uses the metric
kube_customresource_hypervisor_condition{condition!="Ready"} == 0 which checks
for non-Ready conditions being False (0) versus True (1); update the rule by
renaming the alert to HypervisorConditionUnhealthy (replace
HypervisorConditionDegraded) and add a short YAML comment above the expr
explaining the semantics "metric value: 1=True, 0=False; this fires when
non-Ready conditions become False for >10m", and ensure the metric name
kube_customresource_hypervisor_condition and label filtering
{condition!="Ready"} remain unchanged so the logic still detects unhealthy
supplementary conditions.
- Around line 40-49: Update the HypervisorOperatorReconcileErrors alert to
ignore single/transient errors by adding a small non-zero threshold to the
expression: replace the current expr using
rate(controller_runtime_reconcile_errors_total[5m]) > 0 with a thresholded check
such as rate(controller_runtime_reconcile_errors_total[5m]) > 0.01 (or another
small value you choose) so the alert only fires for sustained error rates; keep
the alert name HypervisorOperatorReconcileErrors, the metric
controller_runtime_reconcile_errors_total, and the existing for: 15m, labels and
annotations unchanged.
In `@charts/openstack-hypervisor-operator/dashboards/hypervisor-overview.json`:
- Around line 27-29: The dashboard hardcodes the Prometheus datasource
("datasource" objects with "kind": "PrometheusDatasource" and "name":
"prometheus"), which breaks in environments with a different datasource name;
update the dashboard to use a configurable datasource variable (e.g., reference
a dashboard variable like ${DS_PROMETHEUS} or $__datasource) for all panels and
any datasource fields, add a template variable (e.g., DS_PROMETHEUS) defaulting
to "prometheus", and replace every occurrence of the literal "prometheus" in the
datasource.name fields so the dashboard will work across environments without
manual edits.
In `@charts/openstack-hypervisor-operator/templates/alerts.yaml`:
- Around line 7-15: The current regex in $labelsLineRegex and the
regexReplaceAll usage can fail for alerts with no labels:, inline labels
(labels: {}), or different indentation; update the logic that builds
$alertContent to (1) detect inline label declarations (match labels:\s*\{.*\}
and replace/expand them into a block), (2) if no labels: block exists, insert a
properly-indented labels: block (using $additionalRuleLabels) at a safe
insertion point such as after the top-level rule metadata or after annotations,
and (3) preserve indentation by deriving the surrounding block indent rather
than assuming a fixed whitespace capture; also add a short comment documenting
the expected alert file shape for authors. Ensure the changes reference
$labelsLineRegex, regexReplaceAll, $alertContent and $additionalRuleLabels so
reviewers can find and update the right spot.
In `@charts/openstack-hypervisor-operator/templates/dashboards.yaml`:
- Line 11: The name template builds a release-based name using ($path | replace
".json" "" | replace "/" "-" | trunc 63) but may leave a trailing hyphen when
truncation cuts mid-segment; update the expression to trim trailing hyphens by
piping the truncated result through trimSuffix "-" (i.e., apply trimSuffix "-"
after trunc 63) so the generated name using $root.Release.Name and $path never
ends with a trailing "-" and matches other templates like
custom-resource-metrics.yaml.
- Line 28: The global dashboard name construction uses replace to strip ".json"
from $path which can mis-handle substrings; change the expression in the name
template so it uses trimSuffix to remove the ".json" suffix before replacing
slashes and truncating: replace the subexpression "($path | replace \".json\"
\"\" | replace \"/\" \"-\" | trunc 63)" with "($path | trimSuffix \".json\" |
replace \"/\" \"-\" | trunc 63)" so the template building the name (printf
"%s-%s-global" $root.Release.Name ...) uses trimSuffix correctly.
In `@charts/openstack-hypervisor-operator/values.yaml`:
- Around line 75-77: The values.yaml hardcodes customResourceMetrics.namespace
to "kvm-monitoring"; update the chart so this value is configurable and default
it to the release namespace (use .Release.Namespace) or document why
"kvm-monitoring" is required; specifically adjust the values entry
customResourceMetrics.namespace and ensure the templates that create the
ConfigMap/Role/ServiceAccount reference .Values.customResourceMetrics.namespace
(or fall back to .Release.Namespace) so the resource is created in the intended
namespace, or add a clear comment in values.yaml explaining the special-purpose
"kvm-monitoring" namespace if you intend to keep it fixed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cc5b7867-a979-48bf-8c19-be9bdec10aa3
📒 Files selected for processing (8)
.github/workflows/validate-prometheus-alerts.yamlcharts/openstack-hypervisor-operator/alerts/operator.yamlcharts/openstack-hypervisor-operator/dashboards/hypervisor-overview.jsoncharts/openstack-hypervisor-operator/templates/_helpers.tplcharts/openstack-hypervisor-operator/templates/alerts.yamlcharts/openstack-hypervisor-operator/templates/custom-resource-metrics.yamlcharts/openstack-hypervisor-operator/templates/dashboards.yamlcharts/openstack-hypervisor-operator/values.yaml
| {{- define "openstack-hypervisor-operator.globalDashboardSelectorLabels" -}} | ||
| {{- $root := index . 1 -}} | ||
| {{- with $root.Values.dashboards.global.dashboardSelectors }} | ||
| {{- range $i, $target := . }} | ||
| {{ $target.name | required (printf "$.Values.dashboards.global.dashboardSelectors[%v].name missing" $i) }}: {{ tpl ($target.value | required (printf "$.Values.dashboards.global.dashboardSelectors[%v].value missing" $i)) $root }} | ||
| {{- end }} | ||
| {{- end }} | ||
| {{- end -}} |
There was a problem hiding this comment.
Missing dashboards.global.dashboardSelectors in values.yaml.
The globalDashboardSelectorLabels helper references .Values.dashboards.global.dashboardSelectors, but this value is not defined in values.yaml. Only dashboards.global.create exists. When dashboards.global.create is true, the global ConfigMaps will have no selector labels.
Either add the missing value definition or document that global dashboards intentionally have no selectors.
Add to values.yaml under dashboards.global
dashboards:
create: true
global:
create: false
+ dashboardSelectors: []
dashboardSelectors:
- name: perses.dev/resource
value: '"true"'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@charts/openstack-hypervisor-operator/templates/_helpers.tpl` around lines 100
- 107, The template helper
"openstack-hypervisor-operator.globalDashboardSelectorLabels" references
.Values.dashboards.global.dashboardSelectors which is not defined in
values.yaml, so either add a default array for
dashboards.global.dashboardSelectors to values.yaml or change the helper to
handle missing selectors; update values.yaml to include
dashboards.global.dashboardSelectors: [] (or a documented default with
name/value pairs) or modify the helper (globalDashboardSelectorLabels) to guard
with hasKey/required/default and skip rendering when selectors are absent to
ensure global ConfigMaps get correct selector labels only when provided.
| {{- if .Values.customResourceMetrics.create }} | ||
| apiVersion: v1 | ||
| kind: ConfigMap | ||
| metadata: | ||
| name: {{ printf "%s-hypervisor-resources" .Release.Name | trunc 63 | trimSuffix "-" }} | ||
| labels: | ||
| dev.custom.kube-state-metrics: "true" | ||
| {{ include "openstack-hypervisor-operator.labels" . | indent 4 }} |
There was a problem hiding this comment.
The customResourceMetrics.namespace value is not used.
The ConfigMap doesn't specify a namespace in its metadata, so it will be created in the release namespace. However, values.yaml defines customResourceMetrics.namespace: kvm-monitoring which appears to be unused. Either:
- Remove the unused
namespacevalue from values.yaml, or - Apply it to the ConfigMap metadata
Option to apply namespace
metadata:
name: {{ printf "%s-hypervisor-resources" .Release.Name | trunc 63 | trimSuffix "-" }}
+ namespace: {{ .Values.customResourceMetrics.namespace | default .Release.Namespace }}
labels:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {{- if .Values.customResourceMetrics.create }} | |
| apiVersion: v1 | |
| kind: ConfigMap | |
| metadata: | |
| name: {{ printf "%s-hypervisor-resources" .Release.Name | trunc 63 | trimSuffix "-" }} | |
| labels: | |
| dev.custom.kube-state-metrics: "true" | |
| {{ include "openstack-hypervisor-operator.labels" . | indent 4 }} | |
| {{- if .Values.customResourceMetrics.create }} | |
| apiVersion: v1 | |
| kind: ConfigMap | |
| metadata: | |
| name: {{ printf "%s-hypervisor-resources" .Release.Name | trunc 63 | trimSuffix "-" }} | |
| namespace: {{ .Values.customResourceMetrics.namespace | default .Release.Namespace }} | |
| labels: | |
| dev.custom.kube-state-metrics: "true" | |
| {{ include "openstack-hypervisor-operator.labels" . | indent 4 }} |
🧰 Tools
🪛 YAMLlint (1.38.0)
[error] 4-4: syntax error: expected the node content, but found '-'
(syntax)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@charts/openstack-hypervisor-operator/templates/custom-resource-metrics.yaml`
around lines 4 - 11, The ConfigMap in templates/custom-resource-metrics.yaml
ignores the values.customResourceMetrics.namespace setting; either remove that
unused key from values.yaml or update the ConfigMap metadata to set namespace:
{{ .Values.customResourceMetrics.namespace }} so the ConfigMap is created in the
intended namespace—modify the metadata block in this template (the ConfigMap
with name generated by printf "%s-hypervisor-resources") to reference
.Values.customResourceMetrics.namespace if you choose to apply the namespace.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@charts/openstack-hypervisor-operator/values.yaml`:
- Around line 60-61: The default-enabled PrometheusRule is unsafe; change the
prometheusRules.create default in values.yaml from true to false to avoid
installing PrometheusRule when the monitoring CRD is absent, or alternatively
add a guard in the alerts.yaml template that wraps the PrometheusRule rendering
with .Capabilities.APIVersions.Has "monitoring.coreos.com/v1" so the resource is
only emitted when the CRD exists; update the prometheusRules.create value
(and/or add the .Capabilities.APIVersions.Has check in alerts.yaml) to fix the
installation failure.
- Around line 72-73: The selector value for the label key perses.dev/resource
contains embedded quote characters ('"true"') which will be treated as part of
the string; update the value to an unquoted or properly quoted YAML scalar
(e.g., value: "true" or value: true) so the selector matches other
resources—locate the block defining name: perses.dev/resource and replace the
current value with an unquoted/normal quoted true.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4686aef8-20e4-466f-a02f-3cba5539eac0
📒 Files selected for processing (1)
charts/openstack-hypervisor-operator/values.yaml
| prometheusRules: | ||
| create: true |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Expectation:
# - Prefer to find a CRD presence guard like `.Capabilities.APIVersions.Has`.
# - If no guard exists and default is true, the risk is confirmed.
rg -n -C3 'kind:\s*PrometheusRule|Capabilities\.APIVersions\.Has|monitoring\.coreos\.com/v1' charts/openstack-hypervisor-operator/templates
rg -n -C3 'dependencies:|kube-prometheus|prometheus-operator|monitoring' charts/openstack-hypervisor-operator/Chart.yamlRepository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 805
🏁 Script executed:
# Check if alerts.yaml has conditional rendering based on prometheusRules.create
cat -n charts/openstack-hypervisor-operator/templates/alerts.yaml | head -30Repository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 1529
🏁 Script executed:
# Check Chart.yaml for dependencies
cat charts/openstack-hypervisor-operator/Chart.yamlRepository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 230
🏁 Script executed:
# Search for how prometheusRules.create is used in templates
rg -n 'prometheusRules.*create|if.*prometheusRules' charts/openstack-hypervisor-operator/templatesRepository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 190
🏁 Script executed:
# Check for any Capabilities.APIVersions.Has guards
rg -n 'Capabilities\.APIVersions' charts/openstack-hypervisor-operator/templatesRepository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 70
🏁 Script executed:
# Check the full alerts.yaml file to see complete conditional structure
cat -n charts/openstack-hypervisor-operator/templates/alerts.yaml | tail -20Repository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 896
Default-enabled PrometheusRule can break installs without monitoring CRDs.
The template at line 4 of alerts.yaml conditionally renders the PrometheusRule resource based on prometheusRules.create, but defaults to true without any CRD presence check or dependency contract. Clusters lacking the monitoring.coreos.com/v1 CRD will fail during install/upgrade. Either:
- Default
createtofalse(safest), or - Add a CRD presence guard with
.Capabilities.APIVersions.Has "monitoring.coreos.com/v1", or - Declare an explicit dependency guaranteeing the CRD exists
Recommended: Change line 61 to create: false.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@charts/openstack-hypervisor-operator/values.yaml` around lines 60 - 61, The
default-enabled PrometheusRule is unsafe; change the prometheusRules.create
default in values.yaml from true to false to avoid installing PrometheusRule
when the monitoring CRD is absent, or alternatively add a guard in the
alerts.yaml template that wraps the PrometheusRule rendering with
.Capabilities.APIVersions.Has "monitoring.coreos.com/v1" so the resource is only
emitted when the CRD exists; update the prometheusRules.create value (and/or add
the .Capabilities.APIVersions.Has check in alerts.yaml) to fix the installation
failure.
| - name: perses.dev/resource | ||
| value: '"true"' |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Expectation:
# - If helper already uses `| quote`, embedded quotes in values.yaml are problematic.
# - If helper does not quote, plain string is still cleaner and less error-prone.
rg -n -C3 'dashboardSelectors|dashboardSelectorLabels|globalDashboardSelectorLabels|\.value|quote' \
charts/openstack-hypervisor-operator/templates/_helpers.tpl \
charts/openstack-hypervisor-operator/templates/dashboards.yamlRepository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 5793
🏁 Script executed:
sed -n '65,80p' charts/openstack-hypervisor-operator/values.yamlRepository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 285
Remove embedded quotes from selector value.
The value '"true"' contains literal quote characters that will be rendered as part of the selector value, causing mismatches with selectors expecting an unquoted string. Use value: "true" instead.
Suggested cleanup diff
dashboardSelectors:
- name: perses.dev/resource
- value: '"true"'
+ value: "true"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: perses.dev/resource | |
| value: '"true"' | |
| - name: perses.dev/resource | |
| value: "true" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@charts/openstack-hypervisor-operator/values.yaml` around lines 72 - 73, The
selector value for the label key perses.dev/resource contains embedded quote
characters ('"true"') which will be treated as part of the string; update the
value to an unquoted or properly quoted YAML scalar (e.g., value: "true" or
value: true) so the selector matches other resources—locate the block defining
name: perses.dev/resource and replace the current value with an unquoted/normal
quoted true.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
charts/openstack-hypervisor-operator/alerts/operator.yaml (1)
85-94: Consider a more specific job selector for the operator down alert.The regex
job=~".*hypervisor-operator.*"is quite permissive and could match unintended jobs if naming conventions aren't strict. A more constrained pattern or an exact match would reduce false positives.Example: tighter regex or parameterized job name
- alert: HypervisorOperatorDown expr: | - up{job=~".*hypervisor-operator.*"} == 0 + up{job=~"openstack-hypervisor-operator(-metrics)?"} == 0 for: 5mAlternatively, parameterize the job name via a templated label or use the exact job name from your ServiceMonitor configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/alerts/operator.yaml` around lines 85 - 94, The HypervisorOperatorDown alert uses a permissive job selector (expr: up{job=~".*hypervisor-operator.*"}) which may match unintended targets; update the selector in the HypervisorOperatorDown alert to a tighter pattern or an exact match (e.g., job="hypervisor-operator" or a more specific regex) or make the job name configurable via a templated value so it aligns with the ServiceMonitor/Prometheus scrape job name and avoids false positives.charts/openstack-hypervisor-operator/values.yaml (1)
72-73: Consider defaultingdashboards.createtofalsefor consistency.Similar to the PrometheusRules concern, defaulting
dashboards.create: trueassumes the dashboard sidecar or operator exists in the cluster. Users without a dashboard provisioner would create orphaned ConfigMaps.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/values.yaml` around lines 72 - 73, The default value for dashboards.create should be changed from true to false to avoid creating orphaned ConfigMaps when no dashboard sidecar/operator is present; update the values.yaml entry for the dashboards.create key so it defaults to false (and ensure any chart documentation or README that references dashboards.create reflects this default change), leaving the rest of the dashboards block and behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@charts/openstack-hypervisor-operator/templates/alerts.yaml`:
- Around line 20-23: The merge currently uses merge (deepCopy
$additionalRuleLabels) $rule.labels which causes keys from $rule.labels to
overwrite $additionalRuleLabels; if you intend additionalRuleLabels to take
precedence, reverse the merge order to merge (deepCopy $rule.labels)
$additionalRuleLabels so $additionalRuleLabels wins, and update the assignment
that sets $mergedLabels into $rule.labels; if the current precedence (rule
labels override additional labels) is intentional, add a concise comment near
$additionalRuleLabels/$mergedLabels explaining that behavior.
- Around line 33-37: The variable $docStarted is being set inside the template
loop but its assignment doesn't persist across iterations; update the template
logic to ensure separators are emitted correctly by either tracking state in an
outer-scoped list variable (e.g., create and append to a slice like
$docStartedList and check its length before emitting ---) or simpler: always
emit the document separator before each document (remove the conditional around
--- and rely on Kubernetes tolerating leading separators); make changes around
the use of $docStarted and $filteredGroups to implement one of these approaches.
In `@charts/openstack-hypervisor-operator/templates/dashboards.yaml`:
- Around line 21-35: The data key for global dashboards is indented incorrectly
due to using indent inside the printf for the key (the line constructing the map
key: the printf "%s: |-" using ($path | replace "/" "-" | indent 2)), which
causes YAML parsing/format issues; change it to emit the key without indenting
the string (remove the indent filter on ($path | replace "/" "-")) and keep the
dashboard JSON bytes indented (the printf "%s" $bytes | indent 4) so the
ConfigMap data field is correctly formed; apply this same fix within the global
block that uses .Values.dashboards.global.create, $root and the range over
.Files.Glob "dashboards/*.json".
---
Nitpick comments:
In `@charts/openstack-hypervisor-operator/alerts/operator.yaml`:
- Around line 85-94: The HypervisorOperatorDown alert uses a permissive job
selector (expr: up{job=~".*hypervisor-operator.*"}) which may match unintended
targets; update the selector in the HypervisorOperatorDown alert to a tighter
pattern or an exact match (e.g., job="hypervisor-operator" or a more specific
regex) or make the job name configurable via a templated value so it aligns with
the ServiceMonitor/Prometheus scrape job name and avoids false positives.
In `@charts/openstack-hypervisor-operator/values.yaml`:
- Around line 72-73: The default value for dashboards.create should be changed
from true to false to avoid creating orphaned ConfigMaps when no dashboard
sidecar/operator is present; update the values.yaml entry for the
dashboards.create key so it defaults to false (and ensure any chart
documentation or README that references dashboards.create reflects this default
change), leaving the rest of the dashboards block and behavior unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 81e9e72b-f5b9-4fe7-af75-aa412e6df5a2
📒 Files selected for processing (4)
charts/openstack-hypervisor-operator/alerts/operator.yamlcharts/openstack-hypervisor-operator/templates/alerts.yamlcharts/openstack-hypervisor-operator/templates/dashboards.yamlcharts/openstack-hypervisor-operator/values.yaml
| {{- if $additionalRuleLabels }} | ||
| {{- $mergedLabels := merge (deepCopy $additionalRuleLabels) $rule.labels }} | ||
| {{- $_ := set $rule "labels" $mergedLabels }} | ||
| {{- end }} |
There was a problem hiding this comment.
Merge order may not give additionalRuleLabels the expected priority.
Sprig's merge function copies keys from the second argument into the first, meaning values in $rule.labels will overwrite matching keys in $additionalRuleLabels. If the intent is for additionalRuleLabels to override rule-defined labels, reverse the order.
Fix if additionalRuleLabels should take precedence
{{- if $additionalRuleLabels }}
- {{- $mergedLabels := merge (deepCopy $additionalRuleLabels) $rule.labels }}
+ {{- $mergedLabels := merge (deepCopy $rule.labels) $additionalRuleLabels }}
{{- $_ := set $rule "labels" $mergedLabels }}
{{- end }}If the current behavior (rule labels override additional labels) is intentional, please add a comment clarifying this.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@charts/openstack-hypervisor-operator/templates/alerts.yaml` around lines 20 -
23, The merge currently uses merge (deepCopy $additionalRuleLabels) $rule.labels
which causes keys from $rule.labels to overwrite $additionalRuleLabels; if you
intend additionalRuleLabels to take precedence, reverse the merge order to merge
(deepCopy $rule.labels) $additionalRuleLabels so $additionalRuleLabels wins, and
update the assignment that sets $mergedLabels into $rule.labels; if the current
precedence (rule labels override additional labels) is intentional, add a
concise comment near $additionalRuleLabels/$mergedLabels explaining that
behavior.
| {{- if $filteredGroups }} | ||
| {{- if $docStarted }} | ||
| --- | ||
| {{- end }} | ||
| {{- $docStarted = true }} |
There was a problem hiding this comment.
Variable scoping issue: $docStarted won't persist across loop iterations.
In Helm/Go templates, variable assignments inside range blocks don't propagate to outer scopes. The $docStarted = true on line 37 is scoped to the current iteration, so subsequent iterations will still see $docStarted as false, causing missing --- separators between documents.
Suggested fix using a list to track state
-{{- $docStarted := false }}
+{{- $docs := list }}
{{- range $alertPath, $alertRaw := $root.Files.Glob "alerts/*.yaml" }}
...
{{- if $filteredGroups }}
-{{- if $docStarted }}
+{{- if $docs }}
---
{{- end }}
-{{- $docStarted = true }}
+{{- $docs = append $docs "x" }}
apiVersion: monitoring.coreos.com/v1Alternatively, always emit --- before each document (Kubernetes handles leading separators gracefully).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@charts/openstack-hypervisor-operator/templates/alerts.yaml` around lines 33 -
37, The variable $docStarted is being set inside the template loop but its
assignment doesn't persist across iterations; update the template logic to
ensure separators are emitted correctly by either tracking state in an
outer-scoped list variable (e.g., create and append to a slice like
$docStartedList and check its length before emitting ---) or simpler: always
emit the document separator before each document (remove the conditional around
--- and rely on Kubernetes tolerating leading separators); make changes around
the use of $docStarted and $filteredGroups to implement one of these approaches.
| {{- if .Values.dashboards.global.create }} | ||
| {{ $root := . }} | ||
| {{- range $path, $bytes := .Files.Glob "dashboards/*.json" }} | ||
| --- | ||
| apiVersion: v1 | ||
| kind: ConfigMap | ||
| metadata: | ||
| name: {{ printf "%s-%s-global" $root.Release.Name ($path | replace ".json" "" | replace "/" "-") | trunc 63 | trimSuffix "-" }} | ||
| labels: | ||
| {{- include "openstack-hypervisor-operator.globalDashboardSelectorLabels" (list $path $root) | indent 4 }} | ||
| {{ include "openstack-hypervisor-operator.monitoringLabels" (list $path $root) | indent 4 }} | ||
| data: | ||
| {{ printf "%s: |-" ($path | replace "/" "-" | indent 2) }} | ||
| {{ printf "%s" $bytes | indent 4 }} | ||
| {{- end }} |
There was a problem hiding this comment.
Same indentation issue in global dashboards section.
The same data key indentation pattern is used here. Apply the same fix as suggested for the regular dashboards block.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@charts/openstack-hypervisor-operator/templates/dashboards.yaml` around lines
21 - 35, The data key for global dashboards is indented incorrectly due to using
indent inside the printf for the key (the line constructing the map key: the
printf "%s: |-" using ($path | replace "/" "-" | indent 2)), which causes YAML
parsing/format issues; change it to emit the key without indenting the string
(remove the indent filter on ($path | replace "/" "-")) and keep the dashboard
JSON bytes indented (the printf "%s" $bytes | indent 4) so the ConfigMap data
field is correctly formed; apply this same fix within the global block that uses
.Values.dashboards.global.create, $root and the range over .Files.Glob
"dashboards/*.json".
There was a problem hiding this comment.
🧹 Nitpick comments (3)
charts/openstack-hypervisor-operator/values.yaml (2)
84-85: Consider the same CRD dependency risk for custom resource metrics.Similar to
prometheusRules.create, defaultingcustomResourceMetrics.create: trueassumes kube-state-metrics is configured to consume these ConfigMaps. If thedev.custom.kube-state-metricslabel discovery isn't set up, the ConfigMaps will be created but unused. This is lower risk than the PrometheusRule issue (no install failure), but worth documenting in a comment or README.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/values.yaml` around lines 84 - 85, Add a note documenting the CRD dependency for custom resource metrics: update the chart values file or README to explain that customResourceMetrics.create: true will create ConfigMaps intended for kube-state-metrics and that these will be unused unless label discovery (dev.custom.kube-state-metrics) is configured, similar to the existing warning for prometheusRules.create; reference the customResourceMetrics.create key and the dev.custom.kube-state-metrics label discovery setting so operators know to enable label discovery or set customResourceMetrics.create: false when kube-state-metrics is not configured.
75-76: Consider consistency:dashboards.createalso defaults totrue.If
prometheusRules.createshould default tofalsefor safety, consider whetherdashboards.create: truehas similar assumptions about dashboard infrastructure (e.g., Perses, Grafana) being present.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/values.yaml` around lines 75 - 76, The values.yaml currently enables dashboards.create by default while prometheusRules.create defaults to false; decide and make them consistent: either set dashboards.create to false to match the safe default of prometheusRules.create, or keep dashboards.create true but add a clear comment and chart README note documenting the prerequisite dashboard backend (Grafana/Perses) and any extra CRDs/permissions required. Update the values.yaml entry for dashboards.create and the chart documentation (README) accordingly, and ensure any templates referencing .Values.dashboards.create (e.g., dashboard template helpers) still behave correctly when the value is false.charts/openstack-hypervisor-operator/alerts/eviction.yaml (1)
18-29: Verify label preservation in joined PromQL expressions.When using
and on (name), only labels from the left-hand side of the expression are preserved in the result. If you need labels likehypervisor(from theeviction_infometric) in the alert, consider usinggroup_leftto preserve additional labels, or verify that$labels.namealone is sufficient for the alert context.Same consideration applies to
EvictionOutstandingRamHighbelow.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@charts/openstack-hypervisor-operator/alerts/eviction.yaml` around lines 18 - 29, The joined PromQL in the EvictionMigrationFailing alert (and similarly EvictionOutstandingRamHigh) currently uses "and on (name)" which only preserves labels from the left-hand side; update the vector matching to preserve needed labels (e.g., use "and on(name) group_left(hypervisor)" or an appropriate group_left(...) with the specific label names like hypervisor from eviction_info) so the alert's annotations (summary/description using $labels.hypervisor or $labels.name) have the required labels available, or alternatively verify and adjust the annotation templates to only reference labels guaranteed to remain (e.g., $labels.name) if no extra labels are needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@charts/openstack-hypervisor-operator/alerts/eviction.yaml`:
- Around line 18-29: The joined PromQL in the EvictionMigrationFailing alert
(and similarly EvictionOutstandingRamHigh) currently uses "and on (name)" which
only preserves labels from the left-hand side; update the vector matching to
preserve needed labels (e.g., use "and on(name) group_left(hypervisor)" or an
appropriate group_left(...) with the specific label names like hypervisor from
eviction_info) so the alert's annotations (summary/description using
$labels.hypervisor or $labels.name) have the required labels available, or
alternatively verify and adjust the annotation templates to only reference
labels guaranteed to remain (e.g., $labels.name) if no extra labels are needed.
In `@charts/openstack-hypervisor-operator/values.yaml`:
- Around line 84-85: Add a note documenting the CRD dependency for custom
resource metrics: update the chart values file or README to explain that
customResourceMetrics.create: true will create ConfigMaps intended for
kube-state-metrics and that these will be unused unless label discovery
(dev.custom.kube-state-metrics) is configured, similar to the existing warning
for prometheusRules.create; reference the customResourceMetrics.create key and
the dev.custom.kube-state-metrics label discovery setting so operators know to
enable label discovery or set customResourceMetrics.create: false when
kube-state-metrics is not configured.
- Around line 75-76: The values.yaml currently enables dashboards.create by
default while prometheusRules.create defaults to false; decide and make them
consistent: either set dashboards.create to false to match the safe default of
prometheusRules.create, or keep dashboards.create true but add a clear comment
and chart README note documenting the prerequisite dashboard backend
(Grafana/Perses) and any extra CRDs/permissions required. Update the values.yaml
entry for dashboards.create and the chart documentation (README) accordingly,
and ensure any templates referencing .Values.dashboards.create (e.g., dashboard
template helpers) still behave correctly when the value is false.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 21b47b30-05e6-4694-8f63-aef340e91ce1
📒 Files selected for processing (4)
charts/openstack-hypervisor-operator/alerts/eviction.yamlcharts/openstack-hypervisor-operator/templates/metrics-eviction-cm.yamlcharts/openstack-hypervisor-operator/templates/metrics-hypervisor-cm.yamlcharts/openstack-hypervisor-operator/values.yaml
Summary by CodeRabbit
New Features
Chores