Works with Prometheus · Grafana · Loki · Tempo · Mimir

Your Prometheus stack fires the alert.
Wachd explains why.

You have already done the hard part: curated alert rules, set up Loki, wired Grafana dashboards. When those alerts fire, you still spend 40 minutes reconstructing what happened across commits, logs, and metrics. Wachd runs that reconstruction automatically and delivers the result before your on-call engineer opens a terminal.

Deploy free on Kubernetes → · Watch demo ↗

Wachd does not replace your observability stack

Prometheus, Grafana, Loki, and Tempo are doing their jobs well. They collect the signals, store the data, and fire alerts when thresholds are crossed.

The gap is what happens after the alert fires. Your on-call engineer gets a notification such as "HighErrorRate firing: checkout-api" and then opens four tabs: Grafana for metrics, Loki for logs, GitHub for recent commits, and a Confluence runbook that may or may not be current.

Wachd closes that gap. It receives the alert via webhook, queries your existing Prometheus and Loki endpoints, reads recent commits from GitHub, and returns a diagnosis before the page reaches anyone.

How your stack fits together with Wachd

Wachd reads from your existing endpoints. No agents, no schema changes, no new exporters.

Tool	What it does today	What Wachd adds
Prometheus / Alertmanager	Fires the alert, routes it via webhook_configs	Receives alert, queries Prometheus for metric context around alert time
Grafana Alerting	Fires the alert, routes to contact points	Receives alert via webhook contact point, HMAC validated
Loki	Stores logs	Pulls last 30 min of error logs for the affected service automatically
Tempo / Jaeger	Stores traces	Trace IDs in logs can be surfaced in the root-cause summary
Mimir / Thanos	Long-term metric storage	Queries metric history for baseline comparison around alert time
GitHub / GitLab	Stores code	Reads last N commits to the affected service repo — read-only
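
As a rough illustration of the metric and log lookups in the table, the sketch below builds request URLs for the Prometheus and Loki HTTP range-query endpoints around an alert time. It is not Wachd's actual implementation: the function names, the `service` label, the error-matching filter, and the hostnames are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def prometheus_range_url(base_url, promql, alert_time,
                         window=timedelta(minutes=30), step="30s"):
    """Build a Prometheus /api/v1/query_range URL spanning the alert time."""
    params = {
        "query": promql,
        # Prometheus accepts Unix timestamps in seconds.
        "start": int((alert_time - window).timestamp()),
        "end": int((alert_time + window).timestamp()),
        "step": step,
    }
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


def loki_error_logs_url(base_url, service, alert_time,
                        lookback=timedelta(minutes=30), limit=500):
    """Build a Loki /loki/api/v1/query_range URL for logs before the alert."""
    # Hypothetical selector: assumes logs carry a `service` label.
    logql = f'{{service="{service}"}} |~ "(?i)error"'
    params = {
        "query": logql,
        # Loki accepts Unix timestamps in nanoseconds.
        "start": int((alert_time - lookback).timestamp()) * 1_000_000_000,
        "end": int(alert_time.timestamp()) * 1_000_000_000,
        "limit": limit,
    }
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    fired_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(prometheus_range_url("http://prometheus:9090",
                               "rate(http_requests_total[5m])", fired_at))
    print(loki_error_logs_url("http://loki:3100", "checkout-api", fired_at))
```

Both URLs can be fetched with any HTTP client. Nothing is written to either system, which matches the read-only model described above.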

Connect in under 5 minutes

Wachd receives alerts via standard webhook — the same mechanism Grafana and Alertmanager already support.

Grafana Alerting

1. Go to Alerting → Contact points → New contact point
2. Type: Webhook
3. URL: your Wachd webhook URL for the team
4. Optional: add the shared secret as an Authorization header
5. Add to a notification policy and you are done
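
To show what the optional shared secret in step 4 buys you, here is a minimal sketch of how a webhook receiver could check it. This is an assumption about the general technique, not a description of Wachd's internals: the `Bearer` scheme and the `is_authorized` helper are hypothetical.

```python
import hmac


def is_authorized(headers: dict[str, str], shared_secret: str) -> bool:
    """Return True if the Authorization header carries the expected secret."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("authorization", "")
    expected = f"Bearer {shared_secret}"  # assumed scheme, for illustration
    # compare_digest takes the same time wherever the strings differ,
    # which avoids leaking the secret through response timing.
    return hmac.compare_digest(supplied.encode(), expected.encode())


if __name__ == "__main__":
    print(is_authorized({"Authorization": "Bearer s3cret"}, "s3cret"))
    print(is_authorized({"Authorization": "Bearer wrong"}, "s3cret"))
```

A request with a missing or incorrect header would be rejected before any alert data is processed.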

Prometheus Alertmanager

1. Add a receiver with a webhook_configs entry in alertmanager.yml
2. Set url to your Wachd webhook URL
3. Wachd accepts the standard Alertmanager webhook payload
4. No schema changes required

# alertmanager.yml (Prometheus Alertmanager)
receivers:
  - name: wachd
    webhook_configs:
      - url: 'https://wachd.company.internal/api/v1/webhook/<teamId>/<secret>'
        # send_resolved: true  # optional
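
For orientation, the standard Alertmanager webhook payload is a JSON document with a top-level `alerts` array, where each alert carries `status`, `labels`, `annotations`, and `startsAt`. The sketch below shows one way a receiver might pull out the firing alerts. The `FiringAlert` type, the `firing_alerts` helper, and the `service` label are hypothetical and only illustrate the payload shape.

```python
import json
from dataclasses import dataclass


@dataclass
class FiringAlert:
    name: str       # value of the `alertname` label
    service: str    # value of an assumed `service` label
    starts_at: str  # RFC 3339 timestamp string, kept as sent


def firing_alerts(body: bytes) -> list[FiringAlert]:
    """Extract the firing alerts from an Alertmanager webhook body."""
    payload = json.loads(body)
    result = []
    for alert in payload.get("alerts", []):
        # Resolved alerts arrive in the same payload when send_resolved is on.
        if alert.get("status") != "firing":
            continue
        labels = alert.get("labels", {})
        result.append(FiringAlert(
            name=labels.get("alertname", "unknown"),
            service=labels.get("service", "unknown"),
            starts_at=alert.get("startsAt", ""),
        ))
    return result


if __name__ == "__main__":
    sample = json.dumps({
        "version": "4",
        "status": "firing",
        "receiver": "wachd",
        "alerts": [{
            "status": "firing",
            "labels": {"alertname": "HighErrorRate", "service": "checkout-api"},
            "annotations": {},
            "startsAt": "2024-01-01T12:00:00Z",
        }],
    }).encode()
    print(firing_alerts(sample))
```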

What Wachd delivers when an alert fires

Every alert triggers automatic collection from your Prometheus, Loki, and GitHub endpoints. The result is delivered via SMS, voice, email, or Slack — with the analysis already attached.

→ A two-sentence probable cause in plain English
→ The most likely contributing signal: recent deploy, config change, dependency timeout, or resource exhaustion
→ The metric values before and after the anomaly window
→ The commit(s) that touched the affected service in the hours before the alert
→ A suggested action: rollback, investigate dependency, or escalate
→ A link to the most similar past incident if one exists in your team's history
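
One way to picture the items above is as a single record attached to the notification. The sketch below is purely illustrative: the `Diagnosis` type, its field names, and the `render` helper are hypothetical and do not reflect Wachd's actual output schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Diagnosis:
    probable_cause: str
    contributing_signal: str
    metric_before: float
    metric_after: float
    commits: list[str] = field(default_factory=list)
    suggested_action: str = "escalate"
    similar_incident_url: Optional[str] = None


def render(d: Diagnosis) -> str:
    """Format a diagnosis as a short plain-text message."""
    lines = [
        f"Probable cause: {d.probable_cause}",
        f"Contributing signal: {d.contributing_signal}",
        f"Metric: {d.metric_before} -> {d.metric_after}",
        f"Recent commits: {', '.join(d.commits) or 'none'}",
        f"Suggested action: {d.suggested_action}",
    ]
    # The similar-incident link is only present when history has a match.
    if d.similar_incident_url:
        lines.append(f"Similar incident: {d.similar_incident_url}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(Diagnosis(
        probable_cause="Error rate rose after a deploy to checkout-api.",
        contributing_signal="recent deploy",
        metric_before=0.01,
        metric_after=0.2,
        commits=["abc123"],
        suggested_action="rollback",
    )))
```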

PII is stripped before the AI sees anything. The analysis runs with Ollama in-cluster (air-gapped) or with Claude, OpenAI, or Gemini, selected with one line in values.yaml.
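
To make the PII step concrete, the sketch below shows the general idea of redacting log lines before they reach a model. It is an assumption-laden illustration, not Wachd's actual redaction logic: the two patterns (email addresses and IPv4 addresses) and the `redact` helper are hypothetical, and a real pipeline would likely cover more categories.

```python
import re

# Deliberately simple patterns, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(line: str) -> str:
    """Replace email and IPv4 addresses in a log line with placeholders."""
    line = EMAIL.sub("<email>", line)
    return IPV4.sub("<ip>", line)


if __name__ == "__main__":
    print(redact("login failed for jane.doe@example.com from 10.0.0.7"))
```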

Built for teams who already run their own stack

Kubernetes operators with curated alert rules

You have already invested in Cilium, cert-manager, Tempo, and Mimir alerts. The alert quality is good — the problem is triage speed after they fire. Wachd adds the explanation layer without changing your alert configuration.

Teams running Prometheus + Grafana + Loki together

The three-tool combination gives you metrics, dashboards, and logs. Wachd reads from all three simultaneously when an alert fires and correlates the result into a single diagnosis. No need to open each tool separately.

Self-hosted and air-gapped environments

Wachd runs entirely inside your cluster. With Ollama enabled, there are zero outbound API calls. All incident data, AI analysis, and on-call history stay inside your Kubernetes namespace.

Add root-cause analysis to your existing Prometheus stack

Keep your Prometheus, Grafana, Loki, and Tempo setup exactly as it is. Add Wachd as the layer that explains alerts — deploys in under 30 minutes, Apache 2.0, no account required.

Deploy free on Kubernetes → · Air-gapped deployment →

Questions? sales@wachd.io or Discord.